Very rare but possible glitching on PRU signal generation that can cause unexpected flashes #49

Open
bigjosh opened this issue Nov 13, 2016 · 13 comments

Comments

@bigjosh

bigjosh commented Nov 13, 2016

While the vast, vast majority of 0 bits coming out of the PRU are 300ns-370ns wide, I am seeing a very rare case where a 0 bit can be as wide as 540ns, which is wide enough to be seen as a 1 by some WS2812B chips.

When the problem happens, it seems to stretch all output bits being transmitted at that moment, although there is only material impact on 0 bits since 1 bits just become slightly longer 1 bits.

Outwardly, this appears as a row of pixels flashing for a single frame. It is especially noticeable when running strings in demo mode "black" when all bits should be 0. It is possible this is only visible on WS2812B chips with a shorter-than-spec T1H minimum time.

I verified the problem by attaching a scope to an output and setting it to trigger on a minimum pulse width of 450ns. Then I ran the "black" demo mode. In this mode, all bits should be 0, so I should never see a pulse wider than 450ns. Yet I was (rarely) able to capture pulses as wide as 540ns.
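The same check can be repeated offline on exported scope data. This is a minimal sketch, assuming the high-pulse widths are already available as a list in nanoseconds. The function name and the sample capture are hypothetical, and the 450ns limit is just the trigger threshold used above, not a datasheet figure.

```python
def find_stretched_pulses(widths_ns, limit_ns=450):
    """Return (index, width) for every high pulse wider than limit_ns.

    In the "black" demo every bit is 0, so any hit is a stretched 0 bit.
    """
    return [(i, w) for i, w in enumerate(widths_ns) if w > limit_ns]


# Hypothetical capture: mostly 300-370 ns pulses plus one stretched bit.
capture = [310, 350, 365, 540, 300]
print(find_stretched_pulses(capture))
```

Any non-empty result in "black" mode would correspond to a pulse that some chips could read as a 1.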

The stretched bits seem to happen more frequently when the ARM is under heavy memory stress so I think this might be caused by a worst-case series of cache misses when the PRU accesses the data in ARM RAM.

The current approach of timing the bit phases uses the cycle counter. Is it possible that the cycle counter does not count cycles where the PRU is stalled because it is waiting on a cache miss when reading external RAM? The description of the STALLCOUNT register possibly indicates this...

STALLCOUNT
This value is incremented by 1 for every cycle during which the PRU
is enabled and the counter is enabled (both bits ENABLE and
COUNTENABLE set in the PRU control register), and the PRU was
unable to fetch a new instruction for any reason.
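To make the question concrete, here is a toy model of the two possibilities. It only illustrates the hypothesis and is not confirmed PRU behavior. The function and parameter names are made up for illustration; the only hardware figure used is the 200MHz PRU clock (5ns per cycle).

```python
PRU_CLOCK_HZ = 200_000_000  # PRU core clock, 5 ns per cycle
NS_PER_CYCLE = 1_000_000_000 // PRU_CLOCK_HZ


def observed_pulse_ns(target_cycles, stalled_cycles, counter_counts_stalls):
    """Wall-clock width of a pulse timed by waiting for target_cycles counts.

    If the counter keeps running during a stall, the wait loop absorbs the
    stall. If the counter pauses, the stall is added on top of the target.
    """
    if counter_counts_stalls:
        wall_cycles = max(target_cycles, stalled_cycles)
    else:
        wall_cycles = target_cycles + stalled_cycles
    return wall_cycles * NS_PER_CYCLE


# Hypothetical numbers: a 70-cycle (350 ns) T0H hit by a 38-cycle stall.
print(observed_pulse_ns(70, 38, counter_counts_stalls=True))   # 350
print(observed_pulse_ns(70, 38, counter_counts_stalls=False))  # 540
```

Under this model, a counter that pauses during stalls would turn a 38-cycle stall into exactly the kind of 540ns pulse seen on the scope.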

Possible solutions might include...

  1. Rearrange current code so that all of the accesses to external RAM occur between bits rather than during the T0H phase of the bits. This would still add jitter to the time between bits when cache misses occur, but as long as this time is less than RESET, then the only impact should be (very) slightly diminished performance rather than bad data.
  2. Rewrite PRU code to copy pixel data into PRU RAM first and then transmit the bits directly from local PRU RAM during the timing sensitive frame.
  3. Rewrite the PRU code to use the IEP_TIMER to time the signal phases rather than the cycle counter. The IEP_COUNTER seems to be able to run deterministically at 200MHz no matter what is happening with PRU accesses.
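The trade-off in option 1 can be sketched with a simple timeline model. This is an illustration under assumptions, not a measurement: the 350ns high time and 900ns low time are nominal placeholder values, the function name is hypothetical, and the 50µs RESET figure is an assumed datasheet-style minimum that varies between chip revisions.

```python
RESET_NS = 50_000  # assumed minimum latch/reset time; check your chip's datasheet


def bit_timeline(fetch_ns, fetch_during_high, t0h_ns=350, low_ns=900):
    """Return (high_ns, low_ns) for one 0 bit given a memory fetch latency."""
    if fetch_during_high:
        # The fetch sits inside the high phase, so a slow fetch stretches it.
        return max(t0h_ns, fetch_ns), low_ns
    # The fetch sits between bits, so only the low time grows.
    return t0h_ns, low_ns + fetch_ns


# A hypothetical 540 ns worst-case fetch under both orderings.
print(bit_timeline(540, fetch_during_high=True))   # (540, 900)
print(bit_timeline(540, fetch_during_high=False))  # (350, 1440)
```

In the second ordering the high pulse stays at its nominal width, and the longer low time remains far below the assumed RESET threshold, so the pixels would not latch early.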

I can try to tackle any of these approaches, but just want a sanity check before doing the work. Has anyone else ever seen these wide bits (or the flashes they produce)?

@bigjosh
Author

bigjosh commented Nov 14, 2016

Digging in further on this, I think the ultimate source of the jitter is the fact that the PRU code accesses the GPIO pins through the ARM address space rather than directly via r30. This can cause stalls when there is contention, and I think these stalls are the root problem. With this in mind, I think the best solution might be to rewrite the PRU stuff to go direct to the pins and get absolute deterministic timing. This would not be as much work as it might seem now that the PRU C compiler is getting mature, but I am hesitant to do it if I am really the only one who has even been affected by this issue....

@Yona-Appletree
Owner

Hey, thanks for looking into this, and I'm sorry I didn't get back to you sooner.

I am aware of the issue, and have done several things to mitigate it on various branches. The simplest one, that I think is on master, is simply to check how many cycles have passed since the zero write started, and if it's too long, we abort the entire frame. This has the effect of only showing one white pixel rather than corrupting the rest of the frame.
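The abort check described above can be sketched roughly as follows. This is not the code on master, only a host-side illustration: `measure_high_cycles` is a hypothetical callback standing in for reading the PRU cycle counter, and the 90-cycle limit (450ns at 5ns per cycle) is an assumed threshold.

```python
def send_frame(bits, measure_high_cycles, max_zero_cycles=90):
    """Send bits in order and stop at the first 0 bit whose high phase ran long.

    Returns the number of bits that went out cleanly before any abort.
    """
    for index, bit in enumerate(bits):
        cycles = measure_high_cycles(index, bit)
        if bit == 0 and cycles > max_zero_cycles:
            # Abort: stop driving the line so later pixels are not corrupted.
            return index
    return len(bits)


# Hypothetical measurements: the third 0 bit was stretched to 108 cycles.
measured = [70, 70, 108, 70]
sent = send_frame([0, 0, 0, 0], lambda i, bit: measured[i])
print(sent)  # 2
```

Stopping at the bad bit trades one visibly wrong pixel for leaving the rest of the strip untouched for that frame.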

Secondly, you can rewrite the PRU code to not go back to DRAM for every bit, but rather load data in entire RGB chunks into the registers and write that. I have an experimental branch, spi-cape-support, that supports this along with several other improvements. The main change is that a custom PRU program is built at runtime for the specific number of LEDs and driver type.
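The idea of loading a whole pixel before shifting it out can be sketched like this. It is a simplified single-channel model, not the code from the spi-cape-support branch. The names are hypothetical; the 24-bit width and most-significant-bit-first order match how WS2812-style pixels are normally clocked.

```python
def transmit(pixel_words, emit_bit, bit_count=24):
    """Emit every bit MSB-first, reading memory once per pixel.

    Returns the number of memory reads performed.
    """
    reads = 0
    for word in pixel_words:  # one read per pixel, done between pixels
        reads += 1
        for shift in range(bit_count - 1, -1, -1):
            # The timing-sensitive inner loop only touches the local copy.
            emit_bit((word >> shift) & 1)
    return reads


out = []
reads = transmit([0x800001], out.append)
print(reads, len(out))  # 1 24
```

Compared with fetching data for every bit, this leaves only one slow access per 24 bits, and that access can sit in the gap between pixels.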

The problem with using r30 is that you're quite limited in the number of channels you can output. Something like 12. That's far too few for my use cases, but I can certainly see the value for some projects. Combining that with loading all data into the registers should be pretty foolproof, though you still might have to drop a partial frame if it takes too long to get the next pixel of data. Using PRU RAM might fix this, though I had some issues with it when I tried.

I'd be happy to talk with you via phone about your investigation and my ideas and work. Feel free to email me at [email protected] if you're interested. I have some time tonight.

@bigjosh bigjosh changed the title Very rare but possible glitching on PRU cache miss when reading from DDR RAM? Very rare but possible glitching on PRU signal generation that can cause Nov 14, 2016
@bigjosh bigjosh changed the title Very rare but possible glitching on PRU signal generation that can cause Very rare but possible glitching on PRU signal generation that can cause unexpected flashes Nov 14, 2016
@bigjosh
Author

bigjosh commented Dec 27, 2016 via email

@bigjosh
Author

bigjosh commented Feb 13, 2017

I am pretty sure this is the root of the problem...

" The PRU read instruction executes in ~2 cycles, plus additional latencies due to traversing through interconnect layers and variable processing loads. "

http://processors.wiki.ti.com/index.php/AM335x_PRU_Read_Latencies

I think it will take some deep digging in the bowels of the on-chip LAN to reduce these jitters.

@Yona-Appletree
Owner

It's worth noting, however, that it's the writes that directly affect the GPIO jitter:

The PRU write instruction is a fire-and-forget command that executes in ~1 cycle.

The problem with this is that we can't even tell how long it took for the write to get to the GPIO register, so we can't account for the jitter.

With the version of the code on master, we are reading all the data for every bit, which could cause issues due to long reads (and there are checks to abort the strip write in this case).

I suspect the only real solution is to use r30. I have a prototype of an r30-based ws281x driver working on the spi-cape-support branch. Going forward, I'd like to merge all that into master and call out in the docs that there is the 48-port-capable-but-slightly-janky version and the 22-port-but-stable version. More testing is required before we're at that point, though.

@bigjosh
Author

bigjosh commented Feb 14, 2017

Ah, yes. Humbly corrected: the jitter on the write side is totally different (and invisible!), but I still think it ultimately depends on the interconnect fabric priorities.

R30 is a great solution for me since I never need that many pins. Any feel for how important more pins are in general?

@Yona-Appletree
Owner

Yona-Appletree commented Feb 14, 2017

Good question! I honestly don't have a good idea of who is using LEDscape right now (other than you!), and what their needs are :) That would be nice info to have, though.

@Serisium

Personally, I use LEDScape to drive 23 separate strips from the BBB, have disabled HDMI, and don’t use any other GPIO lines.

@Yona-Appletree
Owner

Well, if you could bring that down to 22, you could use all r30 pins. Technically you can also disable the eMMC, but that's a little harder to deal with.

@orangemelon69

Hi Yona!

Big fan of your work!

I went from one LED for fun several months ago to custom-produced rigid board matrixes for displaying lots of dynamic data on industrial machines. I went through all the initial steps of playing with simple Arduino stuff, not even knowing how to solder, and am now playing with oscilloscopes :) Big Josh and his great work actually brought me to your fork here :)

This flicker issue is the last piece in my puzzle. And that finally brought me here to this thread.

My LED type is SK6812, which has slightly different timing, namely the 1s in question are shorter, so I believe this flicker problem is a lot more pronounced in my case.

I don’t use any cape or level shifter, I brought the voltage down to 4.3 (saving power and reducing brightness which is a plus in my case) and the signals register just fine altogether.

I will take a look at the prototype driver in the branch you mentioned tonight. Would love to contribute further somehow in case you wanted to merge that into master. Or if there are any news regarding this, I would be glad if you let us know.

Would shortening the time help here (according to the SK6812 specs and their timing thresholds)? I tried to look at the templates, but the machine code is too low level and, at this stage, beyond my comprehension :)

Cheers

@Yona-Appletree
Owner

Yona-Appletree commented Mar 5, 2018 via email

@orangemelon69

Speak about lightning fast replies:)

I am driving 7 strips (up to cca 600px each but normally around 300-500)

7 outs is the max I will need for this use.

If you’d be so kind that would be just great!

Cheers

@Yona-Appletree
Owner

Yona-Appletree commented Mar 6, 2018 via email


4 participants