Add S-PWM #236
Replies: 9 comments 5 replies
-
I don't understand. Under the hood this library bears no resemblance to the RPi library, and PWM is completely irrelevant to it. The RPi library essentially bit-bangs the GPIO output using the CPU.
-
It is based on traditional PWM using BCM, right? A single period per row. Do you use a timer or DMA to keep track of the BCM bit-planes? As long as you are doing traditional PWM, with or without BCM, you can use this. Basically, you just divide the single period per row into multiple periods per row. Under traditional PWM the refresh is somewhat wasted. You get a really high frame rate which, unless it is needed, could instead be traded for a longer chain or more color depth. In some cases that could also be bad; if so, the headroom could be used to lower the required serial clock rate, which is useful with low voltage, long chains and slower power supplies. This can lead to an increase in interrupts, which could be fixed with multicore, scatter-gather, DMA chaining, etc. This trick optimizes the CPU and memory consumption for microcontrollers. Microcontrollers are generally based on SRAM and have higher levels of determinism than, say, the Pi. The Pi has to play rounding tricks in the bit-bang but has more memory and processing power. Ironically, this means the full version is not likely to be very useful on either platform. Hardware like an FPGA could possibly do it, but it is likely more expensive to do so. It would be cheaper to use an MM (memory-mapped) LED driver. MM LED drivers are a beast of their own. There are two or three approaches possible there, for the most part. However, these are great for two different application areas: high density, and low multiplex with high contrast. Standard panels can cover more of that overlap using S-PWM. They will not be able to compete directly, but S-PWM can close the gap some, to the point where it may not matter. Handling 16 bits per pixel is hard for many microcontrollers, resulting in memory shortage or complex software models. The ESP32's multicore does provide a break from this. DMA and scatter-gather may also provide options.
-
I don't really understand what the proposal is, to be honest. With this library we construct a 16-bit (2-byte) wide binary bitstream to send out of the GPIOs in parallel at the clock speed the I2S hardware provides. There's no use of PWM hardware or anything else: whatever is in this DMA memory allocation is spat out repeatedly as quickly as possible. So the concept of S-PWM isn't applicable to this library or hardware, and in any case I think what we have is pretty good for what it does, and I don't wish to complicate the code any further. This library handles 24bpp with no issues; just don't expect to drive a Full HD display with only 200kB of SRAM.
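To put rough numbers on the memory point, here is a minimal sketch of a buffer-size estimate. It assumes one 16-bit word per column, per row pair, per bit-plane, with two rows driven per row address; the function name and that layout are illustrative assumptions, and the library's real buffer may be larger (for example if bit-planes are duplicated for timing).

```python
def bcm_buffer_bytes(width, height, bits_per_channel, bytes_per_word=2):
    """Rough size of one frame buffer of parallel output words.

    Assumption: one word per column, per row pair, per bit-plane.
    """
    row_pairs = height // 2  # assumption: upper and lower half share a word
    return row_pairs * width * bits_per_channel * bytes_per_word


small_panel = bcm_buffer_bytes(64, 32, 8)      # 24bpp on a 64x32 panel
full_hd = bcm_buffer_bytes(1920, 1080, 8)      # 24bpp at Full HD
sram_budget = 200 * 1000                       # the ~200kB figure from the thread

print(small_panel, small_panel <= sram_budget)
print(full_hd, full_hd <= sram_budget)
```

Under these assumptions a 64x32 panel fits comfortably, while Full HD is roughly two orders of magnitude over the SRAM budget.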
-
From a configuration standpoint there is an increase in complexity. However, there is also an increase in flexibility. What I am proposing is to allow scaling the frame rate on the serial bus to as low as 1 FPS without losing any refresh. In fact, I am proposing we allow refresh above 1 kHz. The way this is done is perceptive refresh scaling. Basically, we divide the period into 64 sections, which for a 30 FPS serial bus would yield a multiplexing refresh of 1920 Hz. Now, this is not free. With traditional PWM you have to do 2^x shifts/interrupts per row times the frame rate (refresh). With traditional PWM using BCM you can do 2^x or x shifts/interrupts per row times the frame rate (refresh). With the scaled-back version of S-PWM that I am proposing you would have 2^x or 2^upper * lower shifts per row times the frame rate. The refresh now becomes 2^upper times the frame rate. The frame rate no longer has to be at least 100 Hz for decent multiplexing, which means more serial bus time. From the programming side it is fairly straightforward. However, the user may experience additional confusion. Overall, this can be very powerful. Again, this is not full S-PWM. This trick does not do PWM, it does BCM. However, it is a little more intense. The core logic you have will not change. You will simply modify the state machine to change the rows a little more frequently. DMA should not care; it will still push data out from the memory buffer. It will duplicate shifts, assuming such duplication is not already done to keep track of time via DMA. The state machine will run more, which can lead to overhead. However, the multiplier for refresh is also the interrupt multiplier, so there is a catch. Going back to my example, a 16x32 matrix run at 25 MHz could have 10-bit color depth with 95 Hz refresh. This is what traditional PWM gives you. However, with S-PWM you could do 11-bit color depth (6-bit upper and 5-bit lower) with 1920 Hz refresh using 15.8 MHz. 
This is possible because we moved from 95 FPS to 30 FPS, a ratio of 3.17. If we increase the color depth by one bit this becomes 1.59. If you divide 25 by 1.59 you get 15.8. I can get really crazy with it, too, if I play with the ratio of upper to lower. Note I would only really get 11-bit color depth if the panel had low multiplex and a fairly fast PSU. However, the lower clock will help with this on standard panels. (The other way to help would be to increase the chain length, which would require more number tinkering.) This comes in handy in certain places. 24bpp is nearly impossible on 16x128 with traditional PWM above 95 Hz refresh on a single connector, assuming 25 MHz is even stable or possible. You can do time-averaging tricks like what the Pi does to average in 8-bit planes. In a personal project on PoE, I could do at most 15-21bpp for 16x128. (LED current is the limitation.) If I used S-PWM, the refresh would go up to 960-3840 Hz with a serial clock of around 1-4 MHz, assuming my power supply could handle a 7.7-30.8 kHz response. (This should not be hard to accomplish.) Note this is on a single chain at 30 FPS. The max interrupt rate would be 32-128 times 8 per frame, meaning that per second my interrupt rate would equal my PSU response rate. In this case the interrupt rate is less than the traditional PWM interrupt rate. There is a breakeven point depending on conditions. However, I could lower the interrupt rate if I wanted to. This would result in a loss of refresh if I kept the serial bandwidth the same, which would prevent the PSU response frequency from increasing. Just a suggestion.
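The 16x32 example can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the serial clock is simply row pairs × columns × shifts per row × frame rate, with 8 row pairs for a 16-row panel and 2^x shifts per row; the function names are illustrative and not taken from any library.

```python
def shifts_per_row(scheme, bits, upper=0):
    """Shift/interrupt counts per row as listed in the comment above."""
    if scheme == "pwm":
        return 2 ** bits                     # traditional PWM
    if scheme == "bcm":
        return bits                          # traditional PWM using BCM
    if scheme == "spwm_bcm":
        return 2 ** upper * (bits - upper)   # scaled-back S-PWM: 2^upper * lower
    raise ValueError(scheme)


def serial_clock_hz(row_pairs, cols, shifts, frame_rate_hz):
    """Assumed model: every shift clocks out one full row of columns."""
    return row_pairs * cols * shifts * frame_rate_hz


traditional = serial_clock_hz(8, 32, shifts_per_row("pwm", 10), 95)
scaled = serial_clock_hz(8, 32, shifts_per_row("pwm", 11), 30)
refresh_hz = 30 * 2 ** 6                     # frame rate times 2^upper

print(traditional / 1e6)                     # about 24.9 MHz
print(scaled / 1e6)                          # about 15.7 MHz
print(refresh_hz)                            # 1920
print(shifts_per_row("spwm_bcm", 11, upper=6))  # 320 shifts with BCM planes
```

Under this model the 10-bit, 95 Hz case lands just under 25 MHz, and the 11-bit, 30 FPS case lands near the 15.8 MHz quoted above, with the multiplexing refresh at 1920 Hz.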
-
Very interesting. I think I understand, but we're not on the same page here. For silicon hardware reasons and, more importantly, memory reasons, this can't be done on an ESP32. There is no concept of interrupts in this library either. And even if it could be done, I think it would require a rewrite of the code. Have a read of the library's core .cpp file to see how it works. There is no CPU involvement in this library other than to update pixel values. Using interrupts to do S-PWM or to trigger the DMA buffer will lead to CPU load and/or flicker should people be using the underpowered microprocessor to do sketch work. Summary: good idea, wrong silicon.
-
You implement a scatter-gather DMA linked list? Then this will still work, but it may increase RAM consumption. The RAM overhead comes from the extra descriptors, which you can control. However, this will create a limit for scaling. For smaller displays this should still work. Fundamentally you still have interrupts. However, since DMA fully automates control signals like row select, there could be an increase in descriptors, potentially by a factor of (2^upper * lower) / (upper + lower). (This is a guess.) The overall bit-plane descriptors would remain the same.
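As a quick illustration of that guessed factor, here is a minimal sketch. The formula is the commenter's own estimate, and the baseline descriptor count and the 12-byte descriptor size in the usage lines are purely hypothetical inputs, not measurements from the library.

```python
def descriptor_multiplier(upper, lower):
    """Guessed growth factor from the comment: (2^upper * lower) / (upper + lower)."""
    return (2 ** upper * lower) / (upper + lower)


def extra_descriptor_bytes(base_descriptors, upper, lower, descriptor_bytes):
    """RAM added by the extra descriptors, given a hypothetical baseline."""
    scaled = base_descriptors * descriptor_multiplier(upper, lower)
    return (scaled - base_descriptors) * descriptor_bytes


factor = descriptor_multiplier(6, 5)     # 6-bit upper, 5-bit lower
print(round(factor, 2))                  # about 29.09

# Hypothetical baseline: 128 descriptors of 12 bytes each.
print(round(extra_descriptor_bytes(128, 6, 5, 12)))
```

For the 6+5 split used earlier in the thread the guess gives roughly a 29x increase, which is why the descriptor count becomes the scaling limit on larger displays.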
-
After playing with it in more detail, it looks like you need the redistribution, which is kind of a pain. What I saw on the Pi was likely the product of something else. On other platforms and at lower bit depths that version is less productive. In fact, on the system I am currently working on, it is potentially completely pointless. After thinking about it, this makes sense. However, adding in the redistribution does take considerable memory. I may explore the other version I have of this. I need to divide the period, as simply increasing the scan rate does not work with BCM. That works better with traditional PWM, but there is a way to do it with BCM. I will play with it. I think I have something that may still work. Ideally it wants an FPGA for the actual PWM part, which is hard to do with a CPU. The BCM part is easy. In my case there are potentially two bits of color depth on the table. This means a 2.67x increase in RAM; however, I may be able to work that down with some funny math. Look at this: https://www.ti.com/lit/an/slva645/slva645.pdf?ts=1642172180899&ref_url=https%253A%252F%252Fwww.ti.com%252Fproduct%252FTLC5958 Redistribution of 9+7 basically works like this: take the upper 9 bits and add one. Then look at the lower 7 bits, rounded to a power of 2. Divide the display into sections according to that divisor. Place the upper 9 bits plus one in each such section until the counter has expired. (Binary tree/BCM redistribution.) This means you have 2^lower * upper bit-planes. You want to reduce the lower number to save memory and CPU cycles. This rebases the period into multiple sections of time, so BCM does not force such a huge clock divider. In the end you are left with a big array: Frame[1 << lower][rows / 2][upper][col]. DMA could play that back. It is just a bunch of linear loops.
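One way to picture the 9+7 redistribution is the sketch below. It is an interpretation, not the TI part's exact behaviour: the period is split into 2^lower segments, every segment gets the upper-bits value, and the lower-bits value decides how many segments get one extra count. Spreading the extras in bit-reversed order is an assumption standing in for the "binary tree" placement, and all function names are illustrative.

```python
def bit_reverse(n, width):
    """Reverse the lowest `width` bits of n."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (n & 1)
        n >>= 1
    return result


def redistribute(value, upper_bits, lower_bits):
    """Split one gray value into per-segment on-times that sum to value."""
    segments = 1 << lower_bits
    if not 0 <= value < 1 << (upper_bits + lower_bits):
        raise ValueError("value out of range")
    base = value >> lower_bits            # upper bits: on-time in every segment
    extra = value & (segments - 1)        # lower bits: segments that get +1
    return [base + (1 if bit_reverse(seg, lower_bits) < extra else 0)
            for seg in range(segments)]


def frame_shape(lower_bits, rows, upper_bits, cols):
    """Shape of Frame[1 << lower][rows / 2][upper][col] from the comment."""
    return (1 << lower_bits, rows // 2, upper_bits, cols)


print(redistribute(6, 2, 2))              # small case: 2 upper bits, 2 lower bits
print(sum(redistribute(40000, 9, 7)))     # a 16-bit value split as 9+7
print(frame_shape(7, 16, 9, 128))
```

The per-segment values always sum back to the original gray value, and a segment can reach the upper value plus one, which matches the "upper 9 bits plus one" remark.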
-
I figured out about three versions of this: one without redistribution, one with BCM redistribution and one with PWM redistribution. No redistribution does not do much, if anything. BCM redistribution has a small issue: its actual bit count is log2((2^lower - 1) / 2^lower * 2^bits). PWM redistribution is the actual version of S-PWM, but it has really high CPU usage. BCM S-PWM helps lower the CPU usage, but it is still quite high in some cases. You have to be careful to keep the lower bit count above 1. A strong CPU is likely capable of doing this, but there is a lot required to get it going correctly. PWM redistribution is likely a complete waste without an FPGA accelerator. It is worth noting that if you replay bitmaps, your draw rate can be quite low. This is true for traditional PWM, S-PWM, standard panels, PWM panels, SRAM panels, etc. It is only when you increase the pixel count, quality and FPS that you really have an issue with the CPU. Only SRAM panels allow avoiding memory usage issues, assuming the control logic exists. This may increase the DMA descriptor overhead. Note that some panels cannot work under high refresh, while others can. Also, depending on the amount of dead time required between row changes, there could be a limit to the refresh. Usually this is small. High multiplex, refresh and blanking time are factors. Many libraries avoid these issues with traditional PWM.
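The bit-count expression is easy to evaluate. This minimal sketch reads it as log2(((2^lower - 1) / 2^lower) * 2^bits), which is an assumption about the intended operator precedence; the function name is illustrative.

```python
import math


def effective_bits(total_bits, lower_bits):
    """Effective color depth under BCM redistribution, per the formula above."""
    if lower_bits < 1:
        raise ValueError("lower_bits must be at least 1")
    fraction = (2 ** lower_bits - 1) / 2 ** lower_bits
    return math.log2(fraction * 2 ** total_bits)


for lower in (1, 2, 5):
    print(lower, round(effective_bits(11, lower), 3))
```

With an 11-bit total, a lower count of 1 costs a full bit, while a lower count of 5 costs only a small fraction of a bit, which is consistent with the advice to keep the lower bit count above 1.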
-
After playing with it and thinking about it, S-PWM only really makes sense for hardware PWM. It can technically work on standard panels, but it is not worth it. S-PWM is a means of dynamically managing FPS against color depth. The refresh is held constant. Normally you hold the color depth constant, and this forces the FPS down. The refresh and FPS are tightly coupled. If you want to show a higher FPS, you need to let the quality slip. Otherwise there is too much serial bandwidth per frame, which means the time per frame is high, which in turn increases the frame latency. So, to get a smoother video playback experience you need to lower the color depth.
-
I tried this out on the Pi; however, I pushed it under a different account. Here it is: https://github.com/greatballoflazers/rpi-rgb-led-matrix (I did not want to be associated directly with that project.)
Anyhow, this is a reduced version of a trick called S-PWM. What it does is increase the refresh rate and possibly the quality or size of the display. Normally this is done with PWM, which is problematic. I was able to scale it back with BCM and remove the color redistribution, which results in no extra CPU or memory usage. (Some increase is possible in interrupts or state machine logic; however, you can configure that to your liking.)
There are three versions of the S-PWM implementation, in contrast to traditional PWM, which has two. That makes a total of five implementations, but two of them are not practical. The Pi may struggle a little in terms of determinism but will have more CPU and memory available compared to the ESP32. For the time being, I think only the traditional PWM/S-PWM hybrid using only BCM is recommended.
A rough outline is shown in that repo. I did not create the other two versions in that repo. I have some disagreements with Henner's rpi-rgb-led-matrix project, and I am not willing to disclose the others to them at this time. Note I do have some ideas for the memory-mapped drivers, but I do not have an ESP32.
I did test this out on the Pi, and it does improve a few things. However, I did not do extreme testing, as it seemed a little pointless on that platform.