|
|
View previous topic :: View next topic |
Author |
Message |
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Mon Sep 17, 2012 3:55 pm |
|
|
This program blinks an LED at a 1 Hz rate with vs. 4.134. I tested it in
hardware with an 18F45K22, but it should also work with an 18F46K22.
Code: |
#include <18F45K22.h>
#fuses INTRC_IO,NOWDT,PUT,BROWNOUT,NOLVP
#use delay(clock=64M)
//======================================
void main(void)
{
// Blink the LED once per second. The 10/90 duty cycle
// makes it easier to time the flashes with a stopwatch.
while(1)
{
output_high(PIN_B0);
delay_ms(100);
output_low(PIN_B0);
delay_ms(900);
}
} |
|
|
|
thefloyd
Joined: 02 Sep 2009 Posts: 46
|
|
Posted: Sat Sep 29, 2012 2:50 pm |
|
|
So after taking a break, head scratching, debugging, and taking another break I got to take a better look on a scope.
For some reason, it seems trying to execute the for() loop is severely retarding the timing. Enough that there's at least a 50us delay, and that's causing the light string to latch and move on with life. I really didn't think that'd be possible (at 64 mhz!), but I can definitely confirm the data is coming much more slowly with long(ish) pauses when the loop is executed versus doing the bit test sequentially and banging the bits out.
Am I being unrealistic with my timing demands here? Or is there something else I'm missing? Or is there something else possibly going on? I know there are people doing this on a 16mhz arduino (with some careful optimization, I get that.. but 64mhz should allow me to be a bit more "lazy" no?) |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Sat Sep 29, 2012 3:13 pm |
|
|
Quote: | there's at least a 50us delay, and that's causing the light string to latch
|
Yes there is, and you put it in there. See the line in bold in your code below:
Quote: |
void send_frame() {
unsigned int16 i;
unsigned int bit;
unsigned int32 this_led;
for(i=0;i<LED_STRIP_LEN;i++) {
this_led = led_strip_colors[i];
for(bit=23;bit!=255;bit--) {
if(bit_test(this_led, bit) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
}
}
}
delay_us(50); // latch
} |
|
|
|
asmboy
Joined: 20 Nov 2007 Posts: 2128 Location: albany ny
|
|
Posted: Sat Sep 29, 2012 3:46 pm |
|
|
style: bit_test is True or false Boolean purity ;-))
Code: |
if(bit_test(this_led, bit) == 1)
// VS
if(bit_test(this_led, bit))
|
can't help but notice that |
|
|
thefloyd
Joined: 02 Sep 2009 Posts: 46
|
|
Posted: Sun Sep 30, 2012 9:47 am |
|
|
PCM programmer wrote: | Quote: | there's at least a 50us delay, and that's causing the light string to latch
|
Yes there is, and you put it in there. See the line in bold in your code below:
Quote: |
void send_frame() {
unsigned int16 i;
unsigned int bit;
unsigned int32 this_led;
for(i=0;i<LED_STRIP_LEN;i++) {
this_led = led_strip_colors[i];
Code: | for(bit=23;bit!=255;bit--) {
if(bit_test(this_led, bit) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
} |
}
}
delay_us(50); // latch
} |
|
Yes. The latch is there but ONLY after iterating through the loop. What I'm saying is that WITHIN the loop, I'm finding at least a 50us delay.
this works:
Code: |
void send_frame() {
unsigned int16 i;
unsigned int32 this_led;
for(i=0;i<LED_STRIP_LEN;i++) {
this_led = led_strip_colors[i];
if(bit_test(this_led, 23) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
}
if(bit_test(this_led, 22) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
}
...
if(bit_test(this_led, 1) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
}
if(bit_test(this_led, 0) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
}
}
delay_us(50);
}
|
THIS doesn't:
Code: |
void send_frame() {
unsigned int16 i;
unsigned int bit;
unsigned int32 this_led;
for(i=0;i<LED_STRIP_LEN;i++) {
this_led = led_strip_colors[i];
for(bit=23;bit!=255;bit--) {
if(bit_test(this_led, bit) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
}
}
}
delay_us(50);
}
|
I'm seeing delays of > 50us INSIDE the for(bit=23;bit!=255;bit--) loop. Debugging prints the right number of 1s and 0s to the screen. But when executed on the actual hardware, only the first LED does things (flickers, does weird stuff). Probing with a scope and the timing inside the loop is all broken, but doing each of the 24 bit tests by hand manages to avoid the 50us+ pauses. Why? |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19513
|
|
Posted: Sun Sep 30, 2012 10:27 am |
|
|
You don't show your SPI configuration code.
I'd say it is using soft SPI at perhaps about 200KHz.
This will then take 48uSec to send the bytes.
Best Wishes |
|
|
thefloyd
Joined: 02 Sep 2009 Posts: 46
|
|
Posted: Sun Sep 30, 2012 10:47 am |
|
|
Ttelmah wrote: | You don't show your SPI configuration code.
I'd say it is using soft SPI at perhaps about 200KHz.
This will then take 48uSec to send the bytes.
Best Wishes |
Sorry, the setup_spi command is earlier up in the thread. This is a PIC @ 64mhz with the following:
Code: |
setup_spi( SPI_MASTER | SPI_L_TO_H | SPI_CLK_DIV_16 );
|
|
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19513
|
|
Posted: Sun Sep 30, 2012 11:20 am |
|
|
However it'd be a _lot_ faster, to generate your own bit mask, load this with 2^23, and just rotate this once each time round the loop, and 'and' this with the value.
Problem is that 'bit test' with a constant generates a single machine instruction. With a variable, it generates a '1', then rotates this by the specified number, and 'and's this to find if the bit is true/false.
So your test generates an average of 13 4byte rotations each time round the loop. Add the loop count and the and operation, and you have something like perhaps 200 instructions, so I can see it taking perhaps 15+uSec.
Using your own mask instead, which only has to rotate once each time round the loop will save a lot of time.
Best Wishes |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19513
|
|
Posted: Sun Sep 30, 2012 12:09 pm |
|
|
As a comment, the fastest way to do this is probably:
Code: |
for (bit=0;bit<24 bit++) {
if(bit_test(this_led, 23) == 1) {
spi_write(ws2811_one);
}
else {
spi_write(ws2811_zero);
}
this_ked*=2;
}
|
This way you test the single _fixed_ bit (fast) and just rotate your stored number once each loop. I'd guess at least a dozen times faster.
Best Wishes |
|
|
thefloyd
Joined: 02 Sep 2009 Posts: 46
|
|
Posted: Sun Sep 30, 2012 4:28 pm |
|
|
It's not necessarily how fast I can clock the bits out (though that does come into play), but a 50us low is considered a latch.
Basically I'm working with a string of LEDs driven by WS2811 ICs. these basically take 24 bits of data (8 bit red, green, blue), shift it in, and cascade the next 24 bits over to the next chip (and so on and so on down the line). Once you hold the data line low for 50+us, the ICs along the string all latch the data and display the incoming data.
What I'm seeing in my for() loop SEEMS to be 50us+ long low periods, causing the first LED to latch and the rest of the data never making it down the string (currently 100 LEDs long). When I do the bit tests individually all typed out, the data makes it all the way down the string and the expected result (LEDs do what I'm asking them to) happens. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19513
|
|
Posted: Mon Oct 01, 2012 3:00 am |
|
|
Try with the faster code.
Are you _sure_ your chip really is clocking at 64MHz?.
Historically, on some compiler versions, the settings as you have them, don't correctly enable the PLL. You have to have the setup_oscillator command to make it work.
I'd guess you are actually running at 16MHz, then the long time needed for the bit_test using a variable, _would_ take about 50+uSec, and 'voila' you have your problem.
Much faster though to do the single rotation as shown.
It takes about 750uSec to send the 24bits at 16Mhz using the existing code.
Average of 32uSec, but the earliest bits are the slowest, taking 52.5uSec for the first bit, 51.25 for the second, etc..
Using my version (two typing errors, but they are obvious), it takes just 4.5uSec/bit at the same clock rate, and the time between bits is constant.
Best Wishes |
|
|
thefloyd
Joined: 02 Sep 2009 Posts: 46
|
|
Posted: Mon Oct 01, 2012 4:23 pm |
|
|
Ttelmah wrote: | Try with the faster code.
Are you _sure_ your chip really is clocking at 64MHz?.
Historically, on some compiler versions, the settings as you have them, don't correctly enable the PLL. You have to have the setup_oscillator command to make it work.
I'd guess you are actually running at 16MHz, then the long time needed for the bit_test using a variable, _would_ take about 50+uSec, and 'voila' you have your problem.
Much faster though to do the single rotation as shown.
It takes about 750uSec to send the 24bits at 16Mhz using the existing code.
Average of 32uSec, but the earliest bits are the slowest, taking 52.5uSec for the first bit, 51.25 for the second, etc..
Using my version (two typing errors, but they are obvious), it takes just 4.5uSec/bit at the same clock rate, and the time between bits is constant.
Best Wishes |
I'm pretty sure about 64mhz. I DID have issues using the internal osc and asking it to run at 64Mhz. While it happily acted like it was running at 64mhz when I tried using RS232 all I got was garbage. When I changed the clock down to 16mhz in that setup RS232 was fine. My current setup is using an external 16mhz crystal and PLL enabled. I've also used RS232 to debug in this setup and things are working as expected - so I don't yet have a reason to believe I'm _not_ at 64mhz this time.
I will try with your faster code tonight. Appreciate the insight, hopefully it'll get me to where I need to be. As a casual hobbyist bit operations have always been my weak point (and'ing/or'ing/shifting/etc are hard to grasp, dunno why). |
|
|
thefloyd
Joined: 02 Sep 2009 Posts: 46
|
|
Posted: Mon Oct 01, 2012 8:47 pm |
|
|
Ttelmah wrote: | Try with the faster code.
Are you _sure_ your chip really is clocking at 64MHz?.
Historically, on some compiler versions, the settings as you have them, don't correctly enable the PLL. You have to have the setup_oscillator command to make it work.
I'd guess you are actually running at 16MHz, then the long time needed for the bit_test using a variable, _would_ take about 50+uSec, and 'voila' you have your problem.
Much faster though to do the single rotation as shown.
It takes about 750uSec to send the 24bits at 16Mhz using the existing code.
Average of 32uSec, but the earliest bits are the slowest, taking 52.5uSec for the first bit, 51.25 for the second, etc..
Using my version (two typing errors, but they are obvious), it takes just 4.5uSec/bit at the same clock rate, and the time between bits is constant.
Best Wishes |
Amazing. Your code works _beautifully_. I get the basics behind what it's doing.. but I don't 'get it'. Time to start reading more.. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19513
|
|
Posted: Tue Oct 02, 2012 12:48 am |
|
|
The key is understanding how bit_test works.
If used with a constant value, it works out in advance what byte is involved, and used the processor bit test instruction. Singe machine cycle.
With a variable though there is a problem. What it does is starts with a 'temporary' variable matching the size of the object you want to test. So in your case an int32. It then loads this with '1', and rotates this the number of times needed to get to the bit needed. So effectively in your case, it codes as:
Code: |
int1 test_bit(int32 val, int8 num){
int32 mask=1;
int8 ctr;
for (ctr=0;ctr<num;ctr++) {
mask*=2;
}
return ((val & mask)!=0);
}
|
So in your first case, with '23' as the bit number, it has to loop 23 times rotating the four byte variable. In all, it performs 23,22,21,.....2,1,0 rotations. A total of 276 4byte rotations, and 24 4byte AND operations!.....
Now my alternative hits it from the other end. You have your 24bit number stored in a variable, whose contents can be destroyed by the output. The first time round you want to send bit 23, the next time bit 22 etc...
So I code the first output, as a constant bit test (testing bit 23). Fast, gets rid of the need to perform the 4byte 'AND'.
Then I just rotate the variable _once_. What was bit 22, is now in bit 23, so the same constant test works. Each time round the loop, I just rotate once, and test. The counter becomes a simple 24* counter (no need for it to match the bit number involved), and each loop then has a single 4byte rotation, a bit test, and the increment/test of the counter. A total of just 23*4 rotations. Also, because there is no counting to the bit number, the operation takes constant time. About 5* faster on average, and nearly ten time faster on the 'worst case' bits.
Perhaps the big lesson is to 'beware' of bit test using a variable, especially with a large number like this. It can take a lot of processor work.....
However I still have to suggest you go back and triple check the processor is getting the 64MHz, since at this speed, the bit_test version should not cause the timing you are describing - I have stop-watched it, and at 64MHz, the longest bit time should still be only about 13uSec. It only gets to the timing you are reporting, and would give a problem, if your clock is not 64MHz....
Best Wishes |
|
|
thefloyd
Joined: 02 Sep 2009 Posts: 46
|
|
Posted: Tue Oct 02, 2012 8:04 am |
|
|
Thanks for taking the time out to explain this to me in a little more detail, it certainly clarifies things. I know my strong and weak points and code optimization isn't one of my stronger points (though I'm learning :D).. I figured I'd just throw more power at the problem as that wasn't an issue in this particular case.. funny how that doesn't always seem to solve the problem
I will double check tonight to see if I can be certain the PIC is operating at 64mhz. Quick thought - you say the timings I'd be seeing are in line with a device operating at 16mhz - isn't the internal PIC clock OSC/4? So even if I was at 64mhz my PIC would be executing instructions at 16mhz.. is that the missing link here?
Also, if you don't mind schooling me a bit more (or pointing me in the direction of a good read) .. how does this_led*=2 end up rotating the bits such that I'm shifting my next lower bit into the 23rd spot repeatedly? (And I'm assuming I'm getting zeros in their empty spaces as they shift? Not that it matters.. just curious).
If my data were 8 bit and I was testing the 8th bit, would it look like this?
10111011 <- starting data
01110110 <- first iteration
11101100 <- second iteration
11011000 ..
10110000 ..
01100000 ..
.. etc as I worked my way through the loop?
Or does the arithmetic "wrap around" such that the 8th bit would now be the 1st bit?
thanks again for your help. |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|