ouille Guest
Optimised for loop
Posted: Fri Sep 03, 2004 8:58 am

Hello,
The CCS C for loop is not very efficient:
for (i=0;i<8;i++) // ...;
63 cycles on a 16F device at optimisation level 9.
for (i=7;i;i--) // ...;
49 cycles.
i=8;
#asm
loop:
#endasm
//...
#asm
decfsz i,f
goto loop
#endasm
uses 25 cycles.
It seems the C version takes about twice as many cycles as the optimised asm.
Why doesn't the optimiser optimise a structure like this:
0726: MOVF 59,W
0727: DECF 59,F
0728: XORLW 00 // Z flag already set
0729: BTFSC 03.2 // could be replaced by DECFSZ
072A: GOTO 72C
072B: GOTO 726
This optimisation of small loops is quite important on slow devices.
Is there a C construct that compiles to better code?

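A minimal portable sketch of the two C loop shapes compared above, assuming `uint8_t` stands in for the 8-bit loop counter and a pass counter stands in for the `// ...` body. It only checks how many times each body runs; cycle counts depend on the compiler and the PIC. Note that `for (i=7;i;i--)` runs 7 times, one fewer than `for (i=0;i<8;i++)`, so the 63- and 49-cycle figures are not for the same number of passes.

```c
#include <stdint.h>

/* Sketch only. Assumption: uint8_t stands in for the 8-bit CCS loop
   counter, and incrementing `passes` stands in for the loop body. */

/* Count-up form: i is compared against the limit n on every pass. */
static unsigned count_up_passes(uint8_t n)
{
    unsigned passes = 0;
    for (uint8_t i = 0; i < n; i++)
        passes++;
    return passes;
}

/* Count-down form: the condition `i` is only a test against zero. */
static unsigned count_down_passes(uint8_t start)
{
    unsigned passes = 0;
    for (uint8_t i = start; i; i--)
        passes++;
    return passes;
}
```

With these, `count_up_passes(8)` and `count_down_passes(8)` both return 8, while `count_down_passes(7)` returns 7.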
ouille Guest
The right question is how to optimize a for loop...
Posted: Fri Sep 03, 2004 11:10 am

see my first message

bdavis
Joined: 31 May 2004 Posts: 86 Location: Colorado Springs, CO
Posted: Fri Sep 03, 2004 12:09 pm

Have you tried a do {} while () or a while () {} loop?
Maybe that will work better?

ouille Guest
for loop
Posted: Fri Sep 03, 2004 12:51 pm

It's the same problem with do/while: roughly the same execution time.

ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
Posted: Fri Sep 03, 2004 3:42 pm

You have a good point here.
Which compiler version are you using? The latest compiler versions improved optimization (although mainly for the PIC18).

bdavis
Joined: 31 May 2004 Posts: 86 Location: Colorado Springs, CO
Posted: Fri Sep 03, 2004 7:51 pm

It must be your version of the compiler or the chip type...
I got this in version 3.202 of the PCHW compiler...
.................... for (i=0; i<16; i++)
0040: CLRF 06
0042: MOVF 06,W
0044: SUBLW 0F
0046: BNC 004E
.................... {
.................... #asm
.................... nop
0048: NOP
.................... #endasm
.................... }
004A: INCF 06,F
004C: BRA 0042
If you exclude the NOP I put in, it's 6 instructions and 5 cycles per additional loop - that's good. I can do nothing really fast!!
I did find that using an XOR to flip a single bit is slow: 10 cycles.
Then I used an if/else to set or clear the bit: 5 instructions.
Then I looked at the assembler: BTG (bit toggle), 1 instruction.
Then I read the readme file: there's a new function to toggle a bit, 1 instruction I think :)
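A portable sketch of the two C idioms compared above, using hypothetical helpers `toggle_xor` and `toggle_if_else` where `reg` stands in for a port or file register (an assumption for illustration). Both produce the same value; how many instructions each becomes is up to the compiler, and on a PIC18 the single `BTG` instruction does the job in hardware. The CCS toggle function mentioned in the readme is not shown, since its name isn't given in the post.

```c
#include <stdint.h>

/* Assumption: `reg` stands in for a port/file register value. */

/* Toggle one bit with XOR. */
static uint8_t toggle_xor(uint8_t reg, uint8_t bit)
{
    return (uint8_t)(reg ^ (1u << bit));
}

/* Same result written as an if/else that sets or clears the bit. */
static uint8_t toggle_if_else(uint8_t reg, uint8_t bit)
{
    uint8_t mask = (uint8_t)(1u << bit);
    if (reg & mask)
        reg = (uint8_t)(reg & ~mask);  /* bit was set: clear it */
    else
        reg = (uint8_t)(reg | mask);   /* bit was clear: set it */
    return reg;
}
```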
It's all a learning process for me on what is optimized and what isn't. I have been fairly happy with the 18Fxxx so far though...
Good Luck!

ouille Guest
for loop
Posted: Sat Sep 04, 2004 2:13 am

Hello,
My compiler version is 3.190, and I work on 16F PICs.
The overhead in your for loop is good; the compiler seems to optimise a little.
I learned a long time ago that writing a for loop with i++ is not efficient on microcontrollers: there are instructions that modify the Z flag directly, which lets you avoid the comparison.
It's better to compare with 0:
for (i=15;i>=0;i--) ...
In this case the asm is:
...
decfsz i,f
goto loop_begin
which is 3 cycles, plus one decf i,w to read the loop index.
I find it strange that CCS doesn't optimise this kind of loop, as it's quite common in microcontroller C programs.
Bye
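One caution about `for (i=15;i>=0;i--)`: if `i` is an unsigned 8-bit type (PCM programmer's test program in this thread declares it `char`, which to my understanding is unsigned in CCS; treat that as an assumption for your version), `i>=0` is always true and the loop never exits. The portable sketch below, with `int8_t`/`uint8_t` standing in for the CCS types, contrasts a signed counter, where `i>=0` works, with the unsigned form that tests `i != 0` and starts one higher so the body still runs 16 times.

```c
#include <stdint.h>

/* Assumption: int8_t / uint8_t stand in for CCS signed / unsigned 8-bit types. */

/* Signed counter: 15, 14, ..., 0, then i becomes -1 and i >= 0 fails. */
static unsigned signed_ge_zero_passes(void)
{
    unsigned passes = 0;
    for (int8_t i = 15; i >= 0; i--)
        passes++;              /* stands in for the loop body */
    return passes;
}

/* Unsigned counter: `i >= 0` can never be false (0 - 1 wraps to 255),
   so test against zero and start one higher. The body sees i = 16 .. 1;
   use i - 1 if it needs the values 15 .. 0. */
static unsigned unsigned_ne_zero_passes(void)
{
    unsigned passes = 0;
    for (uint8_t i = 16; i != 0; i--)
        passes++;
    return passes;
}
```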
bdavis
Joined: 31 May 2004 Posts: 86 Location: Colorado Springs, CO
Posted: Sat Sep 04, 2004 11:20 am

Yup - the decrementing for loop is good on ARM processors too. I tried it with the CCS compiler and it didn't fully optimize it. I did try the following and it works great!
The repeated loop is 3 cycles if you exclude the NOP...
It will loop 256 times, since I was stupid enough to init i to zero instead of something smaller.
.................... i = 0;
0040: CLRF 06
.................... do
.................... {
.................... #asm
.................... nop
0042: NOP
.................... #endasm
.................... i--;
0044: DECF 06,F
.................... }while (i>0);
0046: MOVF 06,F
0048: BNZ 0042

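A portable sketch of the `do { ... i--; } while (i>0);` shape above, assuming `uint8_t` stands in for the 8-bit counter. For an unsigned counter, `i>0` and `i!=0` are the same test, which is why a plain zero check (`MOVF 06,F` then `BNZ`) is enough. Starting at zero, the first `i--` wraps to 255, which is where the 256 passes come from.

```c
#include <stdint.h>

/* Assumption: uint8_t stands in for the 8-bit loop counter.
   Body, decrement, then loop while i > 0 (same as i != 0 for unsigned). */
static unsigned do_while_passes(uint8_t start)
{
    unsigned passes = 0;
    uint8_t i = start;
    do {
        passes++;              /* stands in for the NOP body */
        i--;                   /* a start of 0 wraps to 255 here */
    } while (i > 0);
    return passes;
}
```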
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
Posted: Sat Sep 04, 2004 11:54 am

Quote: | CCS C for loop is not very efficient
for (i=0;i<8;i++) // ...;
63 cycles on 16f device opt level 9.
my compiler version is 3.190, and I work on 16f pics |
I installed PCM version 3.190 and compiled the test program shown below.
The loop control code only takes 7 cycles per pass. I tried it with and without #opt 9,
and it compiles the same in each case.
How did you get 63 cycles? Are you counting the code that's
in the body of the loop? That's not part of the loop control code.
Code: | #include <16F877.H>
#fuses XT, NOWDT, NOPROTECT, BROWNOUT, PUT, NOLVP
#use delay(clock = 4000000)
#define nop() #asm nop #endasm
//====================================
void main()
{
char i;
for(i=0;i<8;i++)
{
nop();
}
while(1);
} |
Code: | 0000 00284 .................... for(i=0;i<8;i++)
000A 1283 00285 BCF 03.5
000B 01A1 00286 CLRF 21
// The loop starts here:
000C 0821 00287 MOVF 21,W // 1 cycle
000D 3C07 00288 SUBLW 07 // 1 cycle
000E 1C03 00289 BTFSS 03.0 // 2 cycles (skip taken while looping)
000F 2813 00290 GOTO 013
0000 00291 .................... {
0000 00292 .................... nop();
0010 0000 00293 NOP
0000 00294 .................... }
0011 0AA1 00295 INCF 21,F // 1 cycle
0012 280C 00296 GOTO 00C // 2 cycles |
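Adding up the cycle counts given in the comments of the listing above (with `BTFSS` skipping, as it does on every pass that continues) gives the per-pass cost of the loop control:

$$
T_{\text{pass}} = \underbrace{1}_{\texttt{MOVF}} + \underbrace{1}_{\texttt{SUBLW}} + \underbrace{2}_{\texttt{BTFSS (skip)}} + \underbrace{1}_{\texttt{INCF}} + \underbrace{2}_{\texttt{GOTO}} = 7 \text{ cycles}
$$

plus 1 cycle for the `NOP` body. With the 4 MHz clock from `#use delay`, a PIC16 instruction cycle is 4 oscillator clocks, i.e. 1 µs, so the loop control costs roughly 7 µs per pass.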
ouille Guest
for loop optimisation
Posted: Sun Sep 05, 2004 9:49 am

Hello,
63 cycles is for the entire loop (8 iterations).
Each iteration is 7 cycles, plus some loop overhead.
My first question was perhaps confusing.
What C code compiles into an efficient for loop?
The 5 cycles from bdavis is better, but why is it impossible to achieve 3 cycles?

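One way to reconcile the figures, assuming the 63-cycle loop had an empty body and compiled to the same sequence as PCM programmer's listing: 2 setup instructions (`BCF`, `CLRF`), eight full passes at 7 cycles, and a final test that exits (`MOVF` 1, `SUBLW` 1, `BTFSS` not skipping 1, `GOTO` 2):

$$
T_{\text{for}} = 2 + 8 \times 7 + (1 + 1 + 1 + 2) = 63 \text{ cycles}
$$

The same bookkeeping, assuming `i=8` loads in 2 cycles (`MOVLW`, `MOVWF`), reproduces the 25 cycles of the hand-written `DECFSZ` loop from the first post: seven passes of `DECFSZ` (1) plus `GOTO` (2), and a last `DECFSZ` that skips (2):

$$
T_{\text{decfsz}} = 2 + 7 \times (1 + 2) + 2 = 25 \text{ cycles}
$$

so the gap per pass is 7 cycles versus 3.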
Trampas
Joined: 04 Sep 2004 Posts: 89 Location: NC
Posted: Sun Sep 05, 2004 5:49 pm

Well, you have to realize the processor only recognizes zero. That is, all compares are done by checking whether the value is zero or not. Sort of...
Therefore, to get the best performance out of loops, have all loops end at zero. For example, look at this:
Code: | 209: for(i=7; i!=0; i--)
002FB4 0E07 MOVLW 0x7
002FB6 6F44 MOVWF 0x44, BANKED
002FB8 5344 MOVF 0x44, F, BANKED
002FBA E003 BZ 0x2fc2
210: {
211: #asm nop #endasm
002FBC 0000 NOP
212: }
002FBE 0744 DECF 0x44, F, BANKED
002FC0 D7FB BRA 0x2fb8
|
Looks a lot like bdavis' code...
The real reason is that most programmers write a for loop whose index is used to access data: the variable i is used to index arrays or in other calculations. Some compilers do a dependency check on i within the loop, and if it isn't used they switch to the more efficient count-to-zero loop. However, not all compilers are that smart.
Trampas

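A small sketch of Trampas's point, in portable C with a hypothetical `sum_bytes_down` helper: when the index is needed to reach data and the order of access doesn't matter, the loop can still count down to zero by indexing with `i - 1`. Whether a particular compiler then emits a decrement-and-test-for-zero sequence is up to that compiler.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: sums n bytes while counting the
   index down to zero. The body reads a[i - 1], so elements are visited
   from the last one to the first. */
static uint16_t sum_bytes_down(const uint8_t *a, uint8_t n)
{
    uint16_t sum = 0;
    for (uint8_t i = n; i != 0; i--)
        sum += a[i - 1];
    return sum;
}
```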
ouille Guest
for loop optimisation
Posted: Mon Sep 06, 2004 12:45 pm

Hello,
Hi Trampas, your answer was exactly what I expected.
I don't know why I hadn't tested with i!=0!!!
I've tested your code on a 16F device but but but, the results are not quite as good:
Code: | 062E: MOVLW 08 ; initialisation ok
062F: MOVWF 52 ; init
0630: MOVF 52,F ;read i ...
0631: BTFSC 03.2 ;zero testing
0632: GOTO 635
...
0633: DECF 52,F ;decrement
0634: GOTO 630
|
But why doesn't CCS use a DECFSZ??? Why???

ouille Guest
for loop optimisation
Posted: Mon Sep 06, 2004 12:49 pm

Thanks, all, I've got it:
Code: |
062E: MOVLW 07 ;init
062F: MOVWF 52 ;init
0630: NOP
0631: DECFSZ 52,F ;loop ...
0632: GOTO 630 ; 3 cycles ... ok
|
and the c code is:
Code: |
i=7;
do
{
#asm
nop
#endasm
}
while (--i!=0);
|
My initialisation is probably wrong (i=niter+2).
Bye.

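On the initialisation question: under standard C semantics, `i=N; do { ... } while (--i!=0);` runs the body exactly N times for N from 1 to 255 (the listing's `MOVLW 07` gives 7 passes), while N = 0 wraps and gives 256 passes. A minimal sketch wrapping that idiom with a zero guard; the helper `repeat_n`, the callback, and the `count_call` usage example are hypothetical, for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: run body() exactly n times with the
   decrement-and-test-for-zero idiom from the post above. n == 0 is
   handled separately, because starting the do/while at 0 would wrap
   and run 256 times. */
static void repeat_n(uint8_t n, void (*body)(void))
{
    if (n == 0)
        return;
    uint8_t i = n;
    do {
        body();
    } while (--i != 0);
}

/* Usage example (hypothetical): count how many times the body runs. */
static unsigned calls;
static void count_call(void) { calls++; }
```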