Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
Optimising code |
Posted: Tue Mar 04, 2008 1:47 am |
|
|
I am using a PIC18F4685 but I have a very large program and am short on code space! I'm therefore trying to optimise things a bit.
I have quite a lot of int32 multiplies followed by divides for scaling (e.g. x = y * z / q), so I thought I would save a lot of space by calling a routine each time rather than effectively repeating the same code. Unfortunately, I seem to use very nearly the same amount of code space either way.
Is there something better I can do with this (or suggestions for other code optimisation tricks for that matter)?
inline multiply and divide
Code: | ... calib_parameters->oil_flow_rate = (oil_measured_flow * 5000) / 300;
03F74: MOVLW 10
03F76: MOVLB 6
03F78: ADDWF x7B,W
03F7A: MOVWF FE9
03F7C: MOVLW 00
03F7E: ADDWFC x7C,W
03F80: MOVWF FEA
03F82: MOVFF FEA,680
03F86: MOVFF FE9,67F
03F8A: MOVFF E7,889
03F8E: MOVFF E6,888
03F92: MOVFF E5,887
03F96: MOVFF E4,886
03F9A: MOVLB 8
03F9C: CLRF x8D
03F9E: CLRF x8C
03FA0: MOVLW 13
03FA2: MOVWF x8B
03FA4: MOVLW 88
03FA6: MOVWF x8A
03FA8: MOVLB 0
03FAA: CALL 0FD4
03FAE: MOVFF 680,FEA
03FB2: MOVFF 67F,FE9
03FB6: MOVFF 03,684
03FBA: MOVFF 02,683
03FBE: MOVFF 01,682
03FC2: MOVFF 00,681
03FC6: MOVFF FEA,686
03FCA: MOVFF FE9,685
03FCE: MOVFF 03,88B
03FD2: MOVFF 02,88A
03FD6: MOVFF 01,889
03FDA: MOVFF 00,888
03FDE: MOVLB 8
03FE0: CLRF x8F
03FE2: CLRF x8E
03FE4: MOVLW 01
03FE6: MOVWF x8D
03FE8: MOVLW 2C
03FEA: MOVWF x8C
03FEC: MOVLB 0
03FEE: RCALL 3E5C
03FF0: MOVFF 686,FEA
03FF4: MOVFF 685,FE9
03FF8: MOVFF 00,FEF
03FFC: MOVFF 01,FEC
04000: MOVFF 02,FEC
04004: MOVFF 03,FEC
|
alternative - call a function to perform multiply and divide
Code: | ... calib_parameters->oil_flow_rate = math_imd(oil_measured_flow, 5000, 300);
03F74: MOVLW 10
03F76: MOVLB 6
03F78: ADDWF x7B,W
03F7A: MOVWF 01
03F7C: MOVLW 00
03F7E: ADDWFC x7C,W
03F80: MOVWF 03
03F82: MOVFF 01,67D
03F86: MOVWF x7E
03F88: MOVFF E7,689
03F8C: MOVFF E6,688
03F90: MOVFF E5,687
03F94: MOVFF E4,686
03F98: CLRF x8D
03F9A: CLRF x8C
03F9C: MOVLW 13
03F9E: MOVWF x8B
03FA0: MOVLW 88
03FA2: MOVWF x8A
03FA4: CLRF x91
03FA6: CLRF x90
03FA8: MOVLW 01
03FAA: MOVWF x8F
03FAC: MOVLW 2C
03FAE: MOVWF x8E
03FB0: MOVLB 0
03FB2: CALL 1106
03FB6: MOVFF 67E,FEA
03FBA: MOVFF 67D,FE9
03FBE: MOVFF 00,FEF
03FC2: MOVFF 01,FEC
03FC6: MOVFF 02,FEC
03FCA: MOVFF 03,FEC
|
math_imd compiles to:
Code: |
... //---------------------------------------------------------
... s32bit math_imd(s32bit val, s32bit multiplier, s32bit divisor)
... //---------------------------------------------------------
... {
... return (val * multiplier) / divisor;
*
01106: MOVFF 689,889
0110A: MOVFF 688,888
0110E: MOVFF 687,887
01112: MOVFF 686,886
01116: MOVFF 68D,88D
0111A: MOVFF 68C,88C
0111E: MOVFF 68B,88B
01122: MOVFF 68A,88A
01126: RCALL 0FD4
01128: MOVFF 03,695
0112C: MOVFF 02,694
01130: MOVFF 01,693
01134: MOVFF 00,692
01138: MOVFF 03,69B
0113C: MOVFF 02,69A
01140: MOVFF 01,699
01144: MOVFF 00,698
01148: MOVFF 691,69F
0114C: MOVFF 690,69E
01150: MOVFF 68F,69D
01154: MOVFF 68E,69C
01158: RCALL 1030
.................... }
0115A: RETLW 00
|
|
|
|
Pret
Joined: 18 Jul 2006 Posts: 92 Location: Iasi, Romania
|
|
Posted: Tue Mar 04, 2008 2:12 am |
|
|
With some versions of CCS, Code: | (*calib_parameters).oil_flow_rate | is better than Code: | calib_parameters->oil_flow_rate |
Another thing. How about: Code: | (oil_measured_flow * 50) / 3 | Or, if you need speed more than precision, you can try Code: | oil_measured_flow*16 + oil_measured_flow/2 | which can be written as Code: | (oil_measured_flow<<4) + (oil_measured_flow>>1) |
Hope it helps... |
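A note on the shift form: in C, `+` binds more tightly than `<<` and `>>`, so each shift needs its own parentheses. Below is a minimal sketch in standard C comparing the exact 50/3 scaling with the 16.5x approximation. The function names are made up for illustration, and the code size on CCS has not been measured here.

```c
#include <stdint.h>

/* Exact scaling: 5000/300 reduces to 50/3 (about 16.667). */
int32_t scale_exact(int32_t flow)
{
    return (flow * 50) / 3;
}

/* Approximation: 16*flow + flow/2 = 16.5*flow, roughly 1% low.
   Unsigned, so that >> is a plain logical shift. */
uint32_t scale_shift(uint32_t flow)
{
    return (flow << 4) + (flow >> 1);
}
```

For an input of 300 the exact form gives 5000 and the shift form gives 4950, which shows the size of the error being traded for speed.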
|
|
Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
|
Posted: Tue Mar 04, 2008 2:50 am |
|
|
Thanks for your reply Pret
Pret wrote: | With some versions of CCS, Code: | (*calib_parameters).oil_flow_rate | is better than Code: | calib_parameters->oil_flow_rate |
|
I was not aware there was any difference there but just tried it and it definitely does save code space (saves 6 bytes using 4.063).
Edit: Just tried this in other places and it takes more space - strange
Pret wrote: | Another thing. How about: Code: | (oil_measured_flow * 50) / 3 |
|
Good point. This does save another 4 bytes. Not all of my code will have such nice numbers but it is at least something I can check through and improve where possible.
Pret wrote: | Or, if you need speed more than precision, you can try Code: | oil_measured_flow*16 + oil_measured_flow/2 | which can be written as Code: | (oil_measured_flow<<4) + (oil_measured_flow>>1) |
Hope it helps... |
Nice idea but accuracy is important. Will bear it in mind though and use where possible.
Thanks for your help. |
|
|
Ttelmah Guest
|
|
Posted: Tue Mar 04, 2008 3:18 am |
|
|
The reason for the small improvement is that the compiler is already using a generic 'divide' routine in the original code.
One thought is to evaluate the sum as:
calib_parameters->oil_flow_rate = (oil_measured_flow * 4267) / 256;
This gives the same result to within about 0.01% (4267/256 is roughly 16.668 against 5000/300 at roughly 16.667), yet will evaluate much faster (the compiler is smart enough to know that it can perform /256 by shifting one byte right).
Best Wishes |
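For other ratios, the multiplier can be derived the same way: scale the ratio by 256 and round to the nearest integer. The following is a minimal sketch in standard C, with `SCALE_Q8` and `scale_q8` as illustrative names only. It assumes the input is non-negative and that the product fits in 32 bits.

```c
#include <stdint.h>

/* round(5000 * 256 / 300) = round(4266.67) = 4267 */
#define SCALE_Q8 ((5000L * 256L + 300L / 2) / 300L)

int32_t scale_q8(int32_t flow)
{
    /* Dividing by 256 simply discards the low byte of the product.
       Assumes flow >= 0 and flow * 4267 does not overflow 32 bits. */
    return (int32_t)((flow * SCALE_Q8) / 256);
}
```

An input of 300 gives 5000, matching the exact result, while 3000 gives 50003 rather than 50000, so the error grows with the input as expected for a rounded constant.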
|
|
Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
|
Posted: Tue Mar 04, 2008 3:43 am |
|
|
Thanks Ttelmah,
Ttelmah wrote: | The reason for the small improvement, is that the compiler is already using a generic 'divide' routine in the original code. |
Is it likely that I could improve over the generic divide by using my custom multiply and divide routine implemented in assembler since I know I always want to multiply and then divide?
Ttelmah wrote: | One thought, is to evaluate the sum as:
calib_parameters->oil_flow_rate = (oil_measured_flow * 4267) / 256;
This gives the same result to better than 4 decimals, yet will evaluate much faster (the compiler is smart enough to know that it can perform /256, by shifting one byte right). |
Just tried that out - It does save space compared to the original code however it does not save as much as calling my math_imd routine when using the 50 / 3 numbers.
Thanks for your suggestions
Edit:
I also have a lot of sprintf to format data which I send to an LCD - can I improve these:
Code: | sprintf(&buffer[0], "%cZL%c%c%lu", 0x1B, x, y, oil_measured_flow);
|
|
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Tue Mar 04, 2008 6:30 am |
|
|
The 32 bit division + multiply requires a lot of code space but this is about as good as it gets. The assembly code you show us is mostly for storing and retrieving the 32-bit parameters before the general multiply and divide routines are called.
From the small code fragments you show us it is difficult to give other optimization tips. Maybe there are other parts in your code taking a lot of space? Check the list file for this, especially printf lines can be expensive (hidden by a new subroutine call for every line).
Also consider another approach for your arithmetic. Do you really need 32-bit precision? Can you do the scaling only once, for example at start or end? |
|
|
Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
|
Posted: Tue Mar 04, 2008 7:05 am |
|
|
Thanks ckielstra,
ckielstra wrote: | The 32 bit division + multiply requires a lot of code space but this is about as good as it gets. The assembly code you show us is mostly for storing and retrieving the 32-bit parameters before the general multiply and divide routines are called. |
Thought that would be the case. I was wondering whether I could improve on it by coding the multiply and divide myself, since I can leave results in specific registers. However, if I have refactored the code to use my math_imd routine, I would not save anything significant anyway.
ckielstra wrote: | From the small code fragments you show us it is difficult to give other optimization tips. Maybe there are other parts in your code taking a lot of space? Check the list file for this, especially printf lines can be expensive (hidden by a new subroutine call for every line). |
Yes, the sprintf line that I show above takes 76 bytes and I have lots of these, some with more parameters and some with fewer. I am using an LCD to which I send data in a certain protocol over I2C, so I have to format what I wish to send first. If I could improve on sprintf that would help a lot. I never need to format floats, so I wondered whether coding my own sprintf would be better.
ckielstra wrote: | Also consider another approach for your arithmetic. Do you really need 32-bit precision? Can you do the scaling only once, for example at start or end? |
I'm using long scaled integer arithmetic to avoid using floating point variables. I may be able to improve things further though.
Thanks for your comments. |
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Tue Mar 04, 2008 7:27 am |
|
|
Here is an example of the difference in code size between sprintf and manual coding:
Code: | void main()
{
int32 oil_measured_flow;
int8 x,y;
char buffer[20];
oil_measured_flow = 0x12345678;
// sprintf takes 58 bytes + 5 calls to other functions
sprintf(&buffer[0], "%cZL%c%c%lu", 0x1B, x, y, oil_measured_flow);
// Manual code example below takes only 30 bytes and no function calls.
buffer[0] = 0x1B;
buffer[1] = x;
buffer[2] = y;
buffer[3] = make8( oil_measured_flow, 0); // Note: this stores the raw bytes, not the decimal text that %lu produces.
buffer[4] = make8( oil_measured_flow, 1);
buffer[5] = make8( oil_measured_flow, 2);
buffer[6] = make8( oil_measured_flow, 3);
buffer[7] = 0;
} |
|
|
|
Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
|
Posted: Tue Mar 04, 2008 8:00 am |
|
|
ckielstra wrote: | Here an example on the difference in code size between sprintf and manual coding:
Code: | void main()
{
snip
} |
|
Wow
Thank you very much for doing that - it is like a slap in the face to notice how much can be saved so easily.
I have just replaced one instance of it (including the ZL) and it saved 46 bytes. A quick check shows that I have 138 calls to sprintf so based on that I should be able to save around 6.7% code space!!!
Since I need similar code 138 times, do you think it is worth figuring out CCS variable parameter lists and implementing it as a function that I call 138 times or simply to code it inline as you have shown? Perhaps I would not know the answer until I tried it out. Quite a lot of times the number and type of parameters are the same so I could have one function to cover that option and code the rest inline.
Many thanks indeed |
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Tue Mar 04, 2008 8:12 am |
|
|
138 variations on the same theme sounds like an opportunity for optimization. Take note that pointer arithmetic is using a lot of code space in the PIC processor, so try to avoid passing variables as pointers.
To find the best solution will require some testing. In tweaking code there is no sure way to predict the most optimal solution. |
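As one possible shape for such a shared routine, here is a minimal sketch in standard C that builds the same string as the "%cZL%c%c%lu" format. It writes into a global buffer so that no pointer has to be passed. The names `lcd_buffer` and `lcd_format_u32` are hypothetical, and whether this beats sprintf on CCS would have to be checked in the list file.

```c
#include <stdint.h>

char lcd_buffer[20];   /* shared buffer, so no pointer parameter is needed */

/* Writes ESC 'Z' 'L' x y followed by value in decimal, NUL-terminated.
   Returns the string length, excluding the terminator. */
uint8_t lcd_format_u32(uint8_t x, uint8_t y, uint32_t value)
{
    char digits[10];   /* a 32-bit value has at most 10 decimal digits */
    uint8_t n = 0;
    uint8_t pos = 0;

    lcd_buffer[pos++] = 0x1B;
    lcd_buffer[pos++] = 'Z';
    lcd_buffer[pos++] = 'L';
    lcd_buffer[pos++] = (char)x;
    lcd_buffer[pos++] = (char)y;

    /* Collect the digits, least significant first. */
    do {
        digits[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Copy them out in reading order. */
    while (n > 0) {
        lcd_buffer[pos++] = digits[--n];
    }
    lcd_buffer[pos] = '\0';
    return pos;
}
```

Returning the length is a deliberate choice: x or y may legitimately be zero, in which case the buffer cannot be measured with strlen before it is sent.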
|
|
Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
|
Posted: Tue Mar 04, 2008 8:32 am |
|
|
ckielstra wrote: | 138 variations on the same theme sounds like an opportunity for optimization. Take note that pointer arithmetic is using a lot of code space in the PIC processor, so try to avoid passing variables as pointers.
To find the best solution will require some testing. In tweaking code there is no sure way to predict the most optimal solution. |
More useful tips - I do tend to pass pointers generally so will have to be careful of that in future.
Thanks again |
|
|
Ken Johnson
Joined: 23 Mar 2006 Posts: 197 Location: Lewisburg, WV
|
|
Posted: Tue Mar 04, 2008 8:41 am |
|
|
"I'm using long scaled integer arithmatic to avoid using floating point variables."
Why?
A lot of folks here disagree with me on this, but I use floats a lot - makes code much simpler and more readable (maintainable). Yes, there are instances where the speed penalty comes into play, but . . .
Look at the project requirements, rather than just saying "Don't use floats"
Ok, there's 2 cents worth, which may not be worth that much
Ken |
|
|
Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
|
Posted: Wed Mar 05, 2008 1:59 am |
|
|
Ken Johnson wrote: | A lot of folks here disagree with me on this, but I use floats a lot - makes code much simpler and more readable (maintainable). Yes, there are instances where the speed penalty comes into play, but . . . |
Hi Ken,
Adding two floats takes more code than adding two int32s.
I've realised that my earlier enthusiasm for the simplification that ckielstra suggested was a bit misguided. For example:
Code: | sprintf(&buffer[0], "%cZL%c%c%lu", 0x1B, x, y, oil_measured_flow); |
oil_measured_flow needs to be sent to the LCD as a string rather than 4 bytes so I still have a problem.
Another example:
Code: | sprintf(&buffer[0], "%cZL%c%c%02u/%02u/%02u", 0x1B, ScreenCentreX + 5, TextLine2T, cal_date.day, cal_date.month, cal_date.year); |
where cal_date.day is a byte that I want to display as 02 etc
I guess I could code it as:
Code: |
buffer[0] = 0x1B;
buffer[1] = 'Z';
buffer[2] = 'L';
buffer[3] = ScreenCentreX + 5;
buffer[4] = TextLine2T;
buffer[5] = '0' + (cal_date.day / 10);
buffer[6] = '0' + (cal_date.day % 10);
buffer[7] = '/';
buffer[8] = '0' + (cal_date.month / 10);
buffer[9] = '0' + (cal_date.month % 10);
buffer[10] = '/';
buffer[11] = '0' + (cal_date.year / 10);
buffer[12] = '0' + (cal_date.year % 10);
buffer[13] = 0;
|
but that takes more code space (144 bytes) compared to the sprintf (126 bytes)
Am I missing something obvious?? |
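One thing that might help is to factor the repeated divide-and-modulo pair into a small helper, so the digit code is emitted once rather than three times. This is a minimal sketch in standard C; `lcd_buffer`, `put_2digits` and `lcd_format_date` are hypothetical names, and the resulting size on CCS is not measured here.

```c
#include <stdint.h>

char lcd_buffer[20];   /* shared buffer, avoids passing a pointer */

/* Stores value (0..99) as two ASCII digits starting at lcd_buffer[pos].
   Returns the next free index. */
uint8_t put_2digits(uint8_t pos, uint8_t value)
{
    lcd_buffer[pos++] = (char)('0' + value / 10);
    lcd_buffer[pos++] = (char)('0' + value % 10);
    return pos;
}

/* Builds ESC 'Z' 'L' x y "dd/mm/yy", NUL-terminated. */
void lcd_format_date(uint8_t x, uint8_t y,
                     uint8_t day, uint8_t month, uint8_t year)
{
    uint8_t pos = 0;

    lcd_buffer[pos++] = 0x1B;
    lcd_buffer[pos++] = 'Z';
    lcd_buffer[pos++] = 'L';
    lcd_buffer[pos++] = (char)x;
    lcd_buffer[pos++] = (char)y;
    pos = put_2digits(pos, day);
    lcd_buffer[pos++] = '/';
    pos = put_2digits(pos, month);
    lcd_buffer[pos++] = '/';
    pos = put_2digits(pos, year);
    lcd_buffer[pos] = '\0';
}
```

The same `put_2digits` helper could then be reused by any other screen that shows two-digit fields, which is where the saving would come from if there is one.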
|
|
Ttelmah Guest
|
|
Posted: Wed Mar 05, 2008 3:42 am |
|
|
Ages ago, I posted here a 'demo' routine showing a more efficient way of doing the arithmetic for this. I cannot remember what the thread was about! Basically, when you perform a /10, the remainder is available in one of the compiler's temporary variables, and with a bit of ingenuity it is possible to retrieve this, saving a second operation to generate the '%10' value. I suspect that 'sprintf' may actually be doing something like this.
Best Wishes |
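The routine referred to above is not reproduced in this thread. As a portable stand-in for the same idea, standard C offers `div()` in `<stdlib.h>`, which returns quotient and remainder from one call. The sketch below assumes a hosted C library; whether CCS provides `div()` or compiles it to a single division is an assumption, and `split_2digits` is an illustrative name.

```c
#include <stdlib.h>

/* Splits value (0..99) into tens and units ASCII digits
   using one quotient-and-remainder operation. */
void split_2digits(unsigned char value, char *tens, char *units)
{
    div_t r = div(value, 10);   /* r.quot = value / 10, r.rem = value % 10 */

    *tens  = (char)('0' + r.quot);
    *units = (char)('0' + r.rem);
}
```

The pointer parameters are used here only to keep the example short; on the PIC a pair of global result bytes may well be cheaper.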
|
|
Martin Berriman
Joined: 08 Dec 2005 Posts: 66 Location: UK
|
|
Posted: Wed Mar 05, 2008 3:47 am |
|
|
Ttelmah wrote: | Ages ago, I posted here a 'demo' routine showing a more efficient way of doing the arithmetic for this. I cannot remember what the thread was about! Basically, when you perform a /10, the remainder is available in one of the compiler's temporary variables, and with a bit of ingenuity it is possible to retrieve this, saving a second operation to generate the '%10' value. I suspect that 'sprintf' may actually be doing something like this. |
Thanks Ttelmah, I will search for it.
Edit: Not found it yet - I have found something that might be along similar lines though (http://www.ccsinfo.com/forum/viewtopic.php?p=53435#53435).
Edit2: Just came across another of your posts saying that switch statements with defaults take more code - removed a few of them (since I handle all options anyway) and it saves me another 262 bytes! |
|
|
|
|