Author |
Message |
bamby
Joined: 31 Oct 2019 Posts: 2
|
memset efficiency improvement ? |
Posted: Thu Oct 31, 2019 10:10 am |
|
|
hi all
using dsPIC33FJ64GS610
CCS 5.073
trying to clear an array:
Code: |
int16 result[16][16];
memset(result, 0, sizeof(result));
|
results in the following assembly:
Code: |
MOV #1366,W1
MOV #0,W2
REPEAT #1FF
CLR.B [W1++]
|
which means the RAM is cleared byte by byte, 512 times.
Any suggestions for making it work word by word, 256 times? |
|
|
jeremiah
Joined: 20 Jul 2010 Posts: 1346
|
|
Posted: Thu Oct 31, 2019 10:49 am |
|
|
You can make it more efficient for your case, and even "in general", but making it faster in every case is not possible unless you have a specific memset() variant for each potential case and choose the right one manually.
What a lot of other platforms do is look at the address and length of the destination, see how they line up with the alignment of the processor, and then pick an algorithm based on that. For example, if result were an 8 bit array and started at an odd address, then word by word writes might cause a trap interrupt on some chips (word operation on an odd address). To accommodate this, the algorithm checks whether the address is odd or even: if it is odd, it works byte by byte, and if it is even, it works word by word. Not done yet though. If the length of the array is odd, it needs logic to detect that, work word by word until the end, and then do a quick single byte write. Still not done though, because it is often faster to unroll the loop, so they might also throw in additional logic to see if the array is big enough to use an unrolled loop for part of the work.
The end result is that when you have an array that starts on an even address and is larger than a specific length, the operation is faster. Otherwise, all the logic checks make the other scenarios slower.
The other trade-off is that you just replaced 4 lines of assembly with hundreds of lines of assembly, so there is also a space trade-off to consider.
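The alignment checks described above can be sketched roughly as follows. This is a minimal illustration in standard C, not the CCS library code: the name aligned_memset is hypothetical, it assumes the compiler tolerates 16-bit stores through a cast pointer, and it leaves out loop unrolling. Whether it would beat the REPEAT/CLR.B sequence on a dsPIC is untested.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CCS function): fill n bytes at dst with the
   byte value c, using 16-bit stores wherever the alignment allows it. */
void *aligned_memset(void *dst, int c, size_t n)
{
    uint8_t *p = (uint8_t *)dst;
    uint8_t b = (uint8_t)c;

    /* Odd start address: write one byte so the pointer becomes even. */
    if (n > 0 && ((uintptr_t)p & 1u)) {
        *p++ = b;
        n--;
    }

    /* Even address from here on: store two bytes per iteration.
       Assumption: word stores through this cast pointer are acceptable. */
    uint16_t w = (uint16_t)(((uint16_t)b << 8) | b);
    uint16_t *wp = (uint16_t *)p;
    for (size_t words = n / 2; words > 0; words--) {
        *wp++ = w;
    }

    /* Odd remaining length: finish with a single byte store. */
    if (n & 1u) {
        *(uint8_t *)wp = b;
    }
    return dst;
}
```

Even this small version carries two extra branches that the plain byte loop does not need, which is the speed and space trade-off mentioned above.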
Side note: for memcpy() it is even more convoluted because all the same questions apply to the source array as well as the destination array AND you have to account for things like when one starts on an odd address and the other starts on an even address among other things.
If you are interested, here is a link to some attempts to benchmark optimized versions of memcpy(). It is not specific to PICs; it applies to all types of processors:
https://www.embedded.com/optimizing-memcpy-improves-speed/ |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9225 Location: Greensville,Ontario
|
|
Posted: Thu Oct 31, 2019 11:29 am |
|
|
While I don't use that PIC, you could hardcode in assembly to clear 256 words, as that PIC is 'word' based.
Instead of using memset(), try a for(....) loop that puts 0 into each element of the array. The compiler may code it 'better/faster' than your memset() method.
If I could I'd code as it's raining here for the next day or so.... |
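A minimal sketch of the for() loop suggested above, written in standard C with int16_t standing in for the CCS int16 type. The names ROWS, COLS and clear_result are made up for the example, and whether the compiler turns this into word-wide clears has not been checked here, so compare the generated listing before relying on it.

```c
#include <stdint.h>

#define ROWS 16
#define COLS 16

int16_t result[ROWS][COLS];

/* Clear the array one 16-bit element at a time (256 stores in total). */
void clear_result(void)
{
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            result[r][c] = 0;
        }
    }
}
```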
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19504
|
|
Posted: Thu Oct 31, 2019 1:50 pm |
|
|
Yes, the problem is that they have designed the memset to be generic,
so it has to be able to cope with odd numbers of bytes. Hence byte based.
They perhaps need to code a word_memset function. Not hard to code
actually. If I'm feeling bored tomorrow, will try to put a version together. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19504
|
|
Posted: Fri Nov 01, 2019 2:05 am |
|
|
OK. Crude 'word_clear' function. Only sets to 0, and needs the buffer
to be word aligned:
Code: |
#inline
//16-bit clear routine. Needs to be called with a 16-bit aligned start address
//and a word count of at least 1
void word_clear(unsigned int16 * address, int count)
{
   //REPEAT Wn executes the next instruction Wn+1 times, hence the decrement
   #asm
   mov address,W1
   mov count,W2
   dec W2,W2
   repeat W2
   clr [W1++]
   #endasm
}
char buffer[512] __attribute__((aligned(2)));
//then call as:
word_clear((unsigned int16 *)buffer, 256); //beware: the starting pointer must be word aligned
|
The compiler will always word align 16bit variables by default, but it
won't do so for char variables. So you need to ensure the buffer is word
aligned to use this. Hence the byte version that CCS uses... |
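One way to guard against a misaligned buffer is to test the address before taking the word-based path. The sketch below is standard C, and is_word_aligned is a hypothetical helper name, not something CCS provides.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr is on an even (16-bit aligned)
   address, 0 otherwise. */
int is_word_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & 1u) == 0;
}
```

A caller could use word_clear() only when is_word_aligned(buffer) is true, and fall back to memset() otherwise.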
|
|
bamby
Joined: 31 Oct 2019 Posts: 2
|
|
Posted: Mon Nov 04, 2019 12:43 am |
|
|
Thanks, Ttelmah
looks perfect |
|
|