memset efficiency improvement ?

 
bamby



Joined: 31 Oct 2019
Posts: 2


Posted: Thu Oct 31, 2019 10:10 am

Hi all,

Using a dsPIC33FJ64GS610 with CCS 5.073.

I'm trying to clear an array:
Code:

int16 result[16][16];
memset(result,  0, sizeof(result));


This results in the following assembly:
Code:

MOV     #1366,W1
MOV     #0,W2
REPEAT  #1FF
CLR.B   [W1++]


This clears the RAM byte by byte, 512 times.

Any suggestions on how to make it clear word by word, 256 times?
jeremiah



Joined: 20 Jul 2010
Posts: 1346


Posted: Thu Oct 31, 2019 10:49 am

You can make it more efficient for your case, and even "in general", but making it more efficient in every case is not possible unless you have a specific memset() variant for each potential case and manually choose the right one.

What a lot of other platforms do is look at the address and length of the destination, see how they line up with the processor's alignment requirements, and pick an algorithm based on that. For example, if result were an 8-bit array that started at an odd address, word-by-word writes might cause a trap interrupt on some chips (a word operation on an odd address). To accommodate this, the algorithm checks whether the address is odd or even: if it is odd, it writes byte by byte, and if it is even, it writes word by word. Not done yet, though. If the length of the array is odd, it needs logic to detect that, write word by word until the end, and then finish with a single byte write. Still not done, because loop unrolling is often faster, so it might also add logic to see whether the array is big enough to use an unrolled loop for part of the fill.

The end result is that when you have an array that starts on an even address and is larger than a specific length, the fill is faster. Otherwise, all the logic checks make the other scenarios slower.
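As a rough illustration of the checks described above, here is a minimal sketch in plain, portable C (not CCS-specific, and not how CCS or any particular library implements memset()). The name fill_aligned is hypothetical. It is a common variant of the idea: rather than falling back to bytes for the whole buffer on an odd start address, it writes one leading byte to reach a word boundary, then uses word stores, then handles an odd trailing byte. Loop unrolling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CCS: fills `len` bytes at `dst` with
   `value`, using 16-bit stores wherever alignment allows. */
void *fill_aligned(void *dst, uint8_t value, size_t len)
{
    uint8_t *p = (uint8_t *)dst;

    /* Odd start address: store one byte to reach a word boundary. */
    if (len > 0 && ((uintptr_t)p & 1u)) {
        *p++ = value;
        len--;
    }

    /* Middle section: one 16-bit store per two bytes. */
    uint16_t pattern = (uint16_t)(((uint16_t)value << 8) | value);
    uint16_t *w = (uint16_t *)(void *)p;
    for (size_t words = len / 2; words > 0; words--) {
        *w++ = pattern;
    }

    /* Odd remaining length: one trailing byte is left over. */
    if (len & 1u) {
        *(uint8_t *)w = value;
    }
    return dst;
}
```

Even this small version adds two branches and a pattern build on every call, which is the overhead that makes short or unaligned fills slower than the plain byte loop.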

The other trade-off is that you just replaced 4 lines of assembly with hundreds of lines of assembly, so there is also code space to consider.

Side note: for memcpy() it is even more convoluted because all the same questions apply to the source array as well as the destination array AND you have to account for things like when one starts on an odd address and the other starts on an even address among other things.
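To make the memcpy() point concrete, here is a minimal sketch in plain C under the same assumptions as before (portable C, hypothetical name copy_aligned, no unrolling, no overlap handling). Word moves are only used when source and destination have the same address parity; if one is odd and the other even, no amount of leading byte copies can align both, so it falls back to bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies `len` bytes, using 16-bit moves only when
   source and destination have the same address parity. */
void *copy_aligned(void *dst, const void *src, size_t len)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Word moves are possible only if both pointers are odd or both even. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 1u) == 0) {
        /* Both odd: copy one byte so both reach a word boundary. */
        if (len > 0 && ((uintptr_t)d & 1u)) {
            *d++ = *s++;
            len--;
        }
        uint16_t *dw = (uint16_t *)(void *)d;
        const uint16_t *sw = (const uint16_t *)(const void *)s;
        for (size_t words = len / 2; words > 0; words--) {
            *dw++ = *sw++;
        }
        d = (uint8_t *)dw;
        s = (const uint8_t *)sw;
        len &= 1u;  /* at most one trailing byte remains */
    }

    /* Mismatched parity (whole buffer) or the odd trailing byte. */
    while (len-- > 0) {
        *d++ = *s++;
    }
    return dst;
}
```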

If you are interested, here is a link to some attempts to benchmark an optimized memcpy(). It is not specific to PICs; it applies to all types of processors:
https://www.embedded.com/optimizing-memcpy-improves-speed/
temtronic



Joined: 01 Jul 2010
Posts: 9225
Location: Greensville,Ontario


Posted: Thu Oct 31, 2019 11:29 am

While I don't use that PIC, you could hardcode in assembly to clear 256 words, as that PIC is 'word' based.

Instead of using memset(), try a for(....) loop that puts 0 into each element of the array. The compiler may code it 'better/faster' than your memset() method.
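A minimal sketch of that loop, written in plain C with stdint.h types (int16_t standing in for the CCS int16 used in the original post; the ROWS/COLS macros and the clear_result name are just for illustration). Whether the compiler turns this into a REPEAT/CLR word sequence is not guaranteed, so it is worth checking the generated listing.

```c
#include <stdint.h>

#define ROWS 16
#define COLS 16

int16_t result[ROWS][COLS];

/* Clears the array one 16-bit element at a time: 256 stores
   instead of the 512 byte stores done by memset(). */
void clear_result(void)
{
    for (uint16_t r = 0; r < ROWS; r++) {
        for (uint16_t c = 0; c < COLS; c++) {
            result[r][c] = 0;
        }
    }
}
```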

If I could I'd code as it's raining here for the next day or so....
Ttelmah



Joined: 11 Mar 2010
Posts: 19504


Posted: Thu Oct 31, 2019 1:50 pm

Yes, the problem is that they have designed the memset to be generic,
so it has to be able to cope with odd numbers of bytes. Hence byte based.
They perhaps need to code a word_memset function. Not hard to code
actually. If I'm feeling bored tomorrow, will try to put a version together.
Ttelmah



Joined: 11 Mar 2010
Posts: 19504


Posted: Fri Nov 01, 2019 2:05 am

OK. Crude 'word_clear' function. Only sets to 0, and needs the buffer
to be word aligned:
Code:

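//16bit clear routine. Needs to be called with a 16bit aligned start address
//and a word count of at least 1
#inline
void word_clear(unsigned int16 * address, int count)
{
   count--;          //REPEAT Wn runs the next instruction Wn+1 times
#asm
   mov address,W1    //W1 = destination pointer
   mov count,W2      //W2 = word count - 1
   repeat W2
   clr [W1++]        //clear one word, post-increment the pointer
#endasm
}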

char buffer[512]  __attribute__((aligned(2)));

//then call as.
word_clear((unsigned int16 *)buffer, 256); //beware the starting pointer must be word aligned


The compiler will always word align 16bit variables by default, but it
won't do so for char variables. So you need to ensure the buffer is word
aligned to use this. Hence the byte version that CCS use...
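If you can't guarantee alignment at every call site, one option is a small guard in front of the word routine. This is only a sketch in plain C: clear_buffer is a hypothetical wrapper, and word_clear_portable is a stand-in for the inline-assembly word_clear() so the example can be compiled anywhere. On the PIC you would call the real routine instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the inline-assembly word_clear() so this sketch is portable. */
static void word_clear_portable(uint16_t *address, size_t count)
{
    while (count-- > 0) {
        *address++ = 0;
    }
}

/* Hypothetical wrapper: takes the word path only when the start address
   and the byte length are both even, otherwise falls back to memset(). */
void clear_buffer(void *buf, size_t len)
{
    if (len > 0 && (((uintptr_t)buf | len) & 1u) == 0) {
        word_clear_portable((uint16_t *)buf, len / 2);
    } else {
        memset(buf, 0, len);
    }
}
```

The check costs a few instructions per call, so it only pays off for buffers that are large enough for the halved store count to matter.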
bamby



Joined: 31 Oct 2019
Posts: 2


Posted: Mon Nov 04, 2019 12:43 am

Thanks, Ttelmah
looks perfect :)
All times are GMT - 6 Hours