Super fast addition?
curt2go



Joined: 21 Nov 2003
Posts: 200

Posted: Mon Jul 16, 2018 10:19 am

I know the title may be a bit misleading. What I need to do is process 512 words of data in a loop.

Processor: PIC24EP256GP206 running at 140MHz

Right now I am not running a loop because writing out the individual lines is way faster.

variable[0] += variable2[0];

That is what I am doing 512 times. variable and variable2 are global because they are used elsewhere. Doing this 512 times takes 23uS; done in a for loop it takes 175uS. The issue is that I have to process variable2 before it gets added: I need to apply a factor to it. These variables are signed int16s. I need to scale them like this:

float factor = 1.14;
float result;
result = variable2[0] * factor;
variable[0] += result;

The factor changes all the time and this process runs continuously, every 23mS to be exact. Doing the above statement 512 times took something like 7mS (I can't remember exactly), which is way, way too long. Would changing it to a signed int32 and making the factor an int8 or something be much better? Or even using pointers?
Ideally I would like to be under 200uS.

I thought I would ask here because you guys are phenomenal at knowing which way is the best. Thank you in advance.
curt2go



Posted: Mon Jul 16, 2018 11:18 am

I found the post where PCM shows how to use the simulator in MPLAB. I'm going to run a bunch of simulations in the meantime. I also read in another thread where there were some options about using int32 but only using the top bytes. Not sure I understand that. Any light would help. I need this to be as fast as possible.

Sorry that this is a dumb question anyways.
curt2go



Posted: Mon Jul 16, 2018 1:44 pm

OK, this is where I am at with the SIM. Normally I would just use floats because I don't care about speed, so I am having trouble with the integer math here.

Let me know what you think. The math does not seem to be coming out right. At least on the sim.

Code:
//
#include <24EP256GP206.h>
#device adc=12 *=16
#FUSES NOWDT                    //No Watch Dog Timer
#FUSES LPRC
#use delay(clock=140000000)//need to stay at 16MHz or the interrupt for the comms does not work

void main(){
   signed int16 variable[512];
   signed int16 variable2[512];
   
   setup_oscillator(OSC_INTERNAL,140000000);
   
   variable[0] = 6000;
   variable2[0] = 1000;
   while(1){
      unsigned int16 i;
      unsigned int16 p =  1.14 * 256;
      unsigned int32 w;
      for(i=0;i<512;i++){ //total of 277uS  looping adds about 0.11uS each loop
         w = variable2[i] * p;//      0.21uS
         w >>=8; //      0.1uS
         variable[i] += w;//      0.15uS
      }    
   }
}
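
A possible reason the numbers look wrong in the listing above: if the compiler evaluates `variable2[i] * p` in 16 bits (as a C compiler with a 16-bit `int` would), the product wraps before it is ever stored in the 32-bit `w`. The sketch below reproduces that effect in portable C and shows the usual cure, widening one operand before the multiply. The function names are made up for illustration, and the 16-bit evaluation is an assumption about the target compiler, not something confirmed in this thread.

```c
#include <stdint.h>

/* Mimics a 16-bit multiply: only the low 16 bits of the product survive. */
uint16_t scale_q8_truncated(int16_t x, uint16_t p)
{
    uint16_t low = (uint16_t)((uint32_t)(int32_t)x * p); /* wraps mod 65536 */
    return (uint16_t)(low >> 8);
}

/* Widen first so the full 32-bit product is kept, then drop the 8
   fraction bits. Intended for non-negative x only. */
int32_t scale_q8_wide(int16_t x, uint16_t p)
{
    int32_t product = (int32_t)x * (int32_t)p;
    return product >> 8;
}
```

With x = 1000 and p = 291 (1.14 * 256, truncated), the true product 291000 does not fit in 16 bits: the truncated version returns 112, while the widened one returns 1136.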
temtronic



Joined: 01 Jul 2010
Posts: 9229
Location: Greensville,Ontario

Posted: Mon Jul 16, 2018 2:29 pm

A comment: you should post a few of the results that you get versus what you expect, as well as the interim values.

Jay
curt2go



Posted: Mon Jul 16, 2018 3:08 pm

I put the actual SIM numbers beside each line. I really was hoping for under 200uS but it looks like that is not even possible. The big one is converting from int16 to int32 for the math. I will also have to do some error checking in each one, as I can't go over 32767 or under -32767. I only have the positive error check in there right now.

I also need to figure out how to do the math with negative numbers, because I can't do a bit shift like I am doing with the positive numbers. Unless I am missing something.

I have to run this routine 3 times in each 23mS between SD card reads. I also have a lot of other stuff going on, which is why the shorter the better. I am using the DMA to write to the codec, so that is not getting in the way at all now.

So any suggestions would be awesome. I can try them on the SIM to see pretty quick.

Code:

//
#include <24EP256GP206.h>
#device adc=12 *=16
#FUSES NOWDT                    //No Watch Dog Timer
#FUSES LPRC
#use delay(clock=140000000)
void main(){
   signed int16 variable[512];
   signed int16 variable2[512];
   
   setup_oscillator(OSC_INTERNAL,140000000);
   
   variable[0] = 6000;
   variable2[0] = 32000;
   while(1){
      unsigned int16 i;
      unsigned int16 p =  0.06 * 128;//1 = 100%
      unsigned int32 w;
      for(i=0;i<512;i++){ //total of 695uS
         w = variable2[i];//      0.17uS
         w *= p;//                     0.67uS
         w >>=8; //                   0.1uS
         w +=variable[i];//        0.2uS
         if(w > 32767)//0.27uS
            w=32767;//0.07uS
         variable[i] = w;//0.07uS
      }    
   }
}
curt2go



Posted: Mon Jul 16, 2018 3:59 pm

Here is the latest with handling negative numbers as well. This one is 944uS.

Let me know if you can see any efficiencies.

Code:
//
#include <24EP256GP206.h>
#device adc=12 *=16
#FUSES NOWDT                    //No Watch Dog Timer
#FUSES LPRC
#use delay(clock=140000000)
void main(){
   signed int16 variable[512];
   signed int16 variable2[512];
   
   setup_oscillator(OSC_INTERNAL,140000000);
   
   variable[0] = 32000;
   variable2[0] = -32000;
   while(1){
      unsigned int16 i;
      int1 neg = 0;
      unsigned int16 p =  0.06 * 128;//1 = 100%
      signed int32 w;
      for(i=0;i<512;i++){ //total of 944uS
         w = abs(variable2[i]);//      0.17uS
         w *= p;//                     0.67uS
         if(variable2[i] < 0)
            neg = 1;
         w >>=8; //                   0.1uS
         if(neg)
            w = 0xFFFFFFFF - w;//turn it back negative again if it was.
         w +=variable[i];//0.2uS
         if(w > 32767)//0.27uS
            w=32767;//0.07uS
         if(w < -32767)//0.27uS
            w = -32767;//0.07uS
         variable[i] = w;//0.07uS

      }    
   }
}
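
One detail worth checking in the listing above: `0xFFFFFFFF - w` is the ones' complement of `w`, which equals `-w - 1` rather than `-w`, so negative results come out one count low. Also, `neg` is set but never cleared inside the for loop, so once one negative sample has been seen, every later sample is treated as negative. A small portable sketch of the first point, with hypothetical helper names (the conversion back to a signed value assumes the usual two's-complement wraparound):

```c
#include <stdint.h>

/* What "0xFFFFFFFF - w" computes: every bit flipped, i.e. -w - 1. */
int32_t flip_bits(int32_t w)
{
    return (int32_t)(0xFFFFFFFFu - (uint32_t)w);
}

/* Plain two's-complement negation. */
int32_t negate(int32_t w)
{
    return -w;
}
```

For w = 5, `flip_bits` returns -6 while `negate` returns -5.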
temtronic



Posted: Mon Jul 16, 2018 4:59 pm

just an idea...
instead of this (lines I would change marked ***)...

Code:
*** if(variable2[i] < 0)
***    neg = 1;
w >>=8; // 0.1uS
*** if(neg)
***    w = 0xFFFFFFFF - w;//turn it back negative again if it was.
w +=variable[i];//0.2uS
if(w > 32767)//0.27uS
   w=32767;//0.07uS
if(w < -32767)//0.27uS
   w = -32767;//0.07uS

you could do this...

Code:
if(variable2[i] < 0)
   w = 0xFFFFFFFF - w;//turn it back negative again if it was.

w >>=8; // 0.1uS

w +=variable[i];//0.2uS
if(w > 32767)//0.27uS
   w=32767;//0.07uS
if(w < -32767)//0.27uS
   w = -32767;//0.07uS
variable[i] = w;//0.07uS

Only the first statement after an IF() gets executed, so my thinking is you can eliminate the setting of the neg variable and the later test.
If I'm correct it should speed up the overall process.
If I'm wrong, well, it's 90*F in the shade and drier than the desert here, sorry, my brain's fried !
Jay
PCM programmer



Joined: 06 Sep 2003
Posts: 21708

Posted: Mon Jul 16, 2018 6:27 pm

curt2go wrote:
Let me know if you can see any efficiencies.

If I were you, I would look at the .LST file and look for any lines of C code
that produce an excessive amount of ASM code for what they do.
Then think of some clever method to re-write the code that produces a
much smaller .LST file for it. This is assuming it's all inline code.
By small, I don't mean to use loops. I mean, with it unrolled.
curt2go



Posted: Mon Jul 16, 2018 6:32 pm

Yeah. The version I use now is all unrolled. It takes up more ROM but I have the space to do so. The biggest ones, I do see, are the conversions to int32. But I'm not sure how I can get around that, as I need the larger numbers instead of using floats. I will look more into the LST file and see where I can do some stuff.

And Temtronic, that is a good idea. I will try that one. Since probably only half the data is negative, it will cut down the time on the negative portion for sure.
Ttelmah



Joined: 11 Mar 2010
Posts: 19518

Posted: Mon Jul 16, 2018 10:49 pm

Be aware:

w >>=8; //

Only gives /256, for a +ve number. Not -ve.

Look at:
<https://en.wikipedia.org/wiki/Arithmetic_shift>
Look at the section on 'Non-equivalence of arithmetic right shift and division'.

Don't do scaling like this.
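
To make the difference concrete: dividing by 256 rounds toward zero, while an arithmetic right shift by 8 rounds toward negative infinity, so the two disagree for negative values that are not exact multiples of 256. A minimal sketch with illustrative function names; note that right-shifting a negative signed value is implementation-defined in C, and the comments assume the common sign-extending behaviour:

```c
#include <stdint.h>

/* Division truncates toward zero: -1 / 256 is 0. */
int32_t div_by_256(int32_t w)
{
    return w / 256;
}

/* Arithmetic shift rounds toward negative infinity: -1 >> 8 is -1
   on compilers that sign-extend (assumed here). */
int32_t shift_by_8(int32_t w)
{
    return w >> 8;
}
```

For w = -300, division gives -1 and the shift gives -2; for w = 300 both give 1.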

If you want to use an integer factor, use int32 arithmetic. Multiply the factor by 65536, rather than 256. Then take the upper two bytes of the result as the int16 value. This can be done efficiently using a union.
Code:

union {
   signed int32 wrapper;
   signed int16 parts[2];
} value;

signed int32 scale=1.14*65536;

value.wrapper *=scale;
result = value.parts[1];

This gives you the integer put into 'value.wrapper', multiplied by 1.14 as an int16 result in value.parts[1].
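
The same idea as a self-contained sketch in portable C, for checking the arithmetic on a PC. It uses `<stdint.h>` types in place of the CCS `signed int16`/`signed int32`, and the names `split32_t` and `scale_q16` are invented here. It assumes a little-endian target (true for PIC24 and x86), so that `parts[1]` is the upper half, and that the product fits in 32 bits.

```c
#include <stdint.h>

/* View one 32-bit value as two 16-bit halves. On a little-endian
   machine parts[0] is the low half and parts[1] the high half. */
typedef union {
    int32_t wrapper;
    int16_t parts[2];
} split32_t;

/* Multiply x by a factor stored as factor * 65536 and return the
   integer part, i.e. the upper 16 bits of the 32-bit product.
   The caller must keep x * scale within the int32_t range. */
int16_t scale_q16(int16_t x, int32_t scale)
{
    split32_t value;
    value.wrapper = (int32_t)x * scale;
    return value.parts[1];
}
```

With a scale of 9175 (0.14 * 65536, truncated), an input of 6000 gives 839 and -1000 gives -140: taking the upper half rounds toward negative infinity, and 9175/65536 is slightly below 0.14. A factor above 1.0 also shrinks the usable input range: with 1.14 * 65536 = 74711 the product overflows a signed 32-bit value once the input magnitude exceeds roughly 28700.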

Some time ago, I needed fast scaling for a servo application, so I wrote custom int24 basic arithmetic routines, arranged so they used the upper three bytes of an int32, then took the upper two bytes of this as the int16 result for the same effect. On a PIC18, without hardware division, this gave a significant saving over using int32, however on the PIC24, the int32 should give you quite good results without this complexity.
curt2go



Posted: Tue Jul 17, 2018 10:48 am

That is a very cool and efficient solution!

It saves some time, which is awesome.

But what might be the best way to check for min and max values when doing the math this way? I need to do variable[x] += variable2[x]; but the result has to stay between -32767 and 32767.

One weird thing: in the simulator the math is always coming out double.
For instance, if I use -1000 with 0.14*65536 as the scale, the answer should come out as -140. But it is always double; in this case it's -280. I have just assumed it's something in the SIM? Any thoughts?


This is the new math in the SIM. It is cutting out 200uS so far.
Code:

union {
   signed int32 wrapper;
   signed int16 parts[2];
} value;

signed int32 scale=0.14*65536;

while(1){
unsigned int16 i;
   
   
for(i=0;i<512;i++){ //total of 775uS
   value.wrapper = variable2[i];//0.085uS
   value.wrapper *=scale; //0.67uS
   value.wrapper = value.parts[1];
   value.wrapper += variable[i];
   if(value.wrapper > 32767)//0.06uS
      value.wrapper=32767;//0.07uS
   if(value.wrapper < -32767)//0.06uS
      value.wrapper = -32767;//0.07uS
   variable[i] = value.wrapper;//0.07uS   
}
}
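
For the range check, one common pattern is to do the addition in 32 bits and clamp the sum with an `if`/`else if`, so a sample that hits the upper limit skips the second comparison. A hedged sketch in portable C; the helper name `add_clamped` and the `LIMIT` constant are illustrative, and the symmetric limit of 32767 is taken from the posts above:

```c
#include <stdint.h>

#define LIMIT 32767

/* Add two 16-bit samples in 32 bits, then clamp to +/-LIMIT. */
int16_t add_clamped(int16_t acc, int16_t delta)
{
    int32_t sum = (int32_t)acc + (int32_t)delta;
    if (sum > LIMIT)
        sum = LIMIT;
    else if (sum < -LIMIT)
        sum = -LIMIT;
    return (int16_t)sum;
}
```

Whether this is actually faster than two separate tests on the PIC24 would need to be confirmed in the .LST file or the simulator.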
Ttelmah



Posted: Tue Jul 17, 2018 11:14 am

I'd be worried about this:

value.wrapper = value.parts[1];

Remember value.parts is 'part' of wrapper. This is putting part of a number back into the same RAM area. No idea quite what the effect would actually be! I suspect the compiler may be having a hiccup on this, which is resulting in the doubling.
curt2go



Posted: Tue Jul 17, 2018 11:35 am

It was doubling before I was using this math. It does the same thing here.

variable2 = 6000;

6000 *0.14 = 840 but it comes out with 1679
Code:

value.wrapper = variable2[i];
value.wrapper *=scale;
p = value.parts[1];
p += variable[i];
//checkLimits();
if(p > 32767)//0.06uS
   p=32767;//0.07uS
if(p < -32767)//0.06uS
   p = -32767;//0.07uS
variable[i] = p;//0.07uS
curt2go



Posted: Tue Jul 17, 2018 11:49 am

If I use 32768 instead of 65536 in the scale then I come out with the right number.
Ttelmah



Posted: Tue Jul 17, 2018 2:08 pm

Just stuck a basic program together and ran it on a different PIC, and it works fine:
Code:

void main()
{
   int16 source;
   union {
      signed int32 wrapper;
      signed int16 parts[2];
   } value;

   signed int32 scale=0.14*65536;
   
   for (source=-500; source<600;source+=100)
   {  //Basic loop to test scaling.
      value.wrapper=source;
      printf("%5d  ", source);
      value.wrapper *=scale;
      printf("%05d\r",value.parts[1]);
   } 

   while(TRUE)
      ;
}


Gives (on terminal):
Quote:

-500 -0070
-400 -0056
-300 -0042
-200 -0028
-100 -0014
0 00000
100 00013
200 00027
300 00041
400 00055
500 00069


You either have a problem in your debugging environment, or something really screwy going on!...
All times are GMT - 6 Hours
Page 1 of 2

 