Bizarre! minus sign included in float add result (PCH)

Kurt Franke · Guest

Hi all,

PCH 3.129, pic18f452

I'm getting some very bizarre behavior right now.
There is a certain place in my code that looks like:

float a,b;
...
printf("%f %f\r\n", a, b);
a += b;
printf("%f %f\r\n", a, b);
...

The output I see is something like:
233.45 2.54
-235.99 2.54

--> In short, I get a = -(a+b)!!!

I've tried my code on a different circuit board.

I've tried a = a+b; (without the +=)

I've tried a += b; a = -a; but somehow I still end up with the wrong thing!

I've tried temp = a+b; a = temp;

I've even tried the very humorous do-it-till-you-get-it-right construct:
do { temp = a+b; } while (temp != a+b); a = temp;

None of these change the (bad) result in a.

The assembly looks fine. Interrupts are running but make
no use of a or b.

I tried taking out the printf's (perhaps they are changing their arguments??) but the behaviour of my system was still
the same (it acts as if a is being calculated incorrectly).

I can't take abs(a) because a _should_ be negative in some
cases.

I can't switch to fixed point math without a fixed point
log, sin, cos, atan2, etc. which I use elsewhere.

I've tried pch 3.110, 3.124, 3.129

So....
Is anyone else experiencing problems with floating point
calculations?

I will see if I can isolate this to a small test program.

Thank you,

Kurt Franke
___________________________
This message was ported from CCS's old forum
Original Post ID: 10142

Tomi · Guest

It's really funny :)

I think your problem is related to how float numbers are represented during addition. To add correctly, the PIC must first align the two numbers to a common exponent (typically by shifting the smaller number's mantissa right and incrementing its exponent until it reaches the larger number's exponent). The carry bit (or some other stray bit) could enter the MSB of the mantissa and produce a negative number.
Maybe you can resolve the problem with this:
b = 162.56; // 2.54*64 instead of 2.54
a += b/64;

I hope the "/64" operation simply decreases the exponent by 6, so the mantissa will be untouched.
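
The hope that dividing by a power of two touches only the exponent can be checked on a desktop PC. This is only a sketch in standard C, assuming the host uses 32-bit IEEE 754 floats; it says nothing about what the PCH float library does internally. The helper names float_bits, mantissa_field and exponent_field are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float.
   Assumption: the host float is a 32-bit IEEE 754 value (not the PIC format). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The 23 stored mantissa bits. */
uint32_t mantissa_field(float f)
{
    return float_bits(f) & 0x7FFFFFu;
}

/* The 8-bit biased exponent. */
int exponent_field(float f)
{
    return (int)((float_bits(f) >> 23) & 0xFFu);
}
```

On such a host, comparing 162.56f with 162.56f / 64.0f should show identical mantissa fields and exponent fields that differ by exactly 6.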

___________________________
Original Post ID: 10144

Kurt Franke · Guest

...
:=Maybe you can resolve the problem with this:
:=b = 162.56; // 2.54*64 instead of 2.54
:=a += b/64;
:=
:=I hope the "/64" oper. simply decreases the exponent by 6 so the mantissa will be untouched.

Well, when I printed out the bin. representation of the
numbers it is failing on and put them into a test program
the addition is done correctly. I think this may be
some subtle compiler problem.. sigh.

I'll check what other variables use those locations..

Here are the numbers from the failed addition:

10000111 00100001 10011010 10111111 // a = 323.20895
10000000 01010111 00001010 00111101 // b = 3.3600001
10000111 10100011 01001000 11010011 // a+b (sign bit is set..)
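
For anyone wanting to check these bit patterns by hand, here is a minimal decoder in standard C. It assumes the bytes are in the Microchip 32-bit float layout (exponent byte first with a bias of 127, sign in bit 7 of the second byte, implied leading 1 on the mantissa), which is consistent with the three values listed above. The function name mchp_float_to_double is invented for this sketch and is not part of any CCS library.

```c
#include <stdint.h>

/* Decode 4 bytes in the (assumed) Microchip float layout:
   raw[0]        = exponent, biased by 127
   raw[1] bit 7  = sign
   remaining 23 bits = mantissa with an implied leading 1 */
double mchp_float_to_double(const uint8_t raw[4])
{
    if (raw[0] == 0)
        return 0.0;                       /* exponent byte 0 means zero */

    int exponent = (int)raw[0] - 127;
    int negative = (raw[1] & 0x80) != 0;

    uint32_t mantissa = 0x800000UL                       /* implied 1 */
                      | ((uint32_t)(raw[1] & 0x7F) << 16)
                      | ((uint32_t)raw[2] << 8)
                      | (uint32_t)raw[3];

    double value = (double)mantissa / 8388608.0;         /* divide by 2^23 */
    for (; exponent > 0; exponent--) value *= 2.0;
    for (; exponent < 0; exponent++) value /= 2.0;

    return negative ? -value : value;
}
```

Feeding it the third pattern (0x87, 0xA3, 0x48, 0xD3) gives roughly -326.569, i.e. the correct magnitude of a+b with only the sign bit wrong.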

-Kurt
___________________________
Original Post ID: 10147

R.J.Hamlett · Guest

Check the .sym file and see if anything in the interrupt code could be accessing part of the scratch area. Also pay special attention to any use of pointers (remember there is no bounds checking, so an array stored immediately below an area like this can 'overrun' into an important piece of memory).

Best Wishes
___________________________
Original Post ID: 10150

Tomi · Guest

I don't think your variable is being corrupted, because the result is right bit for bit except for the sign bit. What is the result of:

a -= -b; ? :)

___________________________
Original Post ID: 10153

R.J.Hamlett · Guest

:=I don't think that your variable is corrupted because it gives the right result bits-by-bits but the sign bit. What is the result of:
:=
:=a -= -b; ? :)

Personally, I suspect an interrupt is overwriting whatever location is being used to temporarily hold the flag bit, during the maths. I'd imagine that the arithmetic extracts this (these), normalises the arithmetic, performs the main calculation, then generates the resulting sign. As you say, the main 'scratch' variables are obviously left intact, but one bit is presumably being damaged.
The sort of thing I'd look at very carefully, would be if this code had been 'migrated' from an older chip, where some definitions for bits in the UART or SSP control registers, if they have not been relocated, could end up changing bits in the very area normally used for the scratch (the default scratch setting on the 18 family, is to start at address 0). An interrupt routine that changes such a bit, could then cause this behaviour.

Best Wishes

___________________________
Original Post ID: 10156

nilsener · Guest

Dear,

hey, I am not alone with this problem. My math on PCWH3.110/3.129 and 18F452 fails nearly in the same way. I have something like this:

float current;
float delta;
float rate;
static float average;
float pressure;

current = get_pressure(); //works right
delta = current - average; // works sometimes right
average += (delta/rate); // fails: the value is always added, even if (delta/rate) is negative
pressure = average;

I have tried many workarounds but nothing helps.

In the end I solved the problem by doing only this calculation in fixed-point math and then casting back to float, so you can still use sin, cos, etc. elsewhere. Choose the multiplier and divider (1000 in this case) according to the accuracy you need and the maximum range of your floats, so that the signed int32 cannot overflow:

float pressure;
signed int32 current;
signed int32 delta;
signed int32 rate;
static signed int32 average;

pressure = get_pressure();
current = (signed int32)(pressure * 1000);
delta = current - average;
average += (delta/rate);
pressure = (float)(average) / 1000;

regards nilsener

___________________________
Original Post ID: 10157

Tomi · Guest

It's easy to decide by inserting a "Restart_wdt(); Disable_Interrupts(GLOBAL);" instruction pair just before the math :)

___________________________
Original Post ID: 10159

Kurt Franke · Guest

:=It's easy to decide by inserting a "Restart_wdt(); Disable_Interrupts(GLOBAL);" instruction pair just before the math :)
:=

Just so. When I disable interrupts before the calculation
and before printing out a and b (wdt is disabled) ...

... I get exactly the same result.

disable_interrupts(GLOBAL);
show_float(a); // prints out the 32 bits
show_float(b);
a+=b;
show_float(a);
enable_interrupts(GLOBAL);

=>No effect.
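
show_float() itself isn't shown in this thread. A hypothetical stand-in that produces the same "8 bits per byte, space separated" layout as the dumps posted earlier could look like the sketch below. It is plain standard C that formats into a string; the name format_float_bits is invented, and on the PIC you would still have to obtain the four raw bytes of the float yourself (for example through a byte pointer) and send the string to your output.

```c
#include <stdint.h>

/* Write the 4 bytes as "bbbbbbbb bbbbbbbb bbbbbbbb bbbbbbbb".
   out must have room for 36 chars (32 bits + 3 spaces + terminator). */
void format_float_bits(const uint8_t raw[4], char out[36])
{
    int pos = 0;
    for (int i = 0; i < 4; i++) {
        /* most significant bit first, as in the dumps above */
        for (int bit = 7; bit >= 0; bit--)
            out[pos++] = ((raw[i] >> bit) & 1) ? '1' : '0';
        if (i < 3)
            out[pos++] = ' ';
    }
    out[pos] = '\0';
}
```

Dumping the raw bytes rather than going through printf("%f") keeps the float formatting code out of the picture when checking whether the stored result is really wrong.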

My result is not getting clobbered by an interrupt,
it is being calculated incorrectly. This is pure speculation,
but perhaps operator+(float,float) isn't initializing some of
its internal flags.

I'm afraid I'm having to look at other compilers now.

-Kurt
___________________________
Original Post ID: 10164

R.J.Hamlett · Guest

I thought you said that a simple test, transferring the binary values to a smaller block of code, gave the correct result?
I have a significant amount of arithmetic, subtracting odd values, multiplying with correction factors, and then applying a log, that all works fine. A sign error here would give ludicrous results (given the log...). This has now run in various versions for many weeks. An error of this sort would show immediately (high and low alarm outputs), especially since the ranges involved cover several decades of arithmetic, which implies the odds are that one system would have hit the problem somewhere.
Given this, it does seem to imply an interaction, but given you have now tried with the interrupts disabled, the question is: from what? Are you pulling the binary values (that you show) before using the printf, or afterwards? Have you kept the interrupts disabled up to this point? (I am wondering if printf is interfering with a value, since I don't use this, but transfer the value using SPI.)

Best Wishes
___________________________
Original Post ID: 10167

Chang-Huei Wu · Guest

None of the following scratch locations are saved by
CCS during an interrupt:

09C @DIV88.P1
.......
0CF @ADDFF.@SCRATCH
0D0 @ADDFF.@SCRATCH
0D1 @ADDFF.@SCRATCH

If these scratch locations are used in both main() and an interrupt,
then ... bingo!

So I back up and restore all of these scratch locations myself,
which solved my problem.

However, I don't understand why strange things still happen
even when interrupts are turned off.

I really don't want to change compilers ...

Best wishes
___________________________
Original Post ID: 10168

Kurt Franke · Guest

:=All the following ... scratch ... are not saved by
:=CCS during interrupt ...
:=
:=09C @DIV88.P1
:=.......
:=0CF @ADDFF.@SCRATCH
:=0D0 @ADDFF.@SCRATCH
:=0D1 @ADDFF.@SCRATCH
:=
:=If these scratches are used in both main() and interrupt,
:=then ... bingo !
:=

Very interesting...

:=so, I backup and restore all these scratches by myself,
:=it solved my problem.
:=
:=however, I don't understand why strange things still happen
:=even when interrupt is turned off ?
:=

Yep, the interrupts are off and besides I don't use floats, division, or multiplication in my interrupts.

:=I really don't want to change compiler ...
:=

Nor do I but I'm very frustrated right now and getting further
behind schedule.
-Kurt
___________________________
Original Post ID: 10171

Kurt Franke · Guest

Thank you Nilsener,

I'm glad to see I'm not totally alone.

-Kurt
___________________________
Original Post ID: 10172

Kurt Franke · Guest

:=I thought you said that a simple test, transferring the binary values to a smaller block of code, gave the correct result?.

Yes. Making a new program that does nothing but add the two numbers and prints out the answer works.

:=I have a significant amount of arithmetic, subtracting odd values, multiplying with correction factors, and then applying a log, that all works fine. A sign error here would give ludicrous results (given the log...). This has now run in various versions for many weeks. An error of this sort would show immediately (high and low alarm outputs), especially since the ranges involved cover several decades of arithmetic, which imply the odds are that one system would have hit the problem somewhere.

I was already using a large amount of math before adding this functionality and there was a time when everything worked.
Also, see Nilsener's post.. someone has run into this before.

:=Given this it does seem to imply an interaction, but given you have now tried with the interrupts disabled, the question is from what?. Are you pulling the binary values (that you show), before using the printf, or afterwards?. Have you kept the interrupts disabled up to this point?. (I am wondering if printf is interfering with a value, since I don't use this, but transfer the value using SPI).

I think I can rule out printf interactions because I can store the result in a third variable and print out all three variables after the operation. Only the result is corrupted. In the latest test interrupts are disabled before any printf's.

-Kurt
___________________________
Original Post ID: 10173

Kurt Franke · Guest

I have determined the cause of this error. See my post on
"bug in pch ADDFF found"

Thank you,

Kurt
___________________________
Original Post ID: 10238