Wi1l
Joined: 03 Oct 2020 Posts: 3
|
IEEE 754 half-precision binary floating point |
Posted: Sat Oct 03, 2020 8:10 pm |
|
|
Hi, can someone tell me how to convert a base-10 decimal number to 16-bit half-precision IEEE 754 binary floating point? I have tried the IEEEFloat driver, but it only works for 32 bits. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19504
|
|
Posted: Sun Oct 04, 2020 12:32 am |
|
|
Ouch. binary16. Probably the easiest way is going to be to extract the
parts from the standard float32 format.
Question: what PIC family are you using (PIC16/18 or PIC24/30/33)?
Do you need to handle conversions both ways?
binary16 has a 10-bit fractional part, a 5-bit exponent, and a sign bit.
So, provided the number is inside the range this supports, you can
just extract the components from a standard 32-bit float:
Code: |
typedef unsigned int16 float16;
union combiner {
float32 fp;
unsigned int8 bytes[4];
unsigned int16 words[2];
unsigned int32 longword;
};
float16 f32tof16(union combiner val)
{
int1 sign;
unsigned int32 fraction;
unsigned int32 exponent;
float16 result;
//OK, we arrive with a standard float32 value. If we are using PCD
//this is in IEEE format, otherwise Microchip. Using the union allows
//access to the bytes in this
#if defined(__PCD__)
//Here code for PCD compiler
//Now need to extract the parts from the value.
exponent = val.longword & 0x7F800000ul; //8bits
sign=bit_test(val.longword,31); //top bit is sign
#else
//Here we are on a PIC 16/18, so data is in different positions
exponent = (val.longword & 0xFF000000ul)>>1; //8 bit exponent
sign=bit_test(val.longword,23); //bit 23 is sign
#endif
fraction = val.longword & 0x007FFFFFul; //low 23 bits
//Now really should test for zero and infinity here
//However not doing this. Assume number is already small enough
//to fit.
//So the result needs the low five bits of the exponent, with the high
//10 bits of the fraction, and the sign.
result=(fraction>>13) & 0x3FF; //high ten bits
result+=(exponent>>13) & 0x7C00; //extract exponent five bits
if (sign)
bit_set(result,15); //set sign bit
//So now should have the required 16bits.
return result;
}
|
Completely untested, just 'created in my mind', but this should give
the binary16 equivalent of a float32 value.
If you need to go the other way, you'd have to do the opposite, and
rebuild the float32 value from the parts of the float16.
To do it more correctly, you would probably need to perform some
form of rounding on the conversion. This just clips. |
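The clipping mentioned above throws away the low 13 bits of the float32 fraction. A minimal sketch of what rounding that step could look like, written as plain host-side C rather than CCS C. The function name `round_fraction_to_10` is hypothetical, and it uses round-to-nearest with ties going to the even value. A return value of 0x400 means the rounding carried out of the fraction, which the caller would have to add into the exponent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reduce a 23-bit float32 fraction to the 10-bit
   binary16 fraction using round-to-nearest, ties-to-even.
   May return 0x400, meaning the rounding carried into the exponent. */
uint16_t round_fraction_to_10(uint32_t frac23)
{
    assert(frac23 <= 0x7FFFFFu);                /* only 23 bits are valid */

    uint16_t kept = (uint16_t)(frac23 >> 13);   /* the high ten bits */
    uint32_t rest = frac23 & 0x1FFFu;           /* the 13 discarded bits */
    uint32_t half = 0x1000u;                    /* exactly half an LSB */

    /* Round up when above half, or exactly half with an odd result */
    if (rest > half || (rest == half && (kept & 1u)))
        kept++;
    return kept;
}
```

Plain clipping corresponds to returning `kept` without the `if`.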
|
|
Wi1l
Joined: 03 Oct 2020 Posts: 3
|
|
Posted: Sun Oct 04, 2020 10:28 am |
|
|
Quote: | What PIC family are you using? (PIC16/18 or a PIC24/30/33).
Do you need to handle conversions both ways? |
Hi Ttelmah, I am using PIC18 and need to handle conversions both ways. I need to do this because the IEEE data comes from an RTU and I need to do mathematical operations with this data. For this I need to convert the IEEE to float, do the operation and convert it back to IEEE.
Here is my test code. I can't convert the test number (11.75 in IEEE 754 half-precision representation) to float.
Code: | #include <18F4520.h>
#fuses HS,WDT32768,PROTECT,NOLVP,NOBROWNOUT
#use delay(clock=20MHz)
#use rs232(baud=9600, xmit=PIN_C6, rcv=PIN_C7)
#include <ieeefloat.c>
#include <math.h>
int16 dataIEEE;
float resultFloat;
int16 resultIEEE;
void main(){
dataIEEE = 0x49E0; // 11.75 In IEEE 754 half-precision representation
resultFloat = f_IEEEtoPIC(dataIEEE);
printf("My Float Number = %8.4f\r\n",resultFloat);
delay_ms(100);
printf("My IEEE Number = %LX\r\n",f_PICtoIEEE(resultFloat));
delay_ms(100);
while(TRUE);
}
|
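As a cross-check of the test value, 0x49E0 splits into sign 0, exponent field 18 and fraction 0x1E0, giving (1 + 480/1024) * 2^(18-15) = 11.75. A minimal host-side sketch of that decode in plain C (not CCS C); the names `scale_pow2` and `half_bits_to_double` are hypothetical, and infinity/NaN are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply v by 2^e using exact doublings/halvings (avoids needing libm) */
static double scale_pow2(double v, int e)
{
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v /= 2.0; e++; }
    return v;
}

/* Hypothetical host-side helper: decode binary16 bits to a double.
   Infinity and NaN (exponent field 31) are not handled in this sketch. */
double half_bits_to_double(uint16_t h)
{
    int sign = (h >> 15) & 1;
    int exp5 = (h >> 10) & 0x1F;     /* biased exponent, bias 15 */
    int frac = h & 0x3FF;            /* 10 stored fraction bits */
    double value;

    assert(exp5 != 0x1F);            /* inf/NaN outside this sketch */

    if (exp5 == 0)
        value = scale_pow2((double)frac, -24);              /* subnormal */
    else
        value = scale_pow2(1.0 + frac / 1024.0, exp5 - 15); /* normal */

    return sign ? -value : value;
}
```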
|
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19504
|
|
Posted: Sun Oct 04, 2020 11:25 am |
|
|
No, you won't. You need to write the code yourself.
CCS does not support the binary16 format (few people do; the accuracy
is so low, only 3.5 digits). The ieeefloat library converts numbers
between the Microchip float format and the IEEE 32-bit format.
Somebody may well have posted code for this. A search should find it.
I doubt I've handled the exponent correctly. I think you have to
add 128, then perform the rotations, and then subtract 16. |
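For reference, float32 stores its exponent with a bias of 127 and binary16 with a bias of 15, so the field has to be rebiased rather than just masked. A minimal host-side sketch in plain C; the function name `rebias_exponent` and the -1 return convention are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: map a float32 exponent field (bias 127) to a
   binary16 exponent field (bias 15). Returns -1 when the result is not
   a normal binary16 exponent (the field must be 1..30). */
int rebias_exponent(unsigned exp32)
{
    assert(exp32 <= 255u);                   /* the field is 8 bits wide */

    int exp16 = (int)exp32 - 127 + 15;       /* remove old bias, add new */
    if (exp16 < 1 || exp16 > 30)
        return -1;                           /* too small, too large, inf or NaN */
    return exp16;
}
```

For 11.75 the float32 exponent field is 130, which maps to 18, matching the exponent bits of 0x49E0.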
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19504
|
|
Posted: Mon Oct 05, 2020 3:55 am |
|
|
OK. I've sat down and written the code to do this. It uses the ieeefloat
driver to do the final conversion for the PIC16/18. I've tested it on a PIC24,
and it works there. I haven't tested it on the PIC16/18.
Code: |
//Routines to convert a float32 to a float16, and vice versa.
//First define the types needed
#include "stdint.h"
typedef uint16_t float16_t; //Use an int16 to hold the new float
union combiner {
float32 fp;
uint8_t bytes[4];
uint16_t words[2];
uint32_t longword;
}; //To allow the parts to be accessed
//Now want to be able to define constants as 'U32' on any processor
#if defined(__PCD__)
#define U32(x) x##ul
#else
#define U32(x) x##ull
#endif
//Now the limit values for the new type
// Smallest positive short float
#define SFLT_MIN 5.96046448e-08
// Smallest positive normalized short float
#define SFLT_NRM_MIN 6.10351562e-05
// Largest positive short float
#define SFLT_MAX 65504.0
// Smallest positive e
// for which (1.0 + e) != (1.0)
#define SFLT_EPSILON 0.00097656
// Number of digits in mantissa
// (significand + hidden leading 1) so actually 10 stored
#define SFLT_MANT_DIG 11
//Number of actual stored bits
#define SFLT_SIGBITS 10
// Number of base 10 digits that
// can be represented without change
// floor(10 stored bits * log10(2)) = floor(3.01) = 3
#define SFLT_DIG 3
// Base of the exponent
#define SFLT_RADIX 2
// Minimum negative integer such that
// SFLT_RADIX raised to the power of
// one less than that integer is a
// normalized short float
#define SFLT_MIN_EXP -13
// Maximum positive integer such that
// SFLT_RADIX raised to the power of
// one less than that integer is a
// normalized short float
#define SFLT_MAX_EXP 16
// Minimum positive integer such
// that 10 raised to that power is
// a normalized short float
#define SFLT_MIN_10_EXP -4
// Maximum positive integer such
// that 10 raised to that power is
// a normalized short float
#define SFLT_MAX_10_EXP 4
//Now the value for infinity in this format
#define SFLT_INF U32(0x7C00)
//Function to convert a float32 to float16.
float16_t float32to16(float v)
{
uint32_t sign;
uint32_t mantissa, half_mantissa;
uint32_t exponent;
uint32_t round_bit;
signed int32 unbiased;
uint32_t temp32;
union combiner val;
#if defined(__PCD__)
val.fp=v;
#else
//Here code for PCB/PCM/PCH compiler
//Need to convert to IEEE format
val.longword=f_PICtoIEEE(v);
//Here we are on a PIC 16/18, so data is in different positions
//Originally extracting directly, but now using the IEEE conversion
//routines. Found otherwise there is an issue with the exponent
//not being normalised on the Microchip format... :(
#endif
//Now need to extract the parts from the value.
exponent = val.longword & U32(0x7F800000); //8bits
sign=val.longword & U32(0x80000000); //top bit is sign
mantissa=val.longword & U32(0x7FFFFF);
//Now test for infinity or NaN
if (exponent==U32(0x7F800000))
{
//Maximum exponent in source.
//Two possibilities. If mantissa is zero this is infinity, otherwise NaN
if (mantissa==0)
temp32=0;
else
temp32=U32(0x200); //set the top mantissa bit so a NaN stays a NaN
//Now have to build the 16bit value with sign, mantissa & exponent
return ((sign>>16) | SFLT_INF | temp32 | (mantissa>>13));
}
//So now have to build the half precision value
sign>>=16; //Move the sign down to bit 15
//Now need to convert exponent
unbiased = (((signed int32)(exponent>>23))-127);
//Now add 15 to generate the exponent for the float16
unbiased +=15;
//at this point the exponent (now biased by 15) must fit in 0 to 0x1F
if (unbiased>=0x1F)
{
//here exponent is too large, so return +/- infinity
return (sign | SFLT_INF);
}
if (unbiased <=0)
{
//Now check for underflow
if ((14-unbiased)>24)
{
//full underflow
return (sign); //gives a 'signed zero'
}
//Now need to add in the missing mantissa bit
mantissa |= U32(0x800000);
half_mantissa=mantissa>>(14-unbiased);
//Now test for rounding
round_bit=U32(1)<<(13-unbiased);
if ((mantissa & round_bit) != 0 && (mantissa & (3 * round_bit - 1)) != 0)
half_mantissa++;
//No exponent for this
return (sign | half_mantissa);
}
//Now move the exponent to final location - need this to be done as unsigned
unbiased = (unsigned int32)(unbiased)<<10;
half_mantissa=mantissa>>13;
//Now test for rounding
round_bit=U32(0x1000);
if ((mantissa & round_bit) != 0 && (mantissa & (3 * round_bit - 1)) != 0)
{
// Round it
return ((sign | unbiased | half_mantissa) + 1);
}
else
{
return (sign | unbiased | half_mantissa);
}
}
//Now the reverse of the above. Handed a float16, generates a float32
float32 float16to32(float16_t val)
{
//This is actually quite a bit simpler, since there is no rounding or limits
//anything that can be held in a float16, can be represented by a float32.
//First test if we have been given a zero.
if ((val & 0x7FFF)==0)
{
if (bit_test(val,15)) //return 0.0 with the same sign
return (-0.0);
else
return (0.0);
}
union combiner result;
uint32_t half_sign;
signed int32 half_exp;
signed int32 exponent;
signed int32 leading;
uint8_t digit;
uint32_t half_mantissa,mantissa;
half_sign=val & 0x8000u;
half_exp=val & 0x7C00u;
half_mantissa=val & 0x3FF;
//Now test if we have an infinity
if (half_exp==SFLT_INF)
{
if (half_mantissa==0)
{
//put the sign bit in
result.longword=((half_sign<<16) | U32(0x7F800000));
return result.fp;
}
//If there is a mantissa return this as well, but with MSb set
result.longword=((half_sign<<16) | U32(0x7FC00000) | (half_mantissa<<13));
return result.fp;
}
//Now rebuild float32 components
half_sign<<=16;
exponent=(half_exp>>10)-15; //can have -ve values here
if (half_exp==0)
{
//Here we have a subnormal: no hidden bit, value is half_mantissa * 2^-24.
//Need to normalise it for the float32. Shift the mantissa left until
//its leading 1 reaches bit 10 (the hidden bit position), dropping the
//exponent by one for each shift. Mantissa cannot be zero here, since
//zero was handled at the top, so the loop always ends.
exponent=-14; //exponent of a float16 subnormal
while (bit_test(half_mantissa,10)==0)
{
half_mantissa<<=1;
exponent--;
}
half_mantissa&=0x3FF; //remove the leading 1, now implied
}
//Now the final part. Add the float32 bias and put the parts together.
exponent=(exponent + 127)<<23;
mantissa=half_mantissa<<13;
result.longword=(half_sign | exponent | mantissa);
#if defined(__PCD__)
//Here code for PCD compiler
return result.fp; //IEEE format already
#else
//Here we are on a PIC 16/18, so data is in different positions
//need to reformat
return f_IEEEtoPIC(result.longword);
#endif
}
//Now basic test code. Everything above is saved as binary16.h; this is a separate source file.
#include <24FJ128GA006.h>
#device ICSP=1
#use delay(crystal=20000000)
#FUSES NOWDT //No Watch Dog Timer
#FUSES CKSFSM //Clock Switching is enabled, fail Safe clock monitor is enabled
#use rs232(UART1, ERRORS)
#include <ieeefloat.c>
#include "binary16.h"
void main()
{
//Now a couple of tests
float16_t test;
float32 fpval=100.0;
test=float32to16(fpval);
//Now does this look right?.
printf("%04x test\r",test);
//now convert back
fpval=float16to32(test);
printf("%4.1f fp\r", fpval);
while(TRUE)
{
}
}
|
Examples:
Code: |
//PIC24
#include <24FJ128GA006.h>
#device ICSP=1
#use delay(crystal=20000000)
#FUSES NOWDT //No Watch Dog Timer
#FUSES CKSFSM //Clock Switching is enabled, fail Safe clock monitor is enabled
#use rs232(UART1, ERRORS)
#include <ieeefloat.c>
#include "binary16.h"
void main()
{
//Now a couple of tests
float16_t test;
float32 fpval;
fpval=100.0;
test=float32to16(fpval);
//Now does this look right?.
printf("%04x test\r",test);
//now convert back
fpval=float16to32(test);
printf("%4.1f fp\r", fpval);
fpval=float16to32(0x49E0);
printf("%4.1f fp\r", fpval);
while(TRUE)
{
}
}
|
Code: |
//PIC8
#include <18F4520.h>
#device ICSP=1
#use delay(crystal=20000000)
#FUSES NOWDT //No Watch Dog Timer
//#FUSES CKSFSM //Clock Switching is enabled, fail Safe clock monitor is enabled
#use rs232(UART1, ERRORS)
#include <ieeefloat.c>
#include "binary16.h"
void main()
{
//Now a couple of tests
float16_t test;
float32 fpval;
fpval=100.0;
test=float32to16(fpval);
//Now does this look right?.
printf("%04x test\r",test);
//now convert back
fpval=float16to32(test);
printf("%4.1f fp\r", fpval);
fpval=float16to32(0x49E0);
printf("%4.1f fp\r", fpval);
while(TRUE)
{
}
}
|
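When checking the output of the examples above on hardware, it can help to have expected bit patterns computed on a desktop machine. A minimal host-side sketch in plain C; the function name `reference_half_bits` is hypothetical, it assumes the host `float` is IEEE 754 binary32, and it only accepts values that are normal in binary16 and need no rounding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side reference: encode a float as binary16 bits.
   Assumes the host float is IEEE 754 binary32, and that the value is in
   the normal binary16 range and exactly representable (no rounding). */
uint16_t reference_half_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* read the float32 bit pattern */

    uint32_t sign = (bits >> 16) & 0x8000u;  /* sign moved down to bit 15 */
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15;
    uint32_t frac = (bits >> 13) & 0x3FFu;   /* high ten fraction bits */

    assert(exp >= 1 && exp <= 30);           /* normal binary16 only */
    assert((bits & 0x1FFFu) == 0);           /* nothing lost by truncation */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | frac);
}
```

With this, 100.0 encodes as 0x5640 and 11.75 as 0x49E0, so a correct conversion on the PIC should print 5640 for the first test and 11.8 (11.75 shown to one decimal place) for the 0x49E0 test.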
I have modified this. I found there is a problem with the code as originally posted on the
PIC16/18. With this format, the compiler uses an exponent of zero for
certain values. This causes issues. So I have rewritten it, and added a PIC18
example.
Last edited by Ttelmah on Tue Oct 06, 2020 2:07 am; edited 2 times in total |
|
|
Wi1l
Joined: 03 Oct 2020 Posts: 3
|
|
Posted: Mon Oct 05, 2020 12:42 pm |
|
|
Hi Ttelmah, when I try to compile it for a PIC18 using CCS V4.074, I get an "Unknown type" error in this part of the code:
Code: | typedef uint16_t float16_t; //Use an int16 to hold the new float
union combiner {
float32 fp;
uint8_t bytes[4];
uint16_t words[2];
uint32_t longword;
}; //To allow the parts to be accessed |
What could be the cause? |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19504
|
|
Posted: Tue Oct 06, 2020 12:49 am |
|
|
You shouldn't, provided you have this line first:
#include "stdint.h"
This gives the definitions for uint8_t etc.
I just changed the processor include to 18F4520.h, and removed the
#FUSES CKSFSM line, and the code compiled as posted.
I've edited the original file, with another change I found necessary for the
PIC18, and have posted a PIC18 example as well. Both work as posted. |
|
|
|
|