Indexing through large arrays; more efficient way?
CCS Forum Index -> General CCS C Discussion
RoGuE_StreaK

Joined: 02 Feb 2010
Posts: 73

Posted: Sun Feb 12, 2012 6:22 pm

I'm playing back some sound effects stored in the PIC's internal memory; seems to play fine, but I'm wondering if there's a "better" way of doing this?

example array:
Code:
const char sndStartup[822][16]={
0x80,0x7F,0x7F,0x80,0x7F,0x7F,0x81,0x80,0x7E,0x7F,0x80,0x7F,0x7E,0x7D,0x7C,0x83,
0x84,0x87,0x8A,0x82,0x7F,0x81,0x80,0x85,0x83,0x7E,0x82,0x81,0x7B,0x7D,0x7D,0x7F,
...
};
As you can see, this sound has 822 rows and 16 columns (sorry can't remember the correct terminology). I'm currently just indexing through it like this:
Code:
sound = sndStartup[xCount][yCount];
yCount++;
if(yCount > 15)
{
   yCount = 0;
   xCount++;
   if(xCount >= 822)   // 822 rows, indexed 0..821
   {
      xCount = 0;
   }
}
i.e., count through the columns ("yCount"), then when the last column is reached, reset it to 0 and increment the row ("xCount"). This is part of a function which runs on a flag set by an interrupt at 16kHz: it sends whatever value is currently in "sound" to the PWM system, then looks up the next sample in time for the next flag.

PIC is currently a PIC18F2620 (64K memory built-in), moving towards a PIC24FJ64, so there's "plenty" of space for these arrays onboard; it just seems to me that there's probably a more efficient way of indexing through them.
Also note that the sounds don't always cleanly fill the array, so I have a few placeholder bytes to pad out the ends, and I have to keep a counter of the total samples played for that sound and compare it against a constant stating how many "actual" samples are in that particular array (e.g. "sndStartup" has 13148 samples, while the 822x16 array has 13152 cells).

I've read around the place that looking up arrays using variables is a very slow way of doing things, but I haven't found alternatives for large arrays such as this. It's working OK at the moment, but I still have to shoe-horn in quite a few more functions, so I'm looking at optimising before it becomes an issue.
Ttelmah

Joined: 11 Mar 2010
Posts: 19447

Posted: Mon Feb 13, 2012 5:17 am

This is where the classic crossover between an array and a pointer can be useful.
For instance (simplifying to a single dimensional array):
Code:

int8 n_array[x] = {...};   // x initialised elements

int16 ctr;
int8 val;
for (ctr=0;ctr<x;ctr++) {
   val=n_array[ctr];
}

//Versus
int8 * n_ptr;
int8 * max;

n_ptr=n_array;
max=n_ptr+x;

for (;n_ptr<max;n_ptr++) {
   val=*n_ptr;
}


In the first, 'ctr' counts through the elements of the array. At each access, 'ctr' is added to the base address of the array, and the result is then used as the address for the lookup.
In the second, n_ptr _is_ directly the required address. It counts from the first address in the array up to the last, and can be used directly as the memory address to access.

Obviously with a multi-dimensional array you may need to increment by the size of a row, for example, but the same basic approach can be used.
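The two loops above can be tried side by side in standard C; the array contents and sizes below are invented for illustration, with CCS's int8/int16 written as stdint types so this compiles on a desktop compiler:

```c
#include <stdint.h>

/* Hypothetical sample table: 4 rows of 16 bytes, standing in for the
   much larger sndStartup[822][16] in the thread. Unlisted cells are 0. */
#define ROWS 4
#define COLS 16

static const uint8_t n_array[ROWS][COLS] = { { 0x80, 0x7F } };

/* Indexed walk: each access recomputes base + row*COLS + col. */
uint32_t sum_indexed(void)
{
    uint32_t sum = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += n_array[r][c];
    return sum;
}

/* Pointer walk: one pointer sweeps the same block linearly,
   with no per-access address arithmetic beyond the increment. */
uint32_t sum_pointer(void)
{
    const uint8_t *p   = &n_array[0][0];
    const uint8_t *end = p + ROWS * COLS;
    uint32_t sum = 0;
    for (; p < end; p++)
        sum += *p;
    return sum;
}
```

Both functions visit the same bytes in the same order; only the addressing differs.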

Best Wishes
FvM

Joined: 27 Aug 2008
Posts: 2337
Location: Germany

Posted: Mon Feb 13, 2012 7:13 am

It should be noted that flash array access is actually performed by table read instructions, which involve considerable overhead. For time-critical applications, care should be taken to organise it efficiently by reading blocks of a certain size at once rather than individual words.

PIC24 has an additional feature named PSV (Program Space Visibility), but it can only map up to 32 kB of flash into data space.
RoGuE_StreaK

Joined: 02 Feb 2010
Posts: 73

Posted: Mon Feb 13, 2012 7:30 pm

OK... been off reading up on pointers, trying to get some sort of grasp, hopefully I've made some headway.

First up though, should I actually be declaring my "snd_array" as "const int8" rather than "const char"? I had a whole heap of trouble originally getting the array into the code, the const char in the format shown was the only way I could get it to work, but I don't believe I tried const int.

From the sounds of it, using a single dimension array would be a lot easier to use with pointers, but again I had issues trying to get everything in as a one-dimensional array, so had to (?) split it up. But if anyone knows of a trick to get it to work as one-dimensional, it would greatly simplify my code, conversion processing, and keeping track of where the sound is up to.

With a two-dimensional array, I can't find any C examples that make sense (to me) to bring over to CCS; does it need pointers within pointers (as it's an array of arrays), or can one pointer be used to point through all of the dimensions, where
*(n_ptr+16)
gives the next internal array? (i.e. changing the pointer value from flash_array[0][0] to flash_array[1][0]?)
I think I grasped the single-dimensional pointer bit OK, but going multi-dimensional is doing my head in!


Then again, with FvM's note, should I point to a single sub-array and copy its contents (16 cells) to a RAM array for quicker access? Or is this a moot point if using pointers?

RE: PIC24's PSV, I just had a quick look into that. From how I read it, is this an automatic thing, so that moving to PIC24 would negate some of these changes? Although I'm using 64K chips, at the moment it looks like my sound samples may total less than 32K, so does that mean it would be a non-issue anyway?
Ttelmah

Joined: 11 Mar 2010
Posts: 19447

Posted: Tue Feb 14, 2012 3:53 am

The point about the pointer, versus the array, is maths.

If you have an array, with two indexes [a & b say], then when you access:

array[a][b], the compiler has to take 'a', multiply it by the size of a row, then add 'b', and only then perform the table lookup to read the element. Quite a lot of arithmetic.

Now if you have an array like this, and declare a pointer, then initialise this with:

ptr=array;

then *ptr is the same as array[0][0].

However if you now increment ptr, it will address array[0][1]. Keep on going till you have incremented it 15 times, and it now addresses array[0][15]. Increment it again, and it now addresses array[1][0]. A two-dimensional array is still just a linear block of data in memory, and incrementing the pointer removes the need to recalculate the product each time. Not a big saving on the chips with hardware multiply, but still a good handful of instructions.
So a single pointer can walk through the entire table from any location you want.

Separately, even doing this, the code will have to set up the table lookup for each element in turn. The alternative is what FvM is talking about: read a whole row when required. So (for example):
Code:

   int8 ramrow[16];
   int16 rownum;
   int16 address;
   address=label_address(sndStartup);

   rownum=something; //set the row number you wish to retrieve

   read_program_memory((address+(rownum<<4)),ramrow,16);
   //ramrow now holds the entire row at 'rownum', and can be accessed much
   //faster than the ROM.


If you organise your table, so that the row size is the entire block you want to work with at a time, this is a much more efficient approach.

On char, int, int8 etc., these are all synonyms for one another on CCS. I happen to prefer using the explicitly 'sized' versions (int8, int16 etc.), since I tend to be writing code for a number of different processors, where in some cases a 'char' is _not_ an int8, or an int.

Add one more thing, on needing to set the array up as two-dimensional: this is basically an initialisation limit, with the compiler not being able to handle a single 'unbroken' initialisation of such a massive array. Even though you have to declare the array as multidimensional, it is still a single entity in memory.
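The "single entity in memory" point is guaranteed by the C language itself: the rows of a 2-D array are laid out back to back, so a byte pointer stepped past the end of one row lands exactly on the start of the next. A minimal standalone check (the array name and shape are invented):

```c
#include <stdint.h>

/* Small stand-in for sndStartup: 3 rows of 16 bytes, zero-filled. */
static const uint8_t tbl[3][16];

/* Returns 1 if advancing a byte pointer 16 places from [0][0]
   lands exactly on [1][0], i.e. the rows are contiguous. */
int rows_are_contiguous(void)
{
    const uint8_t *p = &tbl[0][0];
    return (p + 16) == &tbl[1][0];
}
```

This is why a single pointer can walk the whole table without ever caring about the row boundaries.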

Best Wishes
RF_Developer

Joined: 07 Feb 2011
Posts: 839

Posted: Tue Feb 14, 2012 4:07 am

Edit: This overlaps a lot with Ttelmah's post done at the same time.

There's a lot to think about here. First, consider a classic Von Neumann architecture, with one bus for both instructions and data, a single memory space, and no caching: just the sort of machine for which C was initially developed.

Many such machines include an auto-increment addressing mode of some sort, which allows the processor to step through memory efficiently. A compiler for these machines may, depending on its code generation strategy, be able to leverage such instructions to produce optimised sequential accessing of arrays, assuming (and it's a pretty big assumption) that it can recognise such accesses in the C code.

Even without such hardware assistance, sequential pointer access to the elements of an array is likely to be more efficient (faster and smaller code) than indexed access. The point is that indexed access, using C array indexing, requires a multiplication by the size of the element, except when the element size equals the granularity of the machine's addressing. So indexing bytes on a byte-addressable machine is simple, while accessing 32 bit words requires multiplying the index by 4. Multiplying by binary powers is normally simple and quick, as it can be done by shifts, but things are generally more complicated when the element size, such as with an array of structures, is not a power of two.

There is also a hidden overhead on many modern wider-word machines, for example the ARM7s, which are 32 bit machines but byte addressed, and the familiar x86 architecture too. For such machines the fastest, simplest way to access memory is in words aligned to four-byte boundaries. It's faster to access properly aligned 32 bit words than individual bytes, which have to be extracted from words, the rest of which may or may not be redundant.

Two and more dimensional arrays require an additional multiplication for each extra dimension. This can soon get expensive on time and code space, especially with machines that offer little or no hardware assistance for multiplications.

Pointers can provide useful efficiency gains, particularly speed of access compared to indexing, provided the accesses are sequential, i.e. stepping through arrays. If you require random access then pointers are generally pointless. C doesn't technically have multidimensional arrays; all arrays are one-dimensional, and instead it has arrays of arrays, with each "dimension" having its own indexing. Even then you must be aware of the order in which the dimensions are stored. For C and C++ the rightmost dimension has the fastest-changing address, so stepping through the array is much more efficient in one order than in any other.
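The row-major layout described above can be made concrete in standard C (names and the marker value are invented): element [row][col] lives at linear offset row*COLS + col, which is exactly the multiply the compiler has to insert for every indexed access.

```c
#include <stdint.h>

#define ROWS 3
#define COLS 16

/* One marker byte at [1][5]; every other cell is 0. */
static const uint8_t samples[ROWS][COLS] = {
    [1][5] = 0xAB,
};

/* Read via a flat byte pointer using the row-major offset formula:
   the rightmost index (col) varies fastest in memory. */
uint8_t read_linear(unsigned row, unsigned col)
{
    const uint8_t *flat = &samples[0][0];
    return flat[row * COLS + col];
}
```

Walking `flat` sequentially therefore visits [0][0], [0][1], ... [0][15], [1][0], and so on, which is the efficient traversal order.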

All this falls apart with C#, however. In C# there are (normally) no pointers, and collections carry the considerable overhead of full-blown objects. So with C# there are other ways to iterate through arrays, and many other forms of data collection than the simple array.

Back to C. All the above holds true for most processors, but what if the processor is NOT a Von Neumann type? What if it's a Harvard architecture processor, like the PIC?

PIC16s and 18s have data memory that is both banked and separate from code memory. This allows a compact instruction set and simple pipelining, as instruction and data access can proceed in parallel over their separate buses and memories. The down side is that direct access to data memory is limited to the bank size, 256 bytes; after that the banking needs to be changed to reach another 256 byte bank. Pointers are not simply addresses, they are bank/offset pairs, and are more complex to manage than simple linear addresses. All this is taken care of for us by the compiler, but it's generally a matter of luck whether even small arrays land in one bank, and it can vary from one compilation to the next.

Pointers to data memory still work more efficiently than indexing, however, especially as the 16s and 18s have only an 8 x 8 multiplier. This means that indexing is at its most efficient when the array is small (less than 256 bytes) or the element size is a simple power of two.

Constants are generally stored in program memory by CCS. This makes for a lot of extra work for the processor, as it has to use the time-consuming and unpipelinable table reads to access program memory. The difficulties of index-to-address conversion still apply. It may well be more efficient overall to cache blocks of such data in data memory (reading a block of say 256 bytes into data memory, then accessing that by pointers) than to pull each and every value off one by one. Generally I suspect many sound generators will want data in blocks anyway, so grabbing a block and sending it as one will often make more sense.

Some hardware assists can help here: DMA type transfers (rare on PICs however due to simplicity of the internal busses), interrupt driven SPI and I2C transfers.

PIC 24s should be a much better bet for this sort of thing. They have wider multipliers that make indexing simpler. They don't have paged data memory, simplifying data memory accessing. They have PSV (Program Space Visibility) which maps a decent chunk of program memory into data memory address space, making reading of blocks of constant data relatively simple. If the compiler can leverage all this then it can make a much better job of generating decent code. I confess I haven't worked with any 24s so I can't test any of this.

Optimisation is all about knowing these limitations and working within them, using the hardware to its best advantage. What's best on one processor might actually be worse on another. Even in Intel x86 processors optimisation, such as in Intel's own optimised libraries, is done on a processor by processor basis, taking into account all the peculiarities of architecture.

RF Developer

PS: for 18s char, int and int8 are all pretty much the same thing and should be treated the same by the compiler.
ckielstra

Joined: 18 Mar 2004
Posts: 3680
Location: The Netherlands

Posted: Tue Feb 14, 2012 3:07 pm

RoGuE_StreaK wrote:
First up though, should I actually be declaring my "snd_array" as "const int8" rather than "const char"? I had a whole heap of trouble originally getting the array into the code, the const char in the format shown was the only way I could get it to work, but I don't believe I tried const int.
Int8 and char are the same in the PIC18 CCS compiler so that doesn't matter.

What I don't understand is why you didn't succeed in creating a large single dimensional array like:
Code:
const char sndStartup[13152]={
0x80,0x7F,0x7F,0x80,0x7F,0x7F,0x81,0x80,0x7E,0x7F,0x80,0x7F,0x7E,0x7D,0x7C,0x83,
0x84,0x87,0x8A,0x82,0x7F,0x81,0x80,0x85,0x83,0x7E,0x82,0x81,0x7B,0x7D,0x7D,0x7F
};
For me this compiles fine.
What is your compiler version number? It is a number like x.yyy at the top of the program list file (*.lst).
What is the error code you got?
Ttelmah

Joined: 11 Mar 2010
Posts: 19447

Posted: Tue Feb 14, 2012 3:58 pm

I think you will find it fails if you try to initialise all the entries. There is a compiler limit that seems to be hit at several thousand characters in a single unbroken initialisation, so where you actually hit it varies with the data format used for the entries, but several people have run into this. A couple of thousand entries seems reliable, but much beyond that can cause problems.

Best Wishes
RoGuE_StreaK

Joined: 02 Feb 2010
Posts: 73

Posted: Tue Feb 14, 2012 6:29 pm

Ttelmah wrote:
array[a][b]
ptr=array;
then *ptr is the same as array[0][0].
incremented it 15 times, and it now addresses array[0][15]. Increment it again, and it now addresses array[1][0].
A two dimensional array is still just a linear block of data in memory
Great, I thought it might operate that way if it was essentially linear in memory, but wasn't sure what would happen when you incremented past 15.
So at the very least, I could use a pointer to the array and simply increment it all the way up to the last sample required; quicker access, and it strips away the bookkeeping of incrementing through both the columns and rows.

Ttelmah wrote:
Separately, even doing this, the code will have to setup the table lookup for each element in turn. The alternative to this, is what FvM is talking about, which is to read a whole row, when required. So (for example):
Code:

   int8 ramrow[16];
   int16 rownum;
   int16 address;
   address=label_address(sndStartup);

   rownum=something; //set the row number you wish to retrieve

   read_program_memory((address+(rownum<<4)),ramrow,16);
   //ramrow now holds the entire row at 'rownum', and can be accessed much
   //faster than the ROM.
If you organise your table, so that the row size is the entire block you want to work with at a time, this is a much more efficient approach.
Eek, I need to research label_address and read_program_memory, but I think I get the gist. So this is a lot quicker again, compared to the pointer/indexing method? It will probably take me a while to contemplate the logistical implications for my routines.
Either way, it's good to pick up these alternate methods of doing things.

Ttelmah wrote:
Add one more thing, on needing to set the array up as two dimensional, this is basically an initialisation limit, with the compiler not being able to handle a single 'unbroken' initialisation of such a massive array.
OK, not an issue now that I know I could just page through the entire thing with one index. And my conversion method (wav to hex method) is reasonably conducive to making this two-dimensional array.

RF_Developer wrote:
Pointers can provide useful efficiency gains, particularly speed of access compared to indexing provided the accesses are sequential, i.e. stepping through arrays. If you require random access then pointers are generally pointless.
I'll always be stepping through sequentially. Some sounds will be a play-once, others will be looping, meaning an index through the array, then reset back to zero and start again. Sound will be also going straight from the end of one into the start of another, eg. a sound will loop, then when something happens externally a flag will be set so at the end of the current sound array it switches cleanly to the start of another array, and starts looping it instead.
eg two sound arrays, "woooh" and "waah"
loop and change on flag: "woohwoohwoohwoohwoohwoohw[flag here]oohwaahwaahwaahwaahwaah"
Is decrementing a pointer OK? I use a couple of the arrays in reverse to give a reversed version of the sound, instead of making a whole new sound array.
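The loop-and-switch behaviour described above can be sketched in plain C. Everything here (the function names, the two tiny "sounds") is invented for illustration; a real version would fetch from program memory inside the 16 kHz interrupt instead:

```c
#include <stdint.h>
#include <stddef.h>

/* Two tiny stand-in "sounds"; real tables would be the const arrays. */
static const uint8_t snd_a[4] = { 1, 2, 3, 4 };
static const uint8_t snd_b[3] = { 9, 8, 7 };

static const uint8_t *cur = snd_a;      /* sound currently looping   */
static size_t pos = 0, len = 4;
static const uint8_t *pending = NULL;   /* queued by the external flag */
static size_t pending_len = 0;

/* Called when the external event fires: the change takes effect only
   at the end of the current loop, so the change-over is clean. */
void queue_sound(const uint8_t *s, size_t n)
{
    pending = s;
    pending_len = n;
}

/* Called once per 16 kHz tick: returns the next sample, wrapping at the
   end of the sound and switching there if a new sound is queued. */
uint8_t next_sample(void)
{
    uint8_t s = cur[pos++];
    if (pos >= len) {                 /* end of this sound */
        pos = 0;
        if (pending) {
            cur = pending;
            len = pending_len;
            pending = NULL;
        }
    }
    return s;
}

/* Reversed playback needs no second table: just walk the same data
   backwards; decrementing a pointer or index is as cheap as incrementing. */
uint8_t sample_reversed(const uint8_t *snd, size_t n, size_t i)
{
    return snd[n - 1 - i];
}
```

With this shape, the "woohwooh...waahwaah" hand-over above is just `queue_sound(snd_b, sizeof snd_b)` at any point during the loop.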

RF_Developer wrote:
It may well be more efficient overall to cache blocks of such data in data memory - reading a block of say 256 bytes of data in to data memory then accessing that by pointers - than to pull each and every value off one by one. Generally I suspect many sound generators will want data in blocks anyway, so grabbing a block and sending a block as one will often make more sense.
I forgot to mention that this is all being performed on the PIC; the values from the sound arrays are applied directly to a PWM duty cycle. From what I can tell from physical tests and MPLAB SIM, it's fetching and populating fine, but the sound routine takes up most of the time between interrupts, so I'm trying to decrease this time to give myself more breathing room for other routines.
That said, just moving to the 24F, with the same crystal, theoretically doubles the speed, but any increase in efficiency can only be A Good Thing™


I'm not sure what broke with my original arrays; I just eventually found that the current method and structure worked, so stuck with it.
John P

Joined: 17 Sep 2003
Posts: 331

Posted: Tue Feb 14, 2012 8:45 pm

When I needed to read large amounts of data from program ROM (on a PIC16F877A) I used a method that's close to cheating, if there is such a thing. First I compiled the program and noted how large the HEX file was. Then I made up a new HEX file offline, containing the data, and I made sure it started at an address slightly higher than the last program address. I went back to the C code, and plugged in the address that my data resided at, compiled it again, and then I hand-edited the compiler's HEX file to add my file after the compiler's output (this wouldn't have been necessary if my programming system didn't do an erase before loading the HEX file). Finally I programmed the chip.

Then when I wanted to read the data, I would first load the EEADR and EEADRH registers with the base of my data area plus any offset that was needed. Then I'd use the Microchip procedure to get the data from program ROM. I didn't quite trust CCS to do this the way I wanted, so:
Code:


  // (eecon1, eeadr, eeadrh, eedata and eedath mapped to the SFRs with #byte)
  eeadrh = rom_addr_high;
  eeadr = rom_addr_low;
  bit_set(eecon1, 7);   // EEPGD: not needed if you don't have to switch from EEPROM
  bit_set(eecon1, 0);   // RD: start the read
  #ASM
    NOP                 // Must insert 2 NOPs here
    NOP
  #ENDASM
  data_from_rom = ((int16)eedath << 8) + eedata;    // 14 bit quantity


Of course if you just want to grab the next word from ROM, you can just increment EEADR and EEADRH without loading a new value, and skip the first 2 lines above:
Code:

  if (++eeadr == 0)
    eeadrh++;


Note that if you want to fool around with HEX files, you have to allow for the fact that 2 bytes in the HEX file correspond to 1 word in the ROM, so addresses are doubled.
ckielstra

Joined: 18 Mar 2004
Posts: 3680
Location: The Netherlands

Posted: Sat Feb 18, 2012 7:16 am

The solution provided by Ttelmah using the read_program_memory() function is already a lot faster than the original code, but still has the disadvantage that data has to be read in multiples of 16 bytes while the original sound data has variable length.

The function read_program_memory() is a wrapper around the hardware registers for reading from program memory, TBLPTR and TABLAT, and reads a datablock of the specified length. The function read_program_eeprom() is similar but reads a fixed length of 1 program word (2 bytes). This is more flexible but returns a word where you want byte access.

Looking at the disassembly for these functions, you see there is a very effective assembly instruction being used which reads one flash memory byte and advances the pointer to the next address (TBLRD*+), all in just two clock cycles. If only you had access to this instruction from C code... Luckily, you can.

Here is a demonstration program which:
- accepts a starting memory address
- accepts an arbitrary data length
- outputs the read data directly to your sound output function.

Because all the work is now done in one large loop, there is less overhead from initialising the registers again and again. And you can specify an arbitrary data length instead of a multiple of 16 bytes.

The loop for writing 1 byte to the output takes just 14 instruction cycles; 1 instruction less when you use fast_io.

Code:
#include <18F458.h>
#FUSES HS, PROTECT,NOWDT, NOBROWNOUT,NOLVP
#use delay(clock=4MHz)

#byte TBLPTRU = GETENV("SFR:TBLPTRU")
#byte TBLPTRH = GETENV("SFR:TBLPTRH")
#byte TBLPTRL = GETENV("SFR:TBLPTRL")
#byte TABLAT  = GETENV("SFR:TABLAT")


const char sndStartup[13152]={
0x80,0x7F,0x7F,0x80,0x7F,0x7F,0x81,0x80,0x7E,0x7F,0x80,0x7F,0x7E,0x7D,0x7C,0x83,
0x84,0x87,0x8A,0x82,0x7F,0x81,0x80,0x85,0x83,0x7E,0x82,0x81,0x7B,0x7D,0x7D,0x7F
};

void OutputSnd(int32 Addr, int16 Length)
{
   // Set the Program Memory read start address
   TBLPTRU = make8(Addr, 2);
   TBLPTRH = make8(Addr, 1);
   TBLPTRL = make8(Addr, 0);

   // Read and Output all sound data
   while (Length > 0)
   {
      // Read 1 byte from Program Memory and advance pointer to next byte
      // Sound data is available for reading in register TABLAT.
      #asm TBLRD*+ #endasm;

      // Output sound data (change to whatever other output function you need).
      output_b(TABLAT);

      Length--;
   }
}

void main()
{
   int32 SoundAddr;
   int16 SoundLen;
   
   SoundAddr = label_address(sndStartup);
   SoundLen = sizeof(sndStartup);
   OutputSnd(SoundAddr, SoundLen);

   for(;;);
}
bkamen

Joined: 07 Jan 2004
Posts: 1611
Location: Central Illinois, USA

Posted: Sat Feb 18, 2012 12:21 pm

RF_Developer wrote:

Optimisation is all about knowing these limitations and working within them, using the hardware to its best advantage. What's best on one processor might actually be worse on another. Even in Intel x86 processors optimisation, such as in Intel's own optimised libraries, is done on a processor by processor basis, taking into account all the peculiarities of architecture.


And if I may add that the PIC18Fs (not sure on the PIC16s, haven't used them in SO long) do have some additional mechanisms to speed memory moves across banks via the indirect addressing registers FSR0-2 and their associated indirect operand registers.

Section 6.4.3.1 for the 18F97J60:
wrote:
Because Indirect Addressing uses a full 12-bit address,
data RAM banking is not necessary. Thus, the current
contents of the BSR and the Access RAM bit have no
effect on determining the target address.


Section 6.4.3.2 for the 18F97J60:
wrote:
In addition to the INDF operand, each FSR register pair
also has four additional indirect operands. Like INDF,
these are “virtual” registers that cannot be indirectly
read or written to. Accessing these registers actually
accesses the associated FSR register pair, but also
performs a specific action on its stored value. They are:
• POSTDEC: accesses the FSR value, then
automatically decrements it by ‘1’ thereafter
• POSTINC: accesses the FSR value, then
automatically increments it by ‘1’ thereafter
• PREINC: increments the FSR value by ‘1’, then
uses it in the operation
• PLUSW: adds the signed value of the W register
(range of -128 to 127) to that of the FSR and uses
the new value in the operation


So for large indexed arrays, the PIC18s, through the use of the hardware 8x8 multiplier plus the FSRs, can make accessing arrays larger than 1 bank (256 bytes) pretty efficient.

(unless I'm totally reading the datasheet wrong)

Cheers,

-Ben
_________________
Dazed and confused? I don't think so. Just "plain lost" will do. :D
ckielstra

Joined: 18 Mar 2004
Posts: 3680
Location: The Netherlands

Posted: Sat Feb 18, 2012 4:31 pm

bkamen wrote:
So for large indexed arrays, the PIC18s, through the use of the hardware 8x8 multiplier plus the FSRs, can make accessing arrays larger than 1 bank (256 bytes) pretty efficient.
True, but the FSR registers are for RAM addressing only; the large constant data array discussed in this thread is located in program memory, i.e. ROM.
For ROM addressing you use the analogous TBLPTR/TBLRD mechanism, as in my posted example code.
bkamen

Joined: 07 Jan 2004
Posts: 1611
Location: Central Illinois, USA

Posted: Sat Feb 18, 2012 6:06 pm

ckielstra wrote:
bkamen wrote:
So for large indexed arrays, the PIC18s, through the use of the hardware 8x8 multiplier plus the FSRs, can make accessing arrays larger than 1 bank (256 bytes) pretty efficient.
True, but the FSR registers are for RAM addressing only; the large constant data array discussed in this thread is located in program memory, i.e. ROM.
For ROM addressing you use the analogous TBLPTR/TBLRD mechanism, as in my posted example code.


I know -- I just wanted to mention it as a supplement to the other ideas about optimisations in this thread.

-Ben
_________________
Dazed and confused? I don't think so. Just "plain lost" will do. :D
Ttelmah

Joined: 11 Mar 2010
Posts: 19447

Posted: Sun Feb 19, 2012 9:13 am

Just one little 'comment' that may help in thinking out this type of storage.

There are two fundamentally 'different' ways of declaring a block of data in the program memory. The first (used here so far) is to just declare a variable as 'const'. This builds a table containing the data, with the code to retrieve the data at its 'head'. Plus side: you don't have to worry about allocating space for it; the compiler does this for you, relocating it if needed, etc. Down side: you don't know 'where' the actual data is! This is where 'label_address' comes in, telling you where the actual data 'table' associated with the variable is placed. You can also read the data, if required, just as if it were in RAM.
The second method is the #ROM declaration. This _just_ puts a table containing the defined data into ROM, without any extra code. Plus side: you know exactly where it is! Down side: you then have to access the elements yourself, and have to work out the locations to put it at. You can, for example, have ten successive 1KB #ROM statements declaring a 10KB block of data, with no overhead, filling the whole of a 10KB block of your chip's ROM (assuming the chip has this much ROM...).
If one is going 'DIY' on the access code for speed, then it is probably worth switching to a #ROM declaration for the data.

Worth also realising that you can 'encapsulate' your fetching code, so you just call a routine with an address and it retrieves an entire block of X bytes around the specified location; then, if you access something it has already fetched, it just gets the byte from RAM rather than reading the program memory again. This is how disk accesses are done: you don't have to worry that the block size is (say) 512 bytes; you just fetch the byte at location 12345 in a file, and the code automatically reads a sector and returns the required byte. If you then ask for the next byte, it returns it from the buffered copy in RAM.
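That buffered-fetch idea can be sketched in standard C. Here `read_block()` is an invented stand-in for a bulk fetch such as CCS's read_program_memory(), and the ROM contents and 16-byte block size are made up for illustration:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Stand-in for the chip's program memory (unlisted bytes are 0);
   read_block() plays the role of the expensive bulk ROM read. */
static const uint8_t rom[64] = { [0] = 10, [17] = 20, [18] = 21, [40] = 30 };

static void read_block(uint16_t addr, uint8_t *dst, uint8_t n)
{
    memcpy(dst, rom + addr, n);
}

static uint8_t  cache[BLOCK];
static uint16_t cache_base = 0xFFFF;  /* impossible base: nothing cached yet */

/* Fetch one byte by address; refills the 16-byte cache only on a miss,
   so sequential reads hit the "ROM" once per block, not once per byte. */
uint8_t fetch(uint16_t addr)
{
    uint16_t base = addr & (uint16_t)~(BLOCK - 1);
    if (base != cache_base) {
        read_block(base, cache, BLOCK);
        cache_base = base;
    }
    return cache[addr - base];
}
```

The caller just asks for byte N, exactly as with the disk-sector analogy above; the routine decides for itself when a real ROM read is needed.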


Best Wishes
Page 1 of 2