andyd
Joined: 08 Mar 2007 Posts: 30
|
Execution times & speeding up programs |
Posted: Sun Apr 08, 2007 10:57 am |
|
|
I've posted a few times about something I'm trying to do, but I'm still having problems with it. I'm trying to make a PIC16F88 decode ADPCM audio and output it via the PWM into an LPF and then an amp & speaker, but I'm having serious trouble getting it to produce anything intelligible, which I think is down to how long the process takes.
The sample rate of my audio is 8kHz, so in 1 second the PIC will need to decode and output 8000 samples, meaning 1 sample every 125us. Right?
Well when I set the decode/output loop to run 8000 times (I have a 1 second long sample), it takes about 4 or 5 seconds.
Is there any way of finding out the number of instruction cycles the routine takes up and then relating this to clock speed? I'm trying to run the PIC on its 8MHz internal oscillator to reduce the number of external components needed (as well as power etc.). I've got a feeling that what's taking up the time is the EEPROM read, and ideally I'd like to have the PIC doing this while it's doing other parts of the routine, but I'm having enough trouble getting my head round single threaded apps!
Code is below, any suggestions are welcome:
Decode routine call (wait is triggered by an interrupt on timer2):
Code: |
prevsample = 0; // Clear ADPCM previous sample
previndex = 0; // Clear ADPCM previous index
index = 0;
diffq = 0;
step = StepSizeTable[index];
address_upper = 0b00000000;
address_lower = 0b00000001;
fputs("Press button to start playback.", PC);
while(input(PIN_B3)) // Wait for button press
{
}
for(i=0;i<8000>>4) & 0x0f);
Write10bitPWM(sample);
while(wait)
{
}
wait = 1;
sample = ADPCMDecoder(code & 0x0f);
Write10bitPWM(sample);
while(wait)
{
}
}
|
ADPCM decode routine:
Code: |
signed long ADPCMDecoder(char code) // ADPCM decoding routine
{
/* Restore previous values of predicted sample and quantizer step
size index
*/
predsample = prevsample;
index = previndex;
/* Find quantizer step size from lookup table using index
*/
step = StepSizeTable[index];
/* Inverse quantize the ADPCM code into a difference using the
quantizer step size
*/
diffq = step >> 3;
if(code & 4) diffq += step;
if(code & 2) diffq += step>>1;
if(code & 1) diffq += step>>2;
/* Add the difference to the predicted sample
*/
if( code & 8 ) predsample -= diffq;
else predsample += diffq;
/* Check for overflow of the new predicted sample */
if(predsample > 32767)
predsample = 32767;
else if(predsample < -32768)
predsample = -32768;
/* Find new quantizer step size by adding the old index and a
table lookup using the ADPCM code
*/
index += IndexTable[code ];
/* Check for overflow of the new quantizer step size index
*/
if( index < 0 ) index = 0;
if( index > 88 ) index = 88;
/* Save predicted sample and quantizer step size index for next
iteration
*/
prevsample = predsample;
previndex = index;
/* Return the new speech sample */
return(predsample);
}
|
EEPROM read:
Code: |
unsigned char eeprom_read(unsigned char address_upper, unsigned char address_lower)
{
unsigned char temp_byte;
i2c_start(); // Start communication
i2c_write(0xA0); // Send control code & address of EEPROM then set to write mode
i2c_write(address_upper); // Write upper address bits
i2c_write(address_lower); // Write lower address bits
i2c_start(); // Start communication
i2c_write(0xA1); // Send control code & address of EEPROM then set to read mode
temp_byte = i2c_read(0); // Read data without acknowledge bit
i2c_stop(); // Stop communication
return temp_byte;
}
|
PWM output:
Code: |
void Write10bitPWM(signed long sample)
{
unsigned long pwmout = 0;
pwmout = 0x8000 + sample; // Offset around 0x8000
pwmout = pwmout >> 7; // Scale to 9 bit by shifting right 7 bits
bit_clear(CCP1CON.5);
if(pwmout & 0b000000010) bit_set(CCP1CON.5); // Set second most LSB
bit_clear(CCP1CON.4);
if(pwmout & 0b000000001) bit_set(CCP1CON.4); // Set most LSB
pwmout = pwmout >> 2; // Scale to 7 bit by shifting right 2 bits
CCPR1L = pwmout; // Write resulting 7 bits to CCPR1L
}
|
Interrupt for "wait":
Code: | #INT_GLOBAL
void timer_isr()
{
#asm
//Store current state of processor
MOVWF save_w
SWAPF status,W
BCF status,5
BCF status,6
MOVWF save_status
// Nothing else changes in your interrupt
#endasm
wait = 0;
clear_interrupt(INT_TIMER2);
#asm
// restore processor and return from interrupt
SWAPF save_status,W
MOVWF status
SWAPF save_w,F
SWAPF save_w,W
#endasm
} |
|
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Sun Apr 08, 2007 5:09 pm |
|
|
Some more hints:
1) Code: | for(i=0;i<8000>>4) & 0x0f); | When posting code, please select the 'Disable HTML in this post' option. Parts of your code are now missing, which makes it unreadable. It is best to make this the default in your personal profile.
2) As you already suspected, the EEPROM read routine is a problem. You are sending 4 bytes and receiving 1 byte. Including all control bits, this adds up to 48 bits per read. You didn't say which EEPROM you are using or how fast the I2C bus is clocked, but assuming a 400 kHz clock you can do at most about 8,333 reads per second (400,000 / 48, excluding all timing overhead).
Is the sound you are trying to produce stored in the EEPROM? If so, you are reading the EEPROM at sequential addresses, and the suggestion from PCM programmer gives a real performance boost.
With consecutive reads, the data transmitted over I2C is reduced to 9 bits per byte read, a theoretical maximum of 44,444 reads per second.
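The bit counts above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and counting START, repeated START and STOP as one bit time each is an approximation.

```c
#include <stdint.h>

/* Bus clocks for one random read as done by eeprom_read():
   5 bytes on the wire (0xA0, address high, address low, 0xA1, data),
   each followed by an ACK/NACK bit, plus START, repeated START and STOP
   counted as one bit time each (an approximation). */
uint32_t random_read_bits(void)
{
    const uint32_t bytes = 5u;
    const uint32_t conditions = 3u;
    return bytes * 9u + conditions;   /* 48 */
}

/* One further byte of a sequential read: 8 data bits + ACK. */
uint32_t sequential_read_bits(void)
{
    return 9u;
}

/* Upper bound on reads per second, ignoring all software overhead. */
uint32_t max_reads_per_second(uint32_t scl_hz, uint32_t bits_per_read)
{
    return scl_hz / bits_per_read;
}
```

Since each EEPROM byte carries two 4-bit ADPCM codes, 8 kHz playback needs 4,000 byte reads per second, so what matters is how much of each 125 us sample period the read consumes rather than the raw bus limit.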
3) Another (relatively small) optimization is to get rid of the overhead of the interrupt function. You are already using a highly optimized interrupt function, but its only purpose is accurate time synchronisation, i.e. you are waiting for Timer2 to expire. With a slightly different approach you can optimize the interrupt handler away.
Every time a timer overflows it sets the corresponding Peripheral Interrupt Request (PIR) flag, regardless of the corresponding interrupt enable bit. Using this, you can leave the Timer2 interrupt disabled and still have your main loop test whether the timer's PIR flag is set:
Code: | #byte PIR1 = 0x0C
#bit TMR2IF = PIR1.1
void main()
{
setup_timer2(...);
disable_interrupts(INT_TIMER2); // Note the _disabling_ of the interrupt.
clear_interrupt(INT_TIMER2);
...
for (...)
{
Write10bitPWM(sample);
while (TMR2IF == 0)
{}; // Wait until Timer2 overflows
TMR2IF = 0; // Reset Timer2 overflow flag. Alternatively use clear_interrupt(INT_TIMER2);
...
}
}
// Note that there is no interrupt handler function anymore |
|
|
|
andyd
Joined: 08 Mar 2007 Posts: 30
|
|
Posted: Mon Apr 09, 2007 7:00 am |
|
|
Apologies for the HTML thing, completely forgot!
Here's the main routine again:
Code: |
prevsample = 0; // Clear ADPCM previous sample
previndex = 0; // Clear ADPCM previous index
index = 0;
diffq = 0;
step = StepSizeTable[index];
address_upper = 0b00000000;
address_lower = 0b00000001;
fputs("Press button to start playback.", PC);
while(input(PIN_B3)) // Wait for button press
{
}
for(i=0; i<8000; i++)
{
wait = 1;
code = eeprom_read(address_upper, address_lower);
address_lower++; // Add 1 to lower address byte
if(address_lower == 0x00) address_upper++; // If lower address = 0, add 1 to upper address byte
sample = ADPCMDecoder((code>>4) & 0x0f); // Decode upper half of byte
Write10bitPWM(sample);
while(wait)
{
}
wait = 1;
sample = ADPCMDecoder(code & 0x0f); // Decode lower half of byte
Write10bitPWM(sample);
while(wait)
{
}
} |
The EEPROM is a Microchip 24AA1025, I2C setup line is:
Code: | #use I2C(Master, sda = PIN_B1, scl = PIN_B4, FAST) |
I'm aware that the EEPROM has a page read feature, but if I read a whole page at a time, do I not need a large array to store it in? My PIC doesn't have a huge amount of RAM and the compiler normally throws a wobbly when I declare an array of anything in the region of about 100 bytes... |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Mon Apr 09, 2007 3:00 pm |
|
|
Quote: | I'm aware that the EEPROM has a page read feature, but if I read a whole page at a time, do I not need a large array to store it in? | The EEPROM has a page write and a sequential read feature. For the sequential read it is not required to have a large buffer in the PIC, you can just sequentially read a byte at a time when you need it. |
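A byte-at-a-time sequential read might be structured roughly as below. This is a sketch, not tested on hardware: the three eeprom_stream_* helpers are hypothetical names, and the i2c_* functions here are host-side stand-ins that only count bus clocks so the example compiles off-target. On the PIC the CCS built-ins of the same names would be used instead.

```c
#include <stdint.h>

/* Host-side stand-ins for the CCS I2C built-ins (assumption: they only
   count SCL clocks here; START/STOP are counted as one clock each). */
unsigned long bus_clocks = 0;
void    i2c_start(void)       { bus_clocks += 1; }
void    i2c_stop(void)        { bus_clocks += 1; }
void    i2c_write(uint8_t b)  { (void)b; bus_clocks += 9; }
uint8_t i2c_read(uint8_t ack) { (void)ack; bus_clocks += 9; return 0; }

/* Hypothetical helper: address the EEPROM once and leave it in read mode. */
void eeprom_stream_begin(uint8_t address_upper, uint8_t address_lower)
{
    i2c_start();
    i2c_write(0xA0);            /* control byte, write mode */
    i2c_write(address_upper);
    i2c_write(address_lower);
    i2c_start();                /* repeated start */
    i2c_write(0xA1);            /* control byte, read mode */
}

/* Hypothetical helper: fetch the next byte; the ACK keeps the read going. */
uint8_t eeprom_stream_next(void)
{
    return i2c_read(1);
}

/* Hypothetical helper: read the final byte with NACK, then release the bus. */
uint8_t eeprom_stream_end(void)
{
    uint8_t last = i2c_read(0);
    i2c_stop();
    return last;
}
```

In the playback loop, eeprom_stream_begin() would be called once before the loop, eeprom_stream_next() once per pair of samples, and eeprom_stream_end() for the last byte, so no buffer is needed in the PIC's RAM. It is worth checking the EEPROM datasheet for any address boundary at which a sequential read wraps around.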
|
|
andyd
Joined: 08 Mar 2007 Posts: 30
|
|
Posted: Sun Apr 15, 2007 10:02 am |
|
|
Ah, ok. Well I've now implemented a sequential read and am now only outputting 8 bits to the PWM (removes a couple of extra bitshifts and AND functions), but it's still not quite fast enough. If I give it a file which contains a 1 kHz sine wave and look at the output of the filter on a scope, I see a 615-ish Hz sine.
Any other ideas on making it faster? I did think about pre-buffering part of the compressed audio (as I assume reading from the PIC's RAM will be faster than an external EEPROM), but the files are in the order of a couple of kB each, which is too big for the PIC's RAM, so I'd still have to be reading data from the EEPROM into the buffer while the decode routine was happening and so don't think it'd help?
Any other suggestions on making it faster or am I just stuck with slow audio unless I use a faster clock frequency? |
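The scope measurement gives a rough figure for how far off the loop is. The sketch below assumes playback is slowed uniformly, so the achieved sample rate scales with the measured tone frequency; the function names are hypothetical, and the 615 Hz reading is approximate.

```c
#include <stdint.h>

/* Sample rate actually achieved, inferred from how far a test tone is slowed. */
uint32_t effective_sample_rate_hz(uint32_t nominal_rate_hz,
                                  uint32_t tone_hz, uint32_t measured_hz)
{
    return (uint32_t)(((uint64_t)nominal_rate_hz * measured_hz) / tone_hz);
}

/* Time spent per sample at a given rate, in microseconds (rounded down). */
uint32_t sample_period_us(uint32_t rate_hz)
{
    return 1000000UL / rate_hz;
}
```

With a 1 kHz tone measured at about 615 Hz, this works out to roughly 4,920 samples per second, or about 203 us per sample against a budget of 125 us.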
|
|
Hans Wedemeyer
Joined: 15 Sep 2003 Posts: 226
|
Inline this |
Posted: Sun Apr 15, 2007 11:36 am |
|
|
If you have code space then inline
Write10bitPWM(sample);
This avoids pushing and popping the stack.
I have similar code that ticks at 15 us, but running at 40 MHz on a PIC18!
Not only will the PIC18 give you a faster clock, in lots of chips there is more
RAM if you need it.
You may find a pin-for-pin compatible PIC18 to replace the PIC16. |
|
|
andyd
Joined: 08 Mar 2007 Posts: 30
|
|
Posted: Sun Apr 15, 2007 11:46 am |
|
|
Could you explain what you mean by inline? |
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Sun Apr 15, 2007 2:30 pm |
|
|
andyd wrote: | Could you explain what you mean by inline? | Check your C manual for the #inline directive. When compiling, the compiler often has to choose between optimizing for speed and optimizing for code size; with #inline you tell the compiler that speed matters for the indicated function. An inline function can execute faster because its code is placed directly at every location where it is called, which saves storing variables on the stack, a goto and a return instruction. The disadvantage is the increased code space, as the function body is copied 'in line' at every call site.
The opposite of #inline is the #separate directive; this is the default compiler setting.
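As an illustration of the idea, here is a small output-scaling function marked for inlining. This is a sketch in standard C using `static inline` so that it compiles off-target; in CCS C the #inline directive would go on the line before the function definition instead. The name pwm_duty8 is hypothetical and is not the Write10bitPWM routine from the thread.

```c
#include <stdint.h>

/* Convert a signed 16-bit sample to an unsigned 8-bit PWM duty value.
   Flipping the sign bit re-centres the range around 0x8000, and the
   shift keeps the top 8 bits. */
static inline uint8_t pwm_duty8(int16_t sample)
{
    uint16_t offset = (uint16_t)((uint16_t)sample ^ 0x8000u);
    return (uint8_t)(offset >> 8);
}
```

Because the body is so short, copying it to both call sites in the playback loop costs little code space while removing the call and return overhead.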
Quote: | Any other suggestions on making it faster or am I just stuck with slow audio unless I use a faster clock frequency? | In your main loop you are setting wait=1 twice. This is dangerous, as the variable might already have been cleared by the interrupt routine.
Decoding 8000 samples per second on a PIC16 running at 8 MHz leaves you only 250 instruction cycles per sample (8 MHz / 4 = 2 million instruction cycles per second, divided by 8000). This is tight but should be possible.
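The cycle budget follows from the PIC16 executing one instruction cycle per four oscillator clocks. A minimal sketch of that arithmetic, with hypothetical function names:

```c
#include <stdint.h>

/* A PIC16 executes one instruction cycle per four oscillator clocks. */
uint32_t instruction_rate_hz(uint32_t fosc_hz)
{
    return fosc_hz / 4u;
}

/* Instruction cycles available between two consecutive output samples. */
uint32_t cycles_per_sample(uint32_t fosc_hz, uint32_t sample_rate_hz)
{
    return instruction_rate_hz(fosc_hz) / sample_rate_hz;
}
```

For an 8 MHz oscillator and 8 kHz audio this gives 250 cycles, which has to cover the EEPROM read, the ADPCM decode and the PWM update together. Note that most instructions take one cycle but branches take two, so the number of instructions that fit is somewhat lower than the cycle count.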
Giving you general advice is like shooting from the hip. It is much better to take some measurements on your code so that you _know_ where the problems are. For example, in MPLAB you can run your program in the simulator and use MPLAB's stopwatch function to measure the time used by each function. I don't think it is possible to simulate the EEPROM, but you can use an oscilloscope to measure the real hardware and then replace eeprom_read() with a stub function containing an equivalent delay_us(). |
|
|
|