DDDDaniel Guest
|
Optimizations for SD-Card Access |
Posted: Tue Jan 29, 2008 3:27 pm |
|
|
Hi!
I'm currently working on some low-level driver functions for MMC/SD-card access. My current version only achieves a transfer rate of about 260 kB/s when reading a large number of successive sectors from a microSD card, which in my opinion is quite slow. (PIC18F67J50 @ 48 MHz, 12 MHz SPI frequency.)
Here is my code:
Code: |
unsigned int mmc_read_sector (unsigned int32 addr,unsigned int *Buffer)
{
// Setup Command sequence for Block Read
unsigned int cmd[] = {0x51,0x00,0x00,0x00,0x00,0xFF};
unsigned int res;
long a;
// insert address bytes into command sequence
addr = addr << 9; //addr = addr * 512
cmd[1] = make8(addr,3);
cmd[2] = make8(addr,2);
cmd[3] = make8(addr,1);
// chip select MMC/SD-card
output_low(CS_CARD);
// send 8 clock pulses
spi_write(0xFF);
//send 6 Byte Command
for (res = 0;res<0x06;res++)
spi_write(cmd[res]);
// Wait for a valid response from the MMC/SD-card
while (spi_read(0xFF) == 0xff);
// Wait for Start Byte (FEh/Start Byte)
while (spi_read(0xFF) != 0xfe);
// Read Sector (usually 512 Bytes) from MMC/SD-card
for (a=0;a<512;a++)
*Buffer++ = spi_read(0xFF);
// Read CRC-Bytes
spi_read(0xff);
spi_read(0xff);
// disable MMC/SD-card
output_high(CS_CARD);
return(0);
}
|
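For readers following along, the command setup at the top of the function can be sketched as a small stand-alone helper. This is only an illustration in portable C that runs on a PC, not CCS code; the name `build_cmd17` is hypothetical, and the shift by 9 assumes a byte-addressed (standard-capacity) card, as in the code above.

```c
#include <stdint.h>

/* Fill a 6-byte SPI command frame for CMD17 (READ_SINGLE_BLOCK).
   `sector` is a 512-byte sector number; the card is assumed to be
   byte-addressed, so the sector number is multiplied by 512. */
void build_cmd17(uint32_t sector, uint8_t frame[6])
{
    uint32_t byte_addr = sector << 9;        /* sector * 512 */

    frame[0] = 0x40 | 17;                    /* start bits + command index = 0x51 */
    frame[1] = (uint8_t)(byte_addr >> 24);   /* same byte as make8(addr,3) */
    frame[2] = (uint8_t)(byte_addr >> 16);   /* make8(addr,2) */
    frame[3] = (uint8_t)(byte_addr >> 8);    /* make8(addr,1) */
    frame[4] = (uint8_t)byte_addr;           /* always 0 after the shift */
    frame[5] = 0xFF;                         /* dummy CRC byte, as in the post */
}
```

The low address byte is always zero after the shift, which is why the original code never writes cmd[4].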
The list-file looks like this:
Code: |
.................... unsigned int mmc_read_sector (unsigned int32 addr,unsigned int *Buffer)
.................... {
.................... // Setup Command sequence for Block Read
.................... unsigned int cmd[] = {0x51,0x00,0x00,0x00,0x00,0xFF};
*
0A4F6: MOVLW 51
0A4F8: MOVLB 9
0A4FA: MOVWF xF8
0A4FC: CLRF xF9
0A4FE: CLRF xFA
0A500: CLRF xFB
0A502: CLRF xFC
0A504: MOVLW FF
0A506: MOVWF xFD
.................... unsigned int res;
.................... long a;
....................
.................... // insert address bytes into command sequence
.................... addr = addr << 9; //addr = addr * 512
0A508: BCF FD8.0
0A50A: MOVFF 9F4,9F5
0A50E: MOVFF 9F3,9F4
0A512: MOVFF 9F2,9F3
0A516: CLRF xF2
0A518: RLCF xF3,F
0A51A: RLCF xF4,F
0A51C: RLCF xF5,F
.................... cmd[1] = make8(addr,3);
0A51E: MOVFF 9F5,9F9
.................... cmd[2] = make8(addr,2);
0A522: MOVFF 9F4,9FA
.................... cmd[3] = make8(addr,1);
0A526: MOVFF 9F3,9FB
....................
.................... // chip select MMC/SD-card
.................... output_low(CS_CARD);
0A52A: BCF F8F.1
....................
.................... // send 8 clock pulses
.................... spi_write(0xFF);
0A52C: MOVF FC9,W
0A52E: MOVLW FF
0A530: MOVWF FC9
0A532: BTFSS FC7.0
0A534: BRA A532
....................
.................... //send 6 Byte Command
.................... for (res = 0;res<0x06;res++)
0A536: CLRF xFE
0A538: MOVF xFE,W
0A53A: SUBLW 05
0A53C: BNC A55E
.................... spi_write(cmd[res]);
0A53E: CLRF 03
0A540: MOVF xFE,W
0A542: ADDLW F8
0A544: MOVWF FE9
0A546: MOVLW 09
0A548: ADDWFC 03,W
0A54A: MOVWF FEA
0A54C: MOVFF FEF,A01
0A550: MOVF FC9,W
0A552: MOVFF A01,FC9
0A556: BTFSS FC7.0
0A558: BRA A556
0A55A: INCF xFE,F
0A55C: BRA A538
....................
.................... // Wait for a valid response from the MMC/SD-card
.................... while (spi_read(0xFF) == 0xff);
0A55E: MOVF FC9,W
0A560: MOVLW FF
0A562: MOVWF FC9
0A564: BTFSS FC7.0
0A566: BRA A564
0A568: INCFSZ FC9,W
0A56A: BRA A56E
0A56C: BRA A55E
....................
.................... // Wait for Start Byte (FEh/Start Byte)
.................... while (spi_read(0xFF) != 0xfe);
0A56E: MOVF FC9,W
0A570: MOVLW FF
0A572: MOVWF FC9
0A574: BTFSS FC7.0
0A576: BRA A574
0A578: MOVF FC9,W
0A57A: SUBLW FE
0A57C: BNZ A56E
....................
.................... // Read Sector (usually 512 Bytes) from MMC/SD-card
.................... for (a=0;a<512;a++)
0A57E: MOVLB A
0A580: CLRF x00
0A582: MOVLB 9
0A584: CLRF xFF
0A586: MOVLB A
0A588: MOVF x00,W
0A58A: SUBLW 01
0A58C: BNC A5C0
.................... *Buffer++ = spi_read(0xFF);
0A58E: MOVLB 9
0A590: MOVFF 9F7,03
0A594: MOVF xF6,W
0A596: INCF xF6,F
0A598: BTFSC FD8.2
0A59A: INCF xF7,F
0A59C: MOVWF FE9
0A59E: MOVFF 03,FEA
0A5A2: MOVF FC9,W
0A5A4: MOVLW FF
0A5A6: MOVWF FC9
0A5A8: BTFSS FC7.0
0A5AA: BRA A5A8
0A5AC: MOVFF FC9,FEF
0A5B0: INCF xFF,F
0A5B2: BTFSS FD8.2
0A5B4: BRA A5BC
0A5B6: MOVLB A
0A5B8: INCF x00,F
0A5BA: MOVLB 9
0A5BC: BRA A586
0A5BE: MOVLB A
....................
.................... // Read CRC-Bytes
.................... spi_read(0xff);
0A5C0: MOVF FC9,W
0A5C2: MOVLW FF
0A5C4: MOVWF FC9
0A5C6: BTFSS FC7.0
0A5C8: BRA A5C6
.................... spi_read(0xff);
0A5CA: MOVF FC9,W
0A5CC: MOVLW FF
0A5CE: MOVWF FC9
0A5D0: BTFSS FC7.0
0A5D2: BRA A5D0
....................
.................... // disable MMC/SD-card
.................... output_high(CS_CARD);
0A5D4: BSF F8F.1
....................
.................... return(0);
0A5D6: MOVLW 00
0A5D8: MOVWF 01
.................... }
0A5DA: MOVLB 0
0A5DC: RETLW 00
|
Now if you take a look at the builtin spi functions in particular the compiler produces the following:
Code: |
.................... spi_write(0xFF);
0A52C: MOVF FC9,W
0A52E: MOVLW FF
0A530: MOVWF FC9
0A532: BTFSS FC7.0
0A534: BRA A532
|
Shouldn't we check the Buffer-Full bit (FC7.0) before we move the byte into the buffer (FC9)? In the above example I lose many clock cycles waiting for the byte to be sent through the SPI, when we could be doing something useful instead.
I tried to replace the builtin spi function by my own code...
Code: |
#byte SSP1STAT = 0xFC7
#byte SSP1BUF = 0xFC9
#inline
void myspi_write(char val)
{
while(bit_test(SSP1STAT,0));
SSP1BUF = val;
}
|
...but for some strange reason, if I use the function myspi_write(0xFF) in the read_sector(..) function, the compiler does not inline it correctly; instead it produces the following:
Code: |
.
.
.
.................... myspi_write(0xFF);
0A52C: MOVLW FF
0A52E: MOVLB A
0A530: MOVWF x02
....................
.................... //send 6 Byte Command
.................... for (res = 0;res<0x06;res++)
*
0A53A: MOVLB 9
0A53C: CLRF xFE
.
.
.
|
What's going on here?
Thanks for any help on getting the code faster!
Best regards,
Daniel
btw.: compiler version: 4.049 |
|
|
Neutone
Joined: 08 Sep 2003 Posts: 839 Location: Houston
|
|
|
DDDDaniel Guest
|
|
Posted: Wed Jan 30, 2008 3:54 am |
|
|
Hey Neutone,
thanks for the link, it confirms my impression of the inefficient CCS spi functions.
But now I've got another question. When I use Ttelmah's SPI functions...
Code: |
//For PIC18 chips. Will need to change for others
#byte SSPBUF = 0xFC9
#byte SSPCON = 0xFC6
#byte SSPSTAT = 0xFC7
#bit BF = SSPSTAT.0
/* Now the SSP handler code. Using my own, since the supplied routines test the wrong way round for interrupt driven slave operations... */
#DEFINE READ_SSP() (SSPBUF)
#DEFINE WAIT_FOR_SSP() while(!BF)
#DEFINE WRITE_SSP(x) SSPBUF=(x)
#DEFINE CLEAR_WCOL() SSPCON=SSPCON & 0x3F
|
...how do I produce a function like spi_read(0xFF), where the value 0xFF is clocked out while the value is read from the slave? (The MMC/SD-card SPI interface requires this.)
best regards,
daniel |
|
|
metalm
Joined: 22 Mar 2007 Posts: 23 Location: Buenos Aires, Argentina
|
|
Posted: Wed Jan 30, 2008 5:20 am |
|
|
Hello! Why don't you use the card in 4-wire mode? It drove me crazy getting this to work, but I get almost 2 megabytes per second with a PIC24H @ 40 MIPS. This is my code; I hope it is useful to you!
Code: |
#include <24HJ12GP202.h>
#include "main.h"
#use fast_io(A)
#use fast_io(B)
#define MODULOS 3
#use delay(clock=80000000)
// Global variables
unsigned char data[5]; // Holds the command for the CRC calculation
unsigned char resp[17]; // Holds the response to each command
unsigned char rca[2]; // Holds the RELATIVE CARD ADDRESS
unsigned char _SPI_READ(void) {
unsigned char SPISR;
#locate SPISR = 0xBFE
TRISSPI = 1;
SPISR = 0;
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #7
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #6
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #5
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #4
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #3
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #2
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #1
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
delay_cycles(2);
#asm
btsc 0x2CA, #2
bset 0xBFE, #0
#endasm
LAT_CLK = 0;
delay_cycles(2);
return(SPISR);
}
void _SPI_WRITE(unsigned char SPITX) {
#locate SPITX = 0xBFC
TRISSPI = 0;
#asm
btsc 0xBFC, #7
bset 0x2CC, #2
btss 0xBFC, #7
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
#asm
btsc 0xBFC, #6
bset 0x2CC, #2
btss 0xBFC, #6
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
#asm
btsc 0xBFC, #5
bset 0x2CC, #2
btss 0xBFC, #5
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
#asm
btsc 0xBFC, #4
bset 0x2CC, #2
btss 0xBFC, #4
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
#asm
btsc 0xBFC, #3
bset 0x2CC, #2
btss 0xBFC, #3
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
#asm
btsc 0xBFC, #2
bset 0x2CC, #2
btss 0xBFC, #2
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
#asm
btsc 0xBFC, #1
bset 0x2CC, #2
btss 0xBFC, #1
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
#asm
btsc 0xBFC, #0
bset 0x2CC, #2
btss 0xBFC, #0
bclr 0x2CC, #2
#endasm
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
TRISSPI = 1;
}
unsigned char _crc7(void) {
unsigned int16 i, a;
unsigned char crc,aux;
crc = 0;
for(a=0; a<5; a++) {
aux = data[a];
for(i=0; i<8; i++) {
crc <<= 1;
if((aux & 0x80)^(crc & 0x80))
crc ^=0x09;
aux <<= 1;
}
}
crc=(crc<<1)|1; // append stop bit
return(crc);
}
void _sd_clocks(unsigned int8 clocks) { // send at most this many clocks; exits if a start bit arrives
unsigned int16 i;
TRISSPI = 1;
for(i = 0; i < clocks; i++) {
LAT_CLK = 1;
delay_cycles(5);
if(~IN_SPI) // exit if a start bit is received
break;
LAT_CLK = 0;
delay_cycles(5);
}
}
void _sd_cmd(void) {
_SPI_WRITE(data[0]); // send the command to the card
_SPI_WRITE(data[1]);
_SPI_WRITE(data[2]);
_SPI_WRITE(data[3]);
_SPI_WRITE(data[4]);
_SPI_WRITE(_crc7()); // compute CRC7 and send it
}
void _acmd(unsigned char acmd) {
unsigned int16 i;
data[0]=0x77, data[1]=rca[0], data[2]=rca[1], data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send CMD55
if(acmd == 41)
_sd_clocks(5); // Nid Cycles
else
_sd_clocks(64); // Ncr Cycles
for(i = 0; i < 6; i++) { // receive R1
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
switch(acmd) {
case 6:
data[0]=0x46, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x02;
_sd_cmd(); // send ACMD6
_sd_clocks(64); // Ncr Cycles
for(i = 0; i < 6; i++) { // receive R1
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
case 41:
data[0]=0x69, data[1]=0x00, data[2]=0xFF, data[3]=0x80, data[4]=0x00;
_sd_cmd(); // send ACMD41
_sd_clocks(5); // Nid Cycles
for(i = 0; i < 6; i++) { // receive R3
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
case 42:
data[0]=0x6A, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send ACMD42
_sd_clocks(64); // Ncr Cycles
for(i = 0; i < 6; i++) { // receive R1
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
}
}
void _cmd(unsigned char cmd) {
unsigned int16 i;
switch(cmd) {
case 0:
data[0]=0x40, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send CMD0
_sd_clocks(8); // 8 additional clocks after the stop bit
break;
case 2:
data[0]=0x42, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send CMD2
_sd_clocks(5); // Nid Cycles
for(i = 0; i < 17; i++) { // receive R2
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
case 3:
data[0]=0x43, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send CMD3
_sd_clocks(64); // Ncr cycles (maximum)
for(i = 0; i < 6; i++) { // receive R6
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
case 7:
data[0]=0x47, data[1]=rca[0], data[2]=rca[1], data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send CMD7
_sd_clocks(64); // Ncr cycles (maximum)
for(i = 0; i < 6; i++) { // receive R1b
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
case 12:
data[0]=0x4C, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send CMD12
_sd_clocks(64); // Ncr cycles (maximum)
for(i = 0; i < 6; i++) { // receive R1b
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
case 16:
data[0]=0x50, data[1]=0x00, data[2]=0x00, data[3]=0x02, data[4]=0x00;
_sd_cmd(); // send CMD16
_sd_clocks(64); // Ncr cycles (maximum)
for(i = 0; i < 6; i++) { // receive R1
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
case 18:
data[0]=0x52, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
_sd_cmd(); // send CMD18
_sd_clocks(64); // Ncr cycles (maximum)
for(i = 0; i < 6; i++) { // receive R1
resp[i] = _SPI_READ();
}
_sd_clocks(8); // 8 additional clocks after the response
break;
}
}
void _sd_init(void) {
rca[0] = 0; // Initialise the relative address to zero
rca[1] = 0;
_sd_clocks(74); // send 74 clock cycles to initialise
_cmd(0); // Go to idle state
bit_clear(resp[1], 7);
while(~bit_test(resp[1], 7)) { // polling busy flag
_acmd(41);
delay_ms(50);
}
_cmd(2); // Receive CID
_cmd(3); // Receive RCA
rca[0] = resp[1]; // Store RCA (MSB)
rca[1] = resp[2]; // Store RCA (LSB)
_cmd(7); // Card selection
_cmd(16); // Set block length (512 bytes)
_acmd(42); // Disable PULL-UP in D3
_acmd(6); // Wide bus (4-bit)
}
void _sd_multiread(void) {
unsigned int i, frames;
unsigned int8 addr=1; // Address of the module to send to
#locate addr = 0xBFA
_cmd(18);
while(IN_D0) {// Wait for the start bit on the DATA line
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
}
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
for(frames = 0; frames < 66; frames++) {
if(addr > MODULOS) {
addr = 1; // Reset address
}
delay_cycles(1);
#asm
MOV.B #254, WREG // Send header
CLR.B 0x2CD
IOR.b 0x2CD
#endasm
LAT_CLK_OUT = 0;
delay_us(1);
LAT_CLK_OUT = 1;
delay_us(1);
#asm
MOV.B 0xBFA, WREG // Send address
CLR.B 0x2CD
IOR.b 0x2CD
#endasm
addr++; // Increment address
LAT_CLK_OUT = 0;
delay_us(1);
LAT_CLK_OUT = 1;
for(i = 0; i < 512; i++) {
LAT_CLK_OUT = 1;
LAT_CLK = 1;
#asm
MOV 0x2C2, W0
SWAP.B WREG
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
#asm
IOR 0x2C2, WREG
CLR.B 0x2CD
IOR.b 0x2CD
#endasm
LAT_CLK = 0;
LAT_CLK_OUT = 0;
delay_cycles(4);
}
LAT_CLK_OUT = 1;
for(i = 0; i < 17; i++) { // Skip checksum bits and stop bit
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
}
while(IN_D0) {// Wait for the start bit on the DATA line
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
}
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
for(i = 0; i < 512; i++) {
LAT_CLK_OUT = 1;
LAT_CLK = 1;
#asm
MOV 0x2C2, W0
SWAP.B WREG
#endasm
LAT_CLK = 0;
delay_cycles(2);
LAT_CLK = 1;
#asm
IOR 0x2C2, WREG
CLR.B 0x2CD
IOR.b 0x2CD
#endasm
LAT_CLK = 0;
LAT_CLK_OUT = 0;
delay_cycles(4);
}
LAT_CLK_OUT = 1;
for(i = 0; i < 17; i++) { // Skip checksum bits and stop bit
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
}
while(IN_D0) {// Wait for the start bit on the DATA line
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
}
LAT_CLK = 1;
delay_cycles(2);
LAT_CLK = 0;
delay_cycles(2);
delay_ms(12);// TEST
}
_cmd(12);
}
void main(void) {
TRISA = 0x001F;
TRISB = 0x0000;
ODCA = 0;
ODCB = 0;
LATA = 0; // Ground the unused pins
LATB = 0; // Ground the unused pins
CN2PUE = 1; // Enable pull-up on data line
CN3PUE = 1; // Enable pull-up on data line
CN29PUE = 1; // Enable pull-up on data line
CN30PUE = 1; // Enable pull-up on data line
// Internal FRC, 80MHz
PLLDIV = 0x0097; // Multiply by 153
CLKDIV = 0x0005; // PLLPRE divides by 7, PLLPOST multiplies by 2
LAT_CLK_OUT = 1;
_sd_init();
LAT_LED = 1;
for(;;) {
_sd_multiread();
delay_ms(500);
}
}
|
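The `_crc7` routine in the listing above can be checked on a PC before it goes anywhere near the card. The sketch below is a portable C restatement of the same bitwise algorithm; the function names `sd_crc7` and `sd_crc7_trailer` are hypothetical and not part of the posted code. A handy sanity check is that the five CMD0 bytes 40 00 00 00 00 should give the familiar trailer byte 0x95.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-7 (polynomial x^7 + x^3 + 1) over `len` bytes, MSB first. */
uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((d ^ crc) & 0x80)   /* incoming bit differs from CRC MSB */
                crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;              /* keep the 7 CRC bits */
}

/* Last byte of a command frame: 7 CRC bits followed by the stop bit. */
uint8_t sd_crc7_trailer(const uint8_t *data, size_t len)
{
    return (uint8_t)((sd_crc7(data, len) << 1) | 1);
}
```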
Last edited by metalm on Wed Feb 13, 2008 7:49 am; edited 1 time in total |
|
|
metalm
Joined: 22 Mar 2007 Posts: 23 Location: Buenos Aires, Argentina
|
|
Posted: Wed Jan 30, 2008 5:21 am |
|
|
well, this appeared without tabs |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Wed Jan 30, 2008 5:52 am |
|
|
You need to have "BBCode" enabled, if you want the code block to
appear. You can edit your post, and remove the check from the
tickbox for "Disable BBCode in this post".
You can also edit your profile and always enable BBcode. That's the
best way. |
|
|
DDDDaniel Guest
|
|
Posted: Wed Jan 30, 2008 10:19 am |
|
|
@metalm:
thanks for your input, but I can't change the hardware anymore and have to live with the standard SPI interface now... But I'll try to use the 4-wire interface in my next project |
|
|
Ttelmah Guest
|
|
Posted: Wed Jan 30, 2008 10:53 am |
|
|
On your question about the transfer using my 'minimised' code: if the SPI is not busy, the transfer starts as soon as you write a byte into the output buffer. Wait for the busy flag to drop, and read the input buffer. This will contain the byte read back.
Best Wishes |
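Put together, the three steps described above form a single exchange routine. The sketch below is a PC-side simulation in plain C, not CCS code: `ssp_write`, `ssp_done` and `ssp_read` are hypothetical stand-ins for WRITE_SSP, the BF test inside WAIT_FOR_SSP, and READ_SSP, and the simulated card simply answers with a fixed byte.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PIC18 SSP registers so this runs on a PC. */
static uint8_t sim_sspbuf;               /* models SSPBUF */
static uint8_t sim_bf;                   /* models the BF flag */
static uint8_t sim_card_reply = 0xFE;    /* byte the simulated card shifts back */

/* Writing the buffer starts a transfer; here it completes immediately. */
static void ssp_write(uint8_t out) { (void)out; sim_sspbuf = sim_card_reply; sim_bf = 1; }
static int ssp_done(void) { return sim_bf; }
/* Reading the buffer returns the received byte and clears BF. */
static uint8_t ssp_read(void) { sim_bf = 0; return sim_sspbuf; }

/* Clock `out` to the slave and return the byte clocked in at the same time. */
uint8_t spi_exchange(uint8_t out)
{
    ssp_write(out);            /* WRITE_SSP(out): starts the 8 clocks */
    while (!ssp_done()) { }    /* WAIT_FOR_SSP(): poll until the byte is in */
    return ssp_read();         /* READ_SSP(): fetch what the slave sent */
}
```

On the real part the three helper functions would collapse into the macros, so `spi_exchange(0xFF)` plays the role of the built-in spi_read(0xFF).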
|
|
DDDDaniel Guest
|
|
Posted: Wed Jan 30, 2008 2:50 pm |
|
|
Yeah, it works!
...but it does not really improve the performance... |
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Wed Jan 30, 2008 8:49 pm |
|
|
The optimization of SPI_write() doesn't help a lot because it is executed only a few times.
The code for reading the data is executed 512 times and this is where optimization would help.
Code: |
for (a=0;a<512;a++)
*Buffer++ = spi_read(0xFF);
|
Roughly counting the instruction times in the listing file it looks like the loop takes about 35 instruction times, of which 8 are for transmitting the data over SPI.
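To put those counts into numbers, here is a small back-of-the-envelope helper in plain C. It is only a sketch: the function name is hypothetical, it assumes the usual PIC18 instruction rate of Fosc/4, and it ignores command overhead and the wait for the card's start token, so real figures will be lower.

```c
#include <stdint.h>

/* Upper bound on bytes per second for a read loop that spends
   `cycles_per_byte` instruction cycles on each byte.
   A PIC18 executes one instruction cycle per 4 oscillator clocks. */
uint32_t loop_bytes_per_second(uint32_t fosc_hz, uint32_t cycles_per_byte)
{
    return (fosc_hz / 4u) / cycles_per_byte;
}
```

With a 48 MHz clock, 35 cycles per byte gives about 343 kB/s as a ceiling for the loop alone, while 8 cycles per byte (the SPI transfer time itself) would give 1.5 MB/s.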
A lot of code in this loop is used for calculating the address in memory that Buffer points to. The CCS compiler is not very smart here, as it recalculates the address in every iteration. By setting up the index register yourself and using the POSTINC register for storing the data you should get almost double the speed. A nice feature of the POSTINC register is that you initialize the start address once and the PIC increments the address for you on every access (there are also POSTDEC, PREINC and other register variations).
Code: |
// Register defines for PIC18
// Place this at the global level, or in a separate include file with register definitions.
unsigned int16 FSR0;
#locate FSR0=0x0FE9
unsigned char POSTINC0;
#locate POSTINC0=0x0FEE
|
Replace the loop above with the code below:
Code: |
FSR0 = Buffer; // Set start address in index register
for (a=0;a<512;a++)
POSTINC0 = spi_read(0xFF); // Write data to Buffer and increase address
|
|
|
|
DDDDaniel Guest
|
|
Posted: Thu Jan 31, 2008 12:51 am |
|
|
Thanks, ckielstra,
your code really improves performance!
The new values:
Read SD: 312kB/s
Write SD: 216kB/s
And that's the current code:
Code: |
.................... unsigned int mmc_read_sector (unsigned int32 addr,unsigned int *Buffer)
.................... {
.................... // Setup Command sequence for Block Read
.................... unsigned int cmd[] = {0x51,0x00,0x00,0x00,0x00,0xFF};
*
0A31C: MOVLW 51
0A31E: MOVLB 9
0A320: MOVWF xF8
0A322: CLRF xF9
0A324: CLRF xFA
0A326: CLRF xFB
0A328: CLRF xFC
0A32A: MOVLW FF
0A32C: MOVWF xFD
.................... unsigned int res,b;
.................... long a;
....................
.................... // insert address bytes into command sequence
.................... addr = addr << 9; //addr = addr * 512
0A32E: BCF FD8.0
0A330: MOVFF 9F4,9F5
0A334: MOVFF 9F3,9F4
0A338: MOVFF 9F2,9F3
0A33C: CLRF xF2
0A33E: RLCF xF3,F
0A340: RLCF xF4,F
0A342: RLCF xF5,F
.................... cmd[1] = make8(addr,3);
0A344: MOVFF 9F5,9F9
.................... cmd[2] = make8(addr,2);
0A348: MOVFF 9F4,9FA
.................... cmd[3] = make8(addr,1);
0A34C: MOVFF 9F3,9FB
....................
.................... // chip select MMC/SD-card
.................... output_low(CS_CARD);
0A350: BCF F8F.1
....................
.................... // send 8 clock pulses
.................... //spi_write(0xFF);
.................... WRITE_SSP(0xFF);
0A352: MOVLW FF
0A354: MOVWF FC9
.................... WAIT_FOR_SSP();
0A356: BTFSS FC7.0
0A358: BRA A356
....................
.................... for (res = 0;res<0x06;res++)
0A35A: CLRF xFE
0A35C: MOVF xFE,W
0A35E: SUBLW 05
0A360: BNC A37C
.................... {
.................... WRITE_SSP(cmd[res]);
0A362: CLRF 03
0A364: MOVF xFE,W
0A366: ADDLW F8
0A368: MOVWF FE9
0A36A: MOVLW 09
0A36C: ADDWFC 03,W
0A36E: MOVWF FEA
0A370: MOVFF FEF,FC9
.................... WAIT_FOR_SSP();
0A374: BTFSS FC7.0
0A376: BRA A374
.................... }
0A378: INCF xFE,F
0A37A: BRA A35C
....................
....................
.................... // Wait for a valid response from the MMC/SD-card
.................... //while (spi_read(0xFF) == 0xff);
.................... do
.................... {
.................... WRITE_SSP(0xFF);
0A37C: MOVLW FF
0A37E: MOVWF FC9
.................... WAIT_FOR_SSP();
0A380: BTFSS FC7.0
0A382: BRA A380
.................... }
.................... while(READ_SSP() == 0xFF);
0A384: INCFSZ FC9,W
0A386: BRA A38A
0A388: BRA A37C
....................
.................... // Wait for Start Byte (FEh/Start Byte)
.................... //while (spi_read(0xFF) != 0xfe);
.................... do
.................... {
.................... WRITE_SSP(0xFF);
0A38A: MOVLW FF
0A38C: MOVWF FC9
.................... WAIT_FOR_SSP();
0A38E: BTFSS FC7.0
0A390: BRA A38E
.................... }
.................... while(READ_SSP() != 0xFE);
0A392: MOVF FC9,W
0A394: SUBLW FE
0A396: BNZ A38A
....................
....................
.................... // Read Sector (usually 512 Bytes) from MMC/SD-card
....................
.................... FSR0 = Buffer;
0A398: MOVFF 9F7,FEA
0A39C: MOVFF 9F6,FE9
.................... for (a=0;a<512;a++)
0A3A0: MOVLB A
0A3A2: CLRF x01
0A3A4: CLRF x00
0A3A6: MOVF x01,W
0A3A8: SUBLW 01
0A3AA: BNC A3C0
.................... {
.................... WRITE_SSP(0xFF);
0A3AC: MOVLW FF
0A3AE: MOVWF FC9
.................... WAIT_FOR_SSP();
0A3B0: BTFSS FC7.0
0A3B2: BRA A3B0
.................... POSTINC0 = READ_SSP();
0A3B4: MOVFF FC9,FEE
.................... //*Buffer++ = READ_SSP();
.................... }
0A3B8: INCF x00,F
0A3BA: BTFSC FD8.2
0A3BC: INCF x01,F
0A3BE: BRA A3A6
....................
.................... // Read CRC-Bytes
.................... // spi_read(0xff);
.................... // spi_read(0xff);
....................
.................... WRITE_SSP(0xFF);
0A3C0: MOVLW FF
0A3C2: MOVWF FC9
.................... WAIT_FOR_SSP();
0A3C4: BTFSS FC7.0
0A3C6: BRA A3C4
.................... WRITE_SSP(0xFF);
0A3C8: MOVLW FF
0A3CA: MOVWF FC9
.................... WAIT_FOR_SSP(); // let's try to skip the wait cycle here
0A3CC: BTFSS FC7.0
0A3CE: BRA A3CC
....................
.................... // disable MMC/SD-card
.................... output_high(CS_CARD);
0A3D0: BSF F8F.1
....................
.................... return(0);
0A3D2: MOVLW 00
0A3D4: MOVWF 01
.................... }
0A3D6: MOVLB 0
0A3D8: RETLW 00
|
best regards,
daniel |
|
|
Ttelmah Guest
|
|
Posted: Thu Jan 31, 2008 6:12 am |
|
|
As one possible further change, consider modifying the main 'read' loop, with:
Code: |
FSR0 = Buffer;
WRITE_SSP(0xFF);
for (a=0;a<511;a++)
{
WAIT_FOR_SSP();
POSTINC0 = READ_SSP();
WRITE_SSP(0xFF);
}
WAIT_FOR_SSP();
POSTINC0 = READ_SSP();
|
Looks 'daft' (adds three more instructions), but the key is that for 511 times round the loop, the SSP transfer will actually occur _while_ the loop count is being incremented and tested, so far less time will be needed in the 'wait'. It ought to boost performance by another few percent.
This is the whole 'key point' about using my 'split' functions, in that it potentially allows you to start sending the byte, and do other things before testing whether the transfer has finished.
Best Wishes |
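A rough way to see why the reordering helps is to count instruction cycles per byte. The model below is an idealised sketch, not a measurement: `io` (cycles spent on the READ_SSP and WRITE_SSP instructions), `transfer` (cycles the hardware needs to shift 8 bits) and `bookkeeping` (loop increment, compare and branch) are hypothetical parameters that you would read off your own listing file, and the granularity of the polling loop is ignored.

```c
#include <stdint.h>

/* Write, wait for the whole transfer, read, then do the loop bookkeeping. */
uint32_t cycles_sequential(uint32_t io, uint32_t transfer, uint32_t bookkeeping)
{
    return io + transfer + bookkeeping;
}

/* Write first, do the bookkeeping while the byte shifts out, then wait
   only for whatever is left of the transfer before reading. */
uint32_t cycles_overlapped(uint32_t io, uint32_t transfer, uint32_t bookkeeping)
{
    uint32_t hidden = (bookkeeping < transfer) ? bookkeeping : transfer;
    return io + transfer + bookkeeping - hidden;
}
```

In this model the gain per byte is the smaller of the two overlapping times, so it can never exceed the 8 cycles the transfer itself takes.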
|
|
Guest
|
|
Posted: Thu Jan 31, 2008 8:13 am |
|
|
@Ttelmah:
Thanks, this might save me another few clocks!
Something else I recently tried was the following, to avoid having to deal with a 16 bit variable:
Code: |
unsigned int a;
FSR0 = Buffer;
WRITE_SSP(0xFF);
for (a=0;a<255;a++)
{
WAIT_FOR_SSP();
POSTINC0 = READ_SSP();
WRITE_SSP(0xFF);
WAIT_FOR_SSP();
POSTINC0 = READ_SSP();
WRITE_SSP(0xFF);
}
|
But for some strange reason it does not work...
Do you know what I'm doing wrong, and whether it might increase the transfer rate further?
thanks,
daniel |
|
|
Ttelmah Guest
|
|
Posted: Thu Jan 31, 2008 9:54 am |
|
|
I'd think this would be one transfer 'short'. You 'prestart' one 8-bit transfer, then perform 255*2 reads in the loop, which is 510 bytes. Even if you add one 'post transfer' as in my code, you get 511 bytes and are still one byte short.
Best Wishes |
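One way to keep the 8-bit counter and still collect all 512 bytes is to add two reads after the loop, restarting the SSP only after the first of them. The sketch below simulates this on a PC in plain C; `ssp_start`, `ssp_wait` and `ssp_read` are hypothetical stand-ins for WRITE_SSP, WAIT_FOR_SSP and READ_SSP, and the counters exist only so the totals can be checked.

```c
#include <stdint.h>

static uint16_t sim_writes;   /* transfers started */
static uint16_t sim_reads;    /* bytes fetched */
static uint8_t  sim_next;     /* next byte the simulated card returns */

static void    ssp_start(uint8_t out) { (void)out; sim_writes++; }
static void    ssp_wait(void)         { /* completes instantly in simulation */ }
static uint8_t ssp_read(void)         { sim_reads++; return sim_next++; }

/* Read one 512-byte sector using only an 8-bit loop counter. */
void read_sector_u8_counter(uint8_t *buf)
{
    uint8_t a;

    ssp_start(0xFF);                          /* pre-start the first transfer */
    for (a = 0; a < 255; a++) {               /* 255 passes x 2 = 510 bytes */
        ssp_wait(); *buf++ = ssp_read(); ssp_start(0xFF);
        ssp_wait(); *buf++ = ssp_read(); ssp_start(0xFF);
    }
    ssp_wait(); *buf++ = ssp_read(); ssp_start(0xFF);   /* byte 511 */
    ssp_wait(); *buf   = ssp_read();                    /* byte 512, no restart */
}
```

Exactly 512 transfers are started and 512 bytes are read, so no extra clocks are sent before the CRC bytes.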
|
|
Neutone
Joined: 08 Sep 2003 Posts: 839 Location: Houston
|
|
Posted: Thu Jan 31, 2008 12:43 pm |
|
|
I think the fastest you can get will be something like this. Inline code could be slightly faster than the loop. The time to transfer one byte is 8 instruction cycles when SPI_CLK_DIV_4 is used. Using this code should give you just over one instruction time between byte reads. The first delay will have to be tweaked to ensure 8 instruction cycles in processing the loop. I would guess this to be about a 30-40% speed boost compared to checking the GO_DONE bit.
Code: |
unsigned int a;
FSR0 = Buffer;
for (a=128;a>0;a--)
{
Delay_Cycles(2);
POSTINC0 = READ_SSP();
WRITE_SSP(0xFF);
Delay_Cycles(7);
POSTINC0 = READ_SSP();
WRITE_SSP(0xFF);
Delay_Cycles(7);
POSTINC0 = READ_SSP();
WRITE_SSP(0xFF);
Delay_Cycles(7);
POSTINC0 = READ_SSP();
WRITE_SSP(0xFF);
}
|
|
|
|
|
|