Optimizations for SD-Card Access
DDDDaniel
Guest







Optimizations for SD-Card Access
Posted: Tue Jan 29, 2008 3:27 pm

Hi!

I'm currently working on some low-level driver functions for MMC/SD-card access. My current version only achieves a transfer rate of about 260 kB/s when reading a large number of successive sectors from a microSD card, which in my opinion is quite slow (PIC18F67J50 @ 48 MHz, 12 MHz SPI clock).
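For a rough sense of where the time goes, the figures above can be turned into an instruction budget per byte. The sketch below is plain C for a desktop compiler, not CCS code. The helper names are made up for illustration, it assumes the usual PIC18 ratio of one instruction cycle per four oscillator clocks, and it treats 1 kB as 1000 bytes.

```c
#include <stdint.h>

/* Instruction cycles per second on a PIC18: one cycle per 4 oscillator clocks. */
static uint32_t instr_rate_hz(uint32_t fosc_hz)
{
    return fosc_hz / 4u;
}

/* Upper bound on SPI throughput: 8 SPI clocks are needed for every byte. */
static uint32_t spi_ceiling_bytes_per_s(uint32_t spi_clk_hz)
{
    return spi_clk_hz / 8u;
}

/* Average instruction cycles spent per byte at a measured transfer rate. */
static uint32_t cycles_per_byte(uint32_t fosc_hz, uint32_t bytes_per_s)
{
    return instr_rate_hz(fosc_hz) / bytes_per_s;
}
```

With the numbers from this post, `instr_rate_hz(48000000)` is 12,000,000, `spi_ceiling_bytes_per_s(12000000)` is 1,500,000 and `cycles_per_byte(48000000, 260000)` is 46 (integer division). So on average roughly 46 instruction cycles pass per byte, of which only 8 are the SPI transfer itself.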

Here is my code:
Code:

unsigned int mmc_read_sector (unsigned int32 addr,unsigned int *Buffer)
{   
   // Setup Command sequence for Block Read
   unsigned int cmd[] = {0x51,0x00,0x00,0x00,0x00,0xFF};
   unsigned int res;   
   long a;

   // insert address bytes into command sequence 
   addr = addr << 9; //addr = addr * 512
   cmd[1] = make8(addr,3);
   cmd[2] = make8(addr,2);
   cmd[3] = make8(addr,1);

   // chip select MMC/SD-card
   output_low(CS_CARD);

   // send 8 clock pulses
   spi_write(0xFF);

   //send 6 Byte Command
   for (res = 0;res<0x06;res++)
         spi_write(cmd[res]);

   // Wait for a valid response from the MMC/SD-card
   while (spi_read(0xFF) == 0xff);   

   // Wait for Start Byte (FEh/Start Byte)
   while (spi_read(0xFF) != 0xfe);

   // Read Sector (usually 512 Bytes) from MMC/SD-card
   for (a=0;a<512;a++)
      *Buffer++ = spi_read(0xFF);

   // Read CRC-Bytes
   spi_read(0xff);
   spi_read(0xff);
   
   // disable MMC/SD-card
   output_high(CS_CARD);

   return(0);
}


The list-file looks like this:
Code:

.................... unsigned int mmc_read_sector (unsigned int32 addr,unsigned int *Buffer)
.................... {   
....................    // Setup Command sequence for Block Read
....................    unsigned int cmd[] = {0x51,0x00,0x00,0x00,0x00,0xFF}; 
*
0A4F6:  MOVLW  51
0A4F8:  MOVLB  9
0A4FA:  MOVWF  xF8
0A4FC:  CLRF   xF9
0A4FE:  CLRF   xFA
0A500:  CLRF   xFB
0A502:  CLRF   xFC
0A504:  MOVLW  FF
0A506:  MOVWF  xFD
....................    unsigned int res;   
....................    long a;
.................... 
....................    // insert address bytes into command sequence   
....................    addr = addr << 9; //addr = addr * 512
0A508:  BCF    FD8.0
0A50A:  MOVFF  9F4,9F5
0A50E:  MOVFF  9F3,9F4
0A512:  MOVFF  9F2,9F3
0A516:  CLRF   xF2
0A518:  RLCF   xF3,F
0A51A:  RLCF   xF4,F
0A51C:  RLCF   xF5,F
....................    cmd[1] = make8(addr,3);
0A51E:  MOVFF  9F5,9F9
....................    cmd[2] = make8(addr,2);
0A522:  MOVFF  9F4,9FA
....................    cmd[3] = make8(addr,1);
0A526:  MOVFF  9F3,9FB
.................... 
....................    // chip select MMC/SD-card
....................    output_low(CS_CARD);
0A52A:  BCF    F8F.1
.................... 
....................    // send 8 clock pulses
....................    spi_write(0xFF);
0A52C:  MOVF   FC9,W
0A52E:  MOVLW  FF
0A530:  MOVWF  FC9
0A532:  BTFSS  FC7.0
0A534:  BRA    A532
.................... 
....................    //send 6 Byte Command
....................    for (res = 0;res<0x06;res++)
0A536:  CLRF   xFE
0A538:  MOVF   xFE,W
0A53A:  SUBLW  05
0A53C:  BNC   A55E
....................          spi_write(cmd[res]);
0A53E:  CLRF   03
0A540:  MOVF   xFE,W
0A542:  ADDLW  F8
0A544:  MOVWF  FE9
0A546:  MOVLW  09
0A548:  ADDWFC 03,W
0A54A:  MOVWF  FEA
0A54C:  MOVFF  FEF,A01
0A550:  MOVF   FC9,W
0A552:  MOVFF  A01,FC9
0A556:  BTFSS  FC7.0
0A558:  BRA    A556
0A55A:  INCF   xFE,F
0A55C:  BRA    A538
.................... 
....................    // Wait for a valid response from the MMC/SD-card
....................    while (spi_read(0xFF) == 0xff);   
0A55E:  MOVF   FC9,W
0A560:  MOVLW  FF
0A562:  MOVWF  FC9
0A564:  BTFSS  FC7.0
0A566:  BRA    A564
0A568:  INCFSZ FC9,W
0A56A:  BRA    A56E
0A56C:  BRA    A55E
.................... 
....................    // Wait for Start Byte (FEh/Start Byte)
....................    while (spi_read(0xFF) != 0xfe);
0A56E:  MOVF   FC9,W
0A570:  MOVLW  FF
0A572:  MOVWF  FC9
0A574:  BTFSS  FC7.0
0A576:  BRA    A574
0A578:  MOVF   FC9,W
0A57A:  SUBLW  FE
0A57C:  BNZ   A56E
.................... 
....................    // Read Sector (usually 512 Bytes) from MMC/SD-card
....................    for (a=0;a<512;a++)
0A57E:  MOVLB  A
0A580:  CLRF   x00
0A582:  MOVLB  9
0A584:  CLRF   xFF
0A586:  MOVLB  A
0A588:  MOVF   x00,W
0A58A:  SUBLW  01
0A58C:  BNC   A5C0
....................       *Buffer++ = spi_read(0xFF);
0A58E:  MOVLB  9
0A590:  MOVFF  9F7,03
0A594:  MOVF   xF6,W
0A596:  INCF   xF6,F
0A598:  BTFSC  FD8.2
0A59A:  INCF   xF7,F
0A59C:  MOVWF  FE9
0A59E:  MOVFF  03,FEA
0A5A2:  MOVF   FC9,W
0A5A4:  MOVLW  FF
0A5A6:  MOVWF  FC9
0A5A8:  BTFSS  FC7.0
0A5AA:  BRA    A5A8
0A5AC:  MOVFF  FC9,FEF
0A5B0:  INCF   xFF,F
0A5B2:  BTFSS  FD8.2
0A5B4:  BRA    A5BC
0A5B6:  MOVLB  A
0A5B8:  INCF   x00,F
0A5BA:  MOVLB  9
0A5BC:  BRA    A586
0A5BE:  MOVLB  A
.................... 
....................    // Read CRC-Bytes
....................    spi_read(0xff);
0A5C0:  MOVF   FC9,W
0A5C2:  MOVLW  FF
0A5C4:  MOVWF  FC9
0A5C6:  BTFSS  FC7.0
0A5C8:  BRA    A5C6
....................    spi_read(0xff);
0A5CA:  MOVF   FC9,W
0A5CC:  MOVLW  FF
0A5CE:  MOVWF  FC9
0A5D0:  BTFSS  FC7.0
0A5D2:  BRA    A5D0
....................     
....................    // disable MMC/SD-card
....................    output_high(CS_CARD);
0A5D4:  BSF    F8F.1
.................... 
....................    return(0);
0A5D6:  MOVLW  00
0A5D8:  MOVWF  01
.................... }
0A5DA:  MOVLB  0
0A5DC:  RETLW  00


Now, looking at the built-in SPI functions in particular, the compiler produces the following:
Code:

....................    spi_write(0xFF);
0A52C:  MOVF   FC9,W
0A52E:  MOVLW  FF
0A530:  MOVWF  FC9
0A532:  BTFSS  FC7.0
0A534:  BRA    A532


Wouldn't it be better to check the Buffer-Full bit (FC7.0) before moving the byte into the buffer (FC9)? In the above example I lose many clock cycles waiting for the byte to be sent over SPI, when we could be doing something useful instead.

I tried to replace the built-in SPI function with my own code...
Code:

#byte SSP1STAT  = 0xFC7
#byte SSP1BUF    = 0xFC9

#inline
void myspi_write(char val)
{
   while(bit_test(SSP1STAT,0));
   SSP1BUF = val;
}   

...but for some strange reason, when I call myspi_write(0xFF) in the read_sector(..) function, the compiler does not inline it correctly and instead produces the following:

Code:

.
.
.
....................    myspi_write(0xFF);
0A52C:  MOVLW  FF
0A52E:  MOVLB  A
0A530:  MOVWF  x02
.................... 
....................    //send 6 Byte Command
....................    for (res = 0;res<0x06;res++)
*
0A53A:  MOVLB  9
0A53C:  CLRF   xFE
.
.
.


What's going on here?
Thanks for any help making the code faster!

Best regards,
Daniel

btw.: compiler version: 4.049
Neutone



Joined: 08 Sep 2003
Posts: 839
Location: Houston

Posted: Tue Jan 29, 2008 5:19 pm

This thread should prove helpful.
Ttelmah's post shows how to use the SPI hardware more efficiently than the CCS functions. The CCS functions are written to be effective, not optimal.
http://www.ccsinfo.com/forum/viewtopic.php?t=32415&highlight=define+writessp++sspbuf
DDDDaniel
Guest







Posted: Wed Jan 30, 2008 3:54 am

Hey Neutone,

Thanks for the link; it confirms my impression that the CCS SPI functions are inefficient.

But now I've got another question. When I use Ttelmah's SPI functions...
Code:

//For PIC18 chips. Will need to change for others
#byte   SSPBUF = 0xFC9
#byte   SSPCON = 0xFC6
#byte   SSPSTAT = 0xFC7
#bit BF = SSPSTAT.0

/* Now the SSP handler code. Using my own, since the supplied routines test the wrong way round for interrupt driven slave operations... */
#DEFINE READ_SSP()   (SSPBUF)
#DEFINE   WAIT_FOR_SSP()   while(!BF)
#DEFINE   WRITE_SSP(x)   SSPBUF=(x)
#DEFINE   CLEAR_WCOL()   SSPCON=SSPCON & 0x3F


...how do I build a function like spi_read(0xFF), where the value 0xFF is clocked out while a byte is read from the slave? (The MMC/SD-card SPI interface requires this.)

best regards,
daniel
metalm



Joined: 22 Mar 2007
Posts: 23
Location: Buenos Aires, Argentina

Posted: Wed Jan 30, 2008 5:20 am

Hello! Why don't you use the card in 4-wire mode? It nearly drove me crazy getting this to work, but I get almost 2 megabytes per second with a PIC24H @ 40 MIPS. This is my code; I hope it is useful to you!

Code:

#include <24HJ12GP202.h>
#include "main.h"

#use fast_io(A)
#use fast_io(B)

#define MODULOS 3

#use delay(clock=80000000)

// Global variables
unsigned char data[5]; // Holds the command used to compute the CRC
unsigned char resp[17]; // Holds the response to each command
unsigned char rca[2]; // Holds the RELATIVE CARD ADDRESS

unsigned char _SPI_READ(void) {
   unsigned char SPISR;
   #locate SPISR = 0xBFE
   TRISSPI = 1;
   SPISR = 0;
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #7
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #6
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #5
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #4
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #3
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #2
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #1
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   LAT_CLK = 1;
   delay_cycles(2);
   #asm
   btsc   0x2CA, #2
   bset   0xBFE, #0
   #endasm
   LAT_CLK = 0;
   delay_cycles(2);
   return(SPISR);
}

void _SPI_WRITE(unsigned char SPITX) {
   #locate SPITX = 0xBFC
   TRISSPI = 0;
   #asm
   btsc   0xBFC, #7
   bset   0x2CC, #2
   btss   0xBFC, #7
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   #asm
   btsc   0xBFC, #6
   bset   0x2CC, #2
   btss   0xBFC, #6
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   #asm
   btsc   0xBFC, #5
   bset   0x2CC, #2
   btss   0xBFC, #5
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   #asm
   btsc   0xBFC, #4
   bset   0x2CC, #2
   btss   0xBFC, #4
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   #asm
   btsc   0xBFC, #3
   bset   0x2CC, #2
   btss   0xBFC, #3
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   #asm
   btsc   0xBFC, #2
   bset   0x2CC, #2
   btss   0xBFC, #2
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   #asm
   btsc   0xBFC, #1
   bset   0x2CC, #2
   btss   0xBFC, #1
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   #asm
   btsc   0xBFC, #0
   bset   0x2CC, #2
   btss   0xBFC, #0
   bclr   0x2CC, #2
   #endasm
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   TRISSPI = 1;
}

unsigned char _crc7(void) {
   unsigned int16 i, a;
   unsigned char crc,aux;
   crc = 0;
   for(a=0; a<5; a++) {
      aux = data[a];
      for(i=0; i<8; i++) {
         crc <<= 1;
         if((aux & 0x80)^(crc & 0x80))
            crc ^=0x09;
      aux <<= 1;
      }
   }
   crc=(crc<<1)|1; // append the stop bit
   return(crc);
}

void _sd_clocks(unsigned int8 clocks) { // send at most 'clocks' clock pulses; exits early if a start bit arrives
   unsigned int16 i;
   TRISSPI = 1;
   for(i = 0; i < clocks; i++) {
      LAT_CLK = 1;
      delay_cycles(5);
      if(!IN_SPI) // exit if a start bit is received
         break;
      LAT_CLK = 0;
      delay_cycles(5);
   }
}

void _sd_cmd(void) {
   _SPI_WRITE(data[0]); // send the command to the card
   _SPI_WRITE(data[1]);
   _SPI_WRITE(data[2]);
   _SPI_WRITE(data[3]);
   _SPI_WRITE(data[4]);
   _SPI_WRITE(_crc7()); // compute the CRC7 and send it
}

void _acmd(unsigned char acmd) {
   unsigned int16 i;
   data[0]=0x77, data[1]=rca[0], data[2]=rca[1], data[3]=0x00, data[4]=0x00;
   _sd_cmd(); // send CMD55
   if(acmd == 41)
      _sd_clocks(5); // Nid Cycles
   else
      _sd_clocks(64); // Ncr Cycles
   for(i = 0; i < 6; i++) { // receive R1
      resp[i] = _SPI_READ();
   }
   _sd_clocks(8); // 8 additional clocks after the response
   switch(acmd) {
      case 6:
         data[0]=0x46, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x02;
         _sd_cmd(); // send ACMD6
         _sd_clocks(64); // Ncr Cycles
         for(i = 0; i < 6; i++) { // receive R1
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
      case 41:
         data[0]=0x69, data[1]=0x00, data[2]=0xFF, data[3]=0x80, data[4]=0x00;
         _sd_cmd(); // send ACMD41
         _sd_clocks(5); // Nid Cycles
         for(i = 0; i < 6; i++) { // receive R3
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
      case 42:
         data[0]=0x6A, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
         _sd_cmd(); // send ACMD42
         _sd_clocks(64); // Ncr Cycles
         for(i = 0; i < 6; i++) { // receive R1
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
   }
}

void _cmd(unsigned char cmd) {
   unsigned int16 i;
   switch(cmd) {
      case 0:
         data[0]=0x40, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
         _sd_cmd(); // send CMD0
         _sd_clocks(8); // 8 additional clocks after the stop bit
         break;
      case 2:
         data[0]=0x42, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
         _sd_cmd(); // send CMD2
         _sd_clocks(5); // Nid Cycles
         for(i = 0; i < 17; i++) { // receive R2
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
      case 3:
         data[0]=0x43, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
         _sd_cmd(); // send CMD3
         _sd_clocks(64); // Ncr cycles (maximum)
         for(i = 0; i < 6; i++) { // receive R6
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
      case 7:
         data[0]=0x47, data[1]=rca[0], data[2]=rca[1], data[3]=0x00, data[4]=0x00;
         _sd_cmd(); // send CMD7
         _sd_clocks(64); // Ncr cycles (maximum)
         for(i = 0; i < 6; i++) { // receive R1b
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
      case 12:
         data[0]=0x4C, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
         _sd_cmd(); // send CMD12
         _sd_clocks(64); // Ncr cycles (maximum)
         for(i = 0; i < 6; i++) { // receive R1b
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
      case 16:
         data[0]=0x50, data[1]=0x00, data[2]=0x00, data[3]=0x02, data[4]=0x00;
         _sd_cmd(); // send CMD16
         _sd_clocks(64); // Ncr cycles (maximum)
         for(i = 0; i < 6; i++) { // receive R1
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
      case 18:
         data[0]=0x52, data[1]=0x00, data[2]=0x00, data[3]=0x00, data[4]=0x00;
         _sd_cmd(); // send CMD18
         _sd_clocks(64); // Ncr cycles (maximum)
         for(i = 0; i < 6; i++) { // receive R1
            resp[i] = _SPI_READ();
         }
         _sd_clocks(8); // 8 additional clocks after the response
         break;
   }
}

void _sd_init(void) {
   rca[0] = 0; // Initialize the relative address to zero
   rca[1] = 0;
   _sd_clocks(74); // send 74 clock cycles to initialize
   _cmd(0); // Go to idle state
   bit_clear(resp[1], 7);
   while(!bit_test(resp[1], 7)) { // poll the busy flag
      _acmd(41);
      delay_ms(50);
   }
   _cmd(2); // Receive CID
   _cmd(3); // Receive RCA
   rca[0] = resp[1]; // Store RCA (MSB)
   rca[1] = resp[2]; // Store RCA (LSB)
   _cmd(7); // Card selection
   _cmd(16); // Set block length (512 bytes)
   _acmd(42); // Disable PULL-UP in D3
   _acmd(6); // Wide bus (4-bit)
}

void _sd_multiread(void) {
   unsigned int i, frames;
   unsigned int8 addr=1; // Address of the module to send to
   #locate addr = 0xBFA
   _cmd(18);
   while(IN_D0) {// Wait for the start bit on the DATA line
      LAT_CLK = 1;
      delay_cycles(2);
      LAT_CLK = 0;
      delay_cycles(2);
   }
   LAT_CLK = 1;
   delay_cycles(2);
   LAT_CLK = 0;
   delay_cycles(2);
   for(frames = 0; frames < 66; frames++) {
      if(addr > MODULOS) {
         addr = 1; // Reset the address
      }
      delay_cycles(1);
      #asm
      MOV.B   #254, WREG // Send the header
      CLR.B   0x2CD
      IOR.b   0x2CD
      #endasm
      LAT_CLK_OUT = 0;
      delay_us(1);
      LAT_CLK_OUT = 1;
      delay_us(1);
      #asm
      MOV.B   0xBFA, WREG // Send the address
      CLR.B   0x2CD
      IOR.b   0x2CD
      #endasm
      addr++; // Increment the address
      LAT_CLK_OUT = 0;
      delay_us(1);
      LAT_CLK_OUT = 1;
      for(i = 0; i < 512; i++) {
         LAT_CLK_OUT = 1;
         LAT_CLK = 1;
         #asm
         MOV      0x2C2, W0
         SWAP.B   WREG
         #endasm
         LAT_CLK = 0;
         delay_cycles(2);
         LAT_CLK = 1;
         #asm
         IOR      0x2C2, WREG
         CLR.B   0x2CD
         IOR.b   0x2CD
         #endasm
         LAT_CLK = 0;
         LAT_CLK_OUT = 0;
         delay_cycles(4);
      }
      LAT_CLK_OUT = 1;
      for(i = 0; i < 17; i++) { // Skip the checksum bits and stop bit
         LAT_CLK = 1;
         delay_cycles(2);
         LAT_CLK = 0;
         delay_cycles(2);
      }
      while(IN_D0) {// Wait for the start bit on the DATA line
         LAT_CLK = 1;
         delay_cycles(2);
         LAT_CLK = 0;
         delay_cycles(2);
      }
      LAT_CLK = 1;
      delay_cycles(2);
      LAT_CLK = 0;
      delay_cycles(2);
      for(i = 0; i < 512; i++) {
         LAT_CLK_OUT = 1;
         LAT_CLK = 1;
         #asm
         MOV      0x2C2, W0
         SWAP.B   WREG
         #endasm
         LAT_CLK = 0;
         delay_cycles(2);
         LAT_CLK = 1;
         #asm
         IOR      0x2C2, WREG
         CLR.B   0x2CD
         IOR.b   0x2CD
         #endasm
         LAT_CLK = 0;
         LAT_CLK_OUT = 0;
         delay_cycles(4);
      }
      LAT_CLK_OUT = 1;
      for(i = 0; i < 17; i++) { // Skip the checksum bits and stop bit
         LAT_CLK = 1;
         delay_cycles(2);
         LAT_CLK = 0;
         delay_cycles(2);
      }
      while(IN_D0) {// Wait for the start bit on the DATA line
         LAT_CLK = 1;
         delay_cycles(2);
         LAT_CLK = 0;
         delay_cycles(2);
      }
      LAT_CLK = 1;
      delay_cycles(2);
      LAT_CLK = 0;
      delay_cycles(2);
      
      delay_ms(12); // TEST
   }
   _cmd(12);
}

void main(void) {
   TRISA = 0x001F;
   TRISB = 0x0000;
   ODCA = 0;
   ODCB = 0;
   LATA = 0; // Drive the unused pins to ground
   LATB = 0; // Drive the unused pins to ground
   CN2PUE = 1; // Enable pull-up on the data line
   CN3PUE = 1; // Enable pull-up on the data line
   CN29PUE = 1; // Enable pull-up on the data line
   CN30PUE = 1; // Enable pull-up on the data line
   // Internal FRC, 80MHz
   PLLDIV = 0x0097; // Multiplies by 153
   CLKDIV = 0x0005; // PLLPRE divides by 7, PLLPOST divides by 2
   LAT_CLK_OUT = 1;
   _sd_init();
   LAT_LED = 1;
   for(;;) {
      _sd_multiread();
      delay_ms(500);
   }   
}
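
The `_crc7()` routine above can be checked on a PC against a well-known value: the CMD0 frame 0x40 0x00 0x00 0x00 0x00 is normally sent with 0x95 as its sixth byte. The sketch below restates the same bit-by-bit algorithm in portable C. The function name and the pointer-plus-length interface are illustrative choices, not part of the posted code.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 (polynomial x^7 + x^3 + 1) over a command frame, returned with the
   stop bit appended, i.e. the value of the sixth byte sent to the card. */
static uint8_t sd_crc7_byte(const uint8_t *frame, size_t len)
{
    uint8_t crc = 0;
    for (size_t a = 0; a < len; a++) {
        uint8_t aux = frame[a];
        for (int i = 0; i < 8; i++) {
            crc <<= 1;
            /* XOR in the polynomial when the data bit differs from the
               bit that was just shifted out of the 7-bit register. */
            if ((aux & 0x80) ^ (crc & 0x80))
                crc ^= 0x09;
            aux <<= 1;
        }
    }
    return (uint8_t)((crc << 1) | 1);   /* 7 CRC bits followed by the stop bit */
}
```

Feeding it the five CMD0 bytes gives 0x95, which matches the constant commonly hard-coded in SPI-mode drivers.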


Last edited by metalm on Wed Feb 13, 2008 7:49 am; edited 1 time in total
metalm



Joined: 22 Mar 2007
Posts: 23
Location: Buenos Aires, Argentina

Posted: Wed Jan 30, 2008 5:21 am

Well, this appeared without tabs :(
PCM programmer



Joined: 06 Sep 2003
Posts: 21708

Posted: Wed Jan 30, 2008 5:52 am

You need to have "BBCode" enabled, if you want the code block to
appear. You can edit your post, and remove the check from the
tickbox for "Disable BBCode in this post".

You can also edit your profile and always enable BBcode. That's the
best way.
DDDDaniel
Guest







Posted: Wed Jan 30, 2008 10:19 am

@metalm:
Thanks for your input, but I can't change the hardware anymore and have to live with the standard SPI interface for now... But I'll try the 4-wire interface in my next project ;)
Ttelmah
Guest







Posted: Wed Jan 30, 2008 10:53 am

On your question about the transfer using my 'minimised' code: if the SPI is not busy, the transfer starts as soon as you write a byte into the output buffer. Wait for the transfer to finish (WAIT_FOR_SSP() returns once BF is set), then read the input buffer. This will contain the byte read back.

Best Wishes
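
The write, wait, read sequence described above can be sketched as a single exchange function. This version is written so that it runs on a PC: the registers are ordinary variables and `sim_shift()` is a hypothetical stand-in for the hardware, so only `spi_exchange()` itself reflects what would run on the PIC. On the target, `SSPBUF` and `BF` would be the `#byte`/`#bit` definitions quoted earlier and the macros would be the originals.

```c
#include <stdint.h>

/* Host-side stand-ins for the MSSP registers (assumptions for this sketch). */
static uint8_t SSPBUF;          /* shared transmit/receive buffer */
static uint8_t BF;              /* buffer-full flag */
static uint8_t sim_card_byte;   /* hypothetical: byte the card shifts back */
static uint8_t sim_last_sent;   /* hypothetical: last byte clocked out */

/* Hypothetical hardware model: the byte written to SSPBUF is shifted out
   while the card's byte is shifted in, then BF is set. */
static void sim_shift(void)
{
    sim_last_sent = SSPBUF;
    SSPBUF = sim_card_byte;
    BF = 1;
}

#define WRITE_SSP(x)    do { SSPBUF = (x); sim_shift(); } while (0)
#define WAIT_FOR_SSP()  while (!BF)
#define READ_SSP()      (BF = 0, SSPBUF)

/* Full-duplex exchange: clock one byte out and return the byte clocked in. */
static uint8_t spi_exchange(uint8_t out)
{
    WRITE_SSP(out);      /* writing the buffer starts the transfer */
    WAIT_FOR_SSP();      /* spin until the received byte is in the buffer */
    return READ_SSP();   /* reading the buffer also clears BF */
}
```

Calling `spi_exchange(0xFF)` would then play the role of `spi_read(0xFF)`. The split macros only pay off when other work is placed between the write and the wait.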
DDDDaniel
Guest







Posted: Wed Jan 30, 2008 2:50 pm

Yeah, it works!
...but it does not really improve the performance... :(
ckielstra



Joined: 18 Mar 2004
Posts: 3680
Location: The Netherlands

Posted: Wed Jan 30, 2008 8:49 pm

The optimization of SPI_write() doesn't help a lot because it is executed only a few times.

The code for reading the data is executed 512 times and this is where optimization would help.
Code:
   for (a=0;a<512;a++)
      *Buffer++ = spi_read(0xFF);
Roughly counting the instruction times in the listing file it looks like the loop takes about 35 instruction times, of which 8 are for transmitting the data over SPI.
A lot of the code in this loop is used for calculating the memory address that Buffer points to. The CCS compiler is not very smart here, as it recalculates the address in every iteration. By setting up the index register yourself and using the POSTINC register to store the data, you should get almost double the speed. A nice feature of the POSTINC register is that you initialize the start address once and the PIC increments the address for you on every access (there are also POSTDEC, PREINC and other register variations).

Code:
// Register defines for PIC18
// Place this at the global level, or in a separate include file with register definitions.
unsigned int16 FSR0;
#locate FSR0=0x0FE9

unsigned char POSTINC0;
#locate POSTINC0=0x0FEE


Replace the loop above with the code below:
Code:
   FSR0 = Buffer;            // Set start address in index register
   for (a=0;a<512;a++)
      POSTINC0 = spi_read(0xFF); // Write data to Buffer and increase address
DDDDaniel
Guest







Posted: Thu Jan 31, 2008 12:51 am

Thanks, ckielstra,
your code really improves performance!
The new values:
Read SD: 312kB/s
Write SD: 216kB/s

And that's the current code:
Code:

.................... unsigned int mmc_read_sector (unsigned int32 addr,unsigned int *Buffer)
.................... {   
....................    // Setup Command sequence for Block Read
....................    unsigned int cmd[] = {0x51,0x00,0x00,0x00,0x00,0xFF}; 
*
0A31C:  MOVLW  51
0A31E:  MOVLB  9
0A320:  MOVWF  xF8
0A322:  CLRF   xF9
0A324:  CLRF   xFA
0A326:  CLRF   xFB
0A328:  CLRF   xFC
0A32A:  MOVLW  FF
0A32C:  MOVWF  xFD
....................    unsigned int res,b;   
....................    long a;
.................... 
....................    // insert address bytes into command sequence   
....................    addr = addr << 9; //addr = addr * 512
0A32E:  BCF    FD8.0
0A330:  MOVFF  9F4,9F5
0A334:  MOVFF  9F3,9F4
0A338:  MOVFF  9F2,9F3
0A33C:  CLRF   xF2
0A33E:  RLCF   xF3,F
0A340:  RLCF   xF4,F
0A342:  RLCF   xF5,F
....................    cmd[1] = make8(addr,3);
0A344:  MOVFF  9F5,9F9
....................    cmd[2] = make8(addr,2);
0A348:  MOVFF  9F4,9FA
....................    cmd[3] = make8(addr,1);
0A34C:  MOVFF  9F3,9FB
.................... 
....................    // chip select MMC/SD-card
....................    output_low(CS_CARD);
0A350:  BCF    F8F.1
.................... 
....................    // send 8 clock pulses
....................    //spi_write(0xFF);
....................    WRITE_SSP(0xFF);
0A352:  MOVLW  FF
0A354:  MOVWF  FC9
....................    WAIT_FOR_SSP();
0A356:  BTFSS  FC7.0
0A358:  BRA    A356
.................... 
....................    for (res = 0;res<0x06;res++)
0A35A:  CLRF   xFE
0A35C:  MOVF   xFE,W
0A35E:  SUBLW  05
0A360:  BNC   A37C
....................    {
....................       WRITE_SSP(cmd[res]);
0A362:  CLRF   03
0A364:  MOVF   xFE,W
0A366:  ADDLW  F8
0A368:  MOVWF  FE9
0A36A:  MOVLW  09
0A36C:  ADDWFC 03,W
0A36E:  MOVWF  FEA
0A370:  MOVFF  FEF,FC9
....................       WAIT_FOR_SSP();
0A374:  BTFSS  FC7.0
0A376:  BRA    A374
....................    }
0A378:  INCF   xFE,F
0A37A:  BRA    A35C
.................... 
.................... 
....................    // Wait for a valid response from the MMC/SD-card
....................    //while (spi_read(0xFF) == 0xff);   
....................    do
....................    {
....................    WRITE_SSP(0xFF);
0A37C:  MOVLW  FF
0A37E:  MOVWF  FC9
....................    WAIT_FOR_SSP();
0A380:  BTFSS  FC7.0
0A382:  BRA    A380
....................    }
....................    while(READ_SSP() == 0xFF);
0A384:  INCFSZ FC9,W
0A386:  BRA    A38A
0A388:  BRA    A37C
.................... 
....................    // Wait for Start Byte (FEh/Start Byte)
....................    //while (spi_read(0xFF) != 0xfe);
....................    do
....................    {
....................    WRITE_SSP(0xFF);
0A38A:  MOVLW  FF
0A38C:  MOVWF  FC9
....................    WAIT_FOR_SSP();
0A38E:  BTFSS  FC7.0
0A390:  BRA    A38E
....................    }
....................    while(READ_SSP() != 0xFE);
0A392:  MOVF   FC9,W
0A394:  SUBLW  FE
0A396:  BNZ   A38A
.................... 
.................... 
....................    // Read Sector (usually 512 Bytes) from MMC/SD-card
....................     
....................    FSR0 = Buffer;
0A398:  MOVFF  9F7,FEA
0A39C:  MOVFF  9F6,FE9
....................    for (a=0;a<512;a++)
0A3A0:  MOVLB  A
0A3A2:  CLRF   x01
0A3A4:  CLRF   x00
0A3A6:  MOVF   x01,W
0A3A8:  SUBLW  01
0A3AA:  BNC   A3C0
....................    {
....................      WRITE_SSP(0xFF);
0A3AC:  MOVLW  FF
0A3AE:  MOVWF  FC9
....................      WAIT_FOR_SSP();
0A3B0:  BTFSS  FC7.0
0A3B2:  BRA    A3B0
....................      POSTINC0 = READ_SSP();
0A3B4:  MOVFF  FC9,FEE
....................       //*Buffer++ = READ_SSP();
....................    }
0A3B8:  INCF   x00,F
0A3BA:  BTFSC  FD8.2
0A3BC:  INCF   x01,F
0A3BE:  BRA    A3A6
.................... 
....................    // Read CRC-Bytes
....................    // spi_read(0xff);
....................    // spi_read(0xff);
.................... 
....................    WRITE_SSP(0xFF);
0A3C0:  MOVLW  FF
0A3C2:  MOVWF  FC9
....................    WAIT_FOR_SSP();   
0A3C4:  BTFSS  FC7.0
0A3C6:  BRA    A3C4
....................    WRITE_SSP(0xFF);
0A3C8:  MOVLW  FF
0A3CA:  MOVWF  FC9
....................    WAIT_FOR_SSP();   // let's try to skip the wait cycle here
0A3CC:  BTFSS  FC7.0
0A3CE:  BRA    A3CC
.................... 
....................    // disable MMC/SD-card
....................    output_high(CS_CARD);
0A3D0:  BSF    F8F.1
.................... 
....................    return(0);
0A3D2:  MOVLW  00
0A3D4:  MOVWF  01
.................... }
0A3D6:  MOVLB  0
0A3D8:  RETLW  00



best regards,
daniel
Ttelmah
Guest







Posted: Thu Jan 31, 2008 6:12 am

As one possible further change, consider modifying the main 'read' loop, with:
Code:

   FSR0 = Buffer;
   WRITE_SSP(0xFF);
   for (a=0;a<511;a++)
     {
     WAIT_FOR_SSP();
     POSTINC0 = READ_SSP();
     WRITE_SSP(0xFF);
     }
   WAIT_FOR_SSP();
   POSTINC0 = READ_SSP();

Looks 'daft' (adds three more instructions), but the key is that for 511 times round the loop, the SSP transfer will actually occur _while_ the loop count is being incremented and tested, so far less time will be needed in the 'wait'. It ought to boost performance by another few percent.
This is the whole 'key point' about using my 'split' functions, in that it potentially allows you to start sending the byte, and do other things before testing whether the transfer has finished.

Best Wishes
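
To see that the reordered loop still transfers exactly 512 bytes in order, it can be run on a PC against a fake card. In the sketch below the registers are ordinary variables and `sim_shift()` is a hypothetical card model in which the n-th transfer returns the low byte of n. Only the shape of `read_sector_pipelined()` mirrors the loop posted above, with a plain pointer standing in for `POSTINC0`.

```c
#include <stdint.h>

#define SECTOR_SIZE 512

/* Host-side stand-ins for the MSSP registers (assumptions for this sketch). */
static uint8_t  SSPBUF;     /* shared transmit/receive buffer */
static uint8_t  BF;         /* buffer-full flag */
static uint16_t sim_sent;   /* hypothetical: number of transfers started */

/* Hypothetical card model: the n-th transfer returns the low byte of n. */
static void sim_shift(void)
{
    SSPBUF = (uint8_t)(sim_sent & 0xFF);
    sim_sent++;
    BF = 1;
}

#define WRITE_SSP(x)    do { SSPBUF = (x); sim_shift(); } while (0)
#define WAIT_FOR_SSP()  while (!BF)
#define READ_SSP()      (BF = 0, SSPBUF)

/* Pipelined sector read: each new transfer is started before the loop
   bookkeeping runs, so the shifting overlaps the increment and compare. */
static void read_sector_pipelined(uint8_t *buffer)
{
    uint16_t a;
    WRITE_SSP(0xFF);                     /* prestart the first transfer */
    for (a = 0; a < SECTOR_SIZE - 1; a++) {
        WAIT_FOR_SSP();
        *buffer++ = READ_SSP();          /* POSTINC0 = READ_SSP() on the PIC */
        WRITE_SSP(0xFF);                 /* start the next transfer right away */
    }
    WAIT_FOR_SSP();
    *buffer = READ_SSP();                /* last byte, no further transfer started */
}
```

One prestarted transfer plus 511 started inside the loop gives 512 transfers, and 511 reads inside the loop plus the final one gives 512 stored bytes.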
Guest








Posted: Thu Jan 31, 2008 8:13 am

@Ttelmah:
Thanks, this might save me another few clocks!

Something else I recently tried was the following, to avoid having to deal with a 16 bit variable:
Code:

     unsigned int a;
     FSR0 = Buffer;
     WRITE_SSP(0xFF);
     for (a=0;a<255;a++)
     {
        WAIT_FOR_SSP();
        POSTINC0 = READ_SSP();
        WRITE_SSP(0xFF);
        WAIT_FOR_SSP();
        POSTINC0 = READ_SSP();
        WRITE_SSP(0xFF);
     }

But for some strange reason it does not work...
Do you know what I'm doing wrong, and whether it might increase the transfer rate further?

thanks,
daniel
Ttelmah
Guest







Posted: Thu Jan 31, 2008 9:54 am

I'd think this would be one transfer 'short'. You 'prestart' one 8-bit transfer, then the loop runs 255 times with two reads per pass, so it stores 255*2 = 510 bytes. Even if you add one 'post transfer' as in my code, that makes 511, so you are still one byte short.

Best Wishes
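
The count can be checked on a PC. The sketch below is plain C with hypothetical helper names; it only counts stores and does not touch any SPI hardware. It also shows one possible way, not taken from the thread, to get 256 passes out of an 8-bit counter by letting it wrap.

```c
#include <stdint.h>

/* Counts how many bytes the doubled-up loop stores for a given bound,
   mirroring `for (a = 0; a < bound; a++)` with two reads per pass. */
static unsigned reads_in_doubled_loop(uint8_t bound)
{
    unsigned reads = 0;
    for (uint8_t a = 0; a < bound; a++)
        reads += 2;              /* two POSTINC0 = READ_SSP() per pass */
    return reads;
}

/* One way to get 256 passes from an 8-bit counter: start at 0 and
   decrement until it wraps back to 0. */
static unsigned reads_in_wrapping_loop(void)
{
    unsigned reads = 0;
    uint8_t a = 0;
    do {
        reads += 2;
    } while (--a != 0);          /* 0 -> 255 -> ... -> 1 -> 0: 256 passes */
    return reads;
}
```

`reads_in_doubled_loop(255)` gives 510 and `reads_in_wrapping_loop()` gives 512. With a prestarted transfer, the wrapping version would start one extra transfer at the end, so the byte waiting in the buffer afterwards would presumably be the first CRC byte; that is an assumption to verify on the target.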
Neutone



Joined: 08 Sep 2003
Posts: 839
Location: Houston

Posted: Thu Jan 31, 2008 12:43 pm

I think the fastest you can get will be something like this. Inline code could be slightly faster than the loop. The time to transfer one byte is 8 instruction cycles when SPI_CLK_DIV_4 is used. Using this code should give you just over one instruction time between byte reads. The first delay will have to be tweaked to ensure that the loop processing adds up to 8 instruction cycles. I would guess this to be about a 30-40% speed boost compared to checking the BF bit.

Code:

     unsigned int a;
     FSR0 = Buffer;
     for (a=128;a>0;a--)
     {
        Delay_Cycles(2);
        POSTINC0 = READ_SSP();
        WRITE_SSP(0xFF);
        Delay_Cycles(7);
        POSTINC0 = READ_SSP();
        WRITE_SSP(0xFF);
        Delay_Cycles(7);
        POSTINC0 = READ_SSP();
        WRITE_SSP(0xFF);
        Delay_Cycles(7);
        POSTINC0 = READ_SSP();
        WRITE_SSP(0xFF);
     }
All times are GMT - 6 Hours
Page 1 of 3

 