Author |
Message |
newguy
Joined: 24 Jun 2004 Posts: 1907
|
Heads up: undocumented errata for dsPIC33FJ256GP710A |
Posted: Mon Jun 02, 2014 2:44 pm |
|
|
The current errata for this chip, item 11, details a bug whereby the TBE interrupt will fire before the UART has finished transmitting a character but only for UTXISEL = 01. Found out the hard way today that it also applies to UTXISEL = 00.
Heads up if you're working with this processor. All current HW revs are likely affected. |
|
|
jeremiah
Joined: 20 Jul 2010 Posts: 1343
|
|
Posted: Mon Jun 02, 2014 6:11 pm |
|
|
00 is actually for triggering when a character moves from the buffer to the shift register, so it intentionally fires before the transmission is finished. 01 is supposed to fire when the last character leaves the shift register and there are no characters in the buffer.
EDIT: That's not to say it isn't buggy. I have trouble working with it on my PIC24 chips too, though that is probably down to something else. |
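For anyone setting these modes by hand, here is a minimal sketch of the encoding. It assumes the dsPIC33F UxSTA layout as I read it in the family reference manual (UTXISEL1 at bit 15, UTXISEL0 at bit 13, so the two bits are not adjacent); please check your own datasheet. `utxisel_apply` is a hypothetical helper that works on a plain 16-bit value rather than the real register, so it can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed dsPIC33F UxSTA bit positions; verify against your datasheet. */
#define UTXISEL1_BIT 15u
#define UTXISEL0_BIT 13u

/* TX interrupt modes as described in this thread. */
enum utxisel_mode {
    UTXISEL_CHAR_TO_TSR = 0u, /* 00: a character moved from buffer to shift register */
    UTXISEL_ALL_SENT    = 1u  /* 01: last character left the shift register, buffer empty */
};

/* Return a copy of a UxSTA value with the two UTXISEL bits set to `mode`. */
static uint16_t utxisel_apply(uint16_t uxsta, unsigned mode)
{
    assert(mode <= 3u); /* two-bit field */
    uxsta &= (uint16_t)~((1u << UTXISEL1_BIT) | (1u << UTXISEL0_BIT));
    uxsta |= (uint16_t)(((mode >> 1) & 1u) << UTXISEL1_BIT);
    uxsta |= (uint16_t)((mode & 1u) << UTXISEL0_BIT);
    return uxsta;
}
```

On real hardware the returned value would be written back to the UART status register while the transmitter is disabled; that part is left out here.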
|
|
newguy
Joined: 24 Jun 2004 Posts: 1907
|
|
Posted: Mon Jun 02, 2014 8:19 pm |
|
|
Think it has something to do with the DMA channels I'm using to transmit the serial data. Took over a week to figure out. Problem was a transmitting device was failing to transmit entire packets (at random times). Not corrupted/missing characters within a packet - entire packets. Uncovered the fault during stress testing. Out of 4000 test packets, with no delay between them, anywhere from 9 - 35 would disappear. Given that each device reported no malformed/bad packets, the mystery deepened.
As soon as I added an extra test to see if the transmit buffer was empty/ready for a packet before loading the DMA buffer and starting the transmission of the packet, we dropped to 0 packet loss. |
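The extra test can be sketched roughly as below. This is a host-runnable model, not the production code: `fake_u1sta`, `dma_start_packet` and `try_send_packet` are made-up names, and the TRMT position (bit 8 of UxSTA) is my assumption from the dsPIC33F documentation. The only point is the ordering: confirm the transmitter is empty first, and only then load the DMA buffer and start the transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRMT_MASK    (1u << 8)   /* assumed: 1 = shift register and buffer both empty */
#define DMA_BUF_SIZE 64u

static volatile uint16_t fake_u1sta = TRMT_MASK; /* stand-in for the real U1STA */
static uint8_t dma_buf[DMA_BUF_SIZE];            /* stand-in for DMA RAM */
static size_t dma_count;                         /* bytes the DMA was told to move */
static unsigned packets_started;

/* Hypothetical: on hardware this would arm the channel and force the first transfer. */
static void dma_start_packet(size_t len)
{
    dma_count = len;
    packets_started++;
    fake_u1sta &= (uint16_t)~TRMT_MASK;          /* transmitter is now busy */
}

/* Returns false (and leaves the DMA buffer untouched) if the UART is still sending. */
static bool try_send_packet(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL && len > 0 && len <= DMA_BUF_SIZE);
    if ((fake_u1sta & TRMT_MASK) == 0u) {
        return false;                            /* previous packet still going out */
    }
    memcpy(dma_buf, pkt, len);
    dma_start_packet(len);
    return true;
}
```

The mainline would simply call `try_send_packet` again on a later pass whenever it returns false.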
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19488
|
|
Posted: Tue Jun 03, 2014 12:05 am |
|
|
Thanks for the info.
Have you actually reported it to Microchip?
Thinking about it though, it sounds as if this may be your fault!
The way DMA is done is that the TX interrupt is set to trigger when there is _one_ character of space. The DMA controller then loads the next character automatically, triggered by this interrupt.
You then only load a new buffer when the DMA interrupt triggers (not the TX interrupt). The DMA interrupt fires when the TX interrupt has fired and the DMA controller has no more data, which implies there is space to send another packet.
You don't actually use a UART transmit interrupt handler when using DMA.
Instead you load the buffer when INT_DMAx fires.
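A rough, host-runnable sketch of that flow follows. Every name in it is hypothetical (`pkt_queue`, `dma_load_and_start`, `on_dma_complete`); on the chip, `on_dma_complete` would be the body of the INT_DMAx handler. It only shows where the buffer reload lives and does no real register setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 4u

struct packet { const unsigned char *data; size_t len; };

static struct packet pkt_queue[QUEUE_LEN]; /* pending packets, FIFO order */
static size_t q_head, q_count;
static bool dma_busy;                      /* a block is currently owned by the DMA */
static size_t blocks_started;

/* Hypothetical: copy the packet into DMA RAM and arm the channel. */
static void dma_load_and_start(const struct packet *p)
{
    assert(p->data != NULL && p->len > 0);
    dma_busy = true;
    blocks_started++;
}

/* Start the next queued packet, or mark the DMA idle if nothing is waiting. */
static void start_next(void)
{
    if (q_count == 0u) { dma_busy = false; return; }
    struct packet p = pkt_queue[q_head];
    q_head = (q_head + 1u) % QUEUE_LEN;
    q_count--;
    dma_load_and_start(&p);
}

/* Mainline: queue a packet; start it at once only if the DMA is idle. */
static bool queue_packet(const unsigned char *data, size_t len)
{
    if (q_count == QUEUE_LEN) return false;
    pkt_queue[(q_head + q_count) % QUEUE_LEN] = (struct packet){ data, len };
    q_count++;
    if (!dma_busy) start_next();
    return true;
}

/* Would be called from the DMA-complete interrupt, not from a UART TX handler. */
static void on_dma_complete(void)
{
    start_next();
}
```

On real hardware the queue would also need protecting against the interrupt while the mainline touches it; that is omitted to keep the sketch short.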
Best Wishes |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1907
|
|
Posted: Tue Jun 03, 2014 1:11 am |
|
|
Yup, that's how I am doing it. The DMA buffer is loaded when a packet is ready, then DMA takes care of sending the entire packet. The DMA interrupt then fires and the process is able to repeat if another packet is available. In order to get DMA to work in this manner, it must be linked to the TBE interrupt, though the TBE interrupt isn't enabled or used.
It works flawlessly more than 99% of the time, but occasionally an entire packet that had been transferred to DMA would be skipped: the DMA interrupt would fire prematurely, before the UART had actually sent that packet. This all went away as soon as I inserted the test for buffer empty before loading the DMA and kick-starting the automatic transmission of the packet. |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1907
|
|
Posted: Wed Jun 11, 2014 7:19 am |
|
|
Update: Microchip has replicated the issue. Not sure what they're going to do now. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19488
|
|
Posted: Wed Jun 11, 2014 9:14 am |
|
|
Well, the first thing that will happen is that they may find a better 'bodge', or be able to document the particular conditions that trigger it. That will hopefully then appear in a new erratum.
Hopefully 'longer term' there will then be a fixed chip. |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1907
|
|
Posted: Wed Jun 11, 2014 10:13 am |
|
|
Their suggestion was actually to switch to UTXISEL = 01, presumably just to see what would happen. Can't do that now, as the FW is now in production, and it may be some time before I can test their suggestion. What I have now is code that doesn't drop packets, and that's truthfully all I really want. |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1907
|
|
Posted: Thu Jun 12, 2014 8:39 am |
|
|
Some more from Microchip regarding the issue:
Quote: | I have tested the TRMT work-around and it should work based on my results. This situation would arise on one-shot successive back to back transfers with the DMA-UART when the last character is still in the process of transmission and another DMA-UART process is activated in quick succession. Please see attachment for this. Regarding the UTXISEL = 01, this can be only tried for testing but has to be verified before actual application as the silicon errata would also point to this. Note that I'm currently testing in byte transfers for this. |
|
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19488
|
|
Posted: Fri Jun 13, 2014 12:02 am |
|
|
Reading what they have said, and the fix, it appears that what triggers it is having an 'unfinished' DMA transaction (data still being sent) and then launching another one.
You might well be able to avoid this by using a flag in the INT_DMA handler, which is set when there is no more data to send. Then, when starting a transaction, if this is _not_ set, wait for the previous transaction to end. |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1907
|
|
Posted: Fri Jun 13, 2014 7:20 am |
|
|
I got that too, but it still essentially comes down to the TBE interrupt firing prematurely.
I already have a flag, "transmit_in_progress", that gets set before the DMA is set up to "blast" a packet. A variable, "number of messages transmitted" then gets incremented. The packet is then transferred to DMA memory, and the DMA is set to do a one shot automatic "blast" to the appropriate UART, which includes the number of transfers that the DMA has to perform in order to transfer the entire packet. The UART is finally "kick started" to start the procedure.
When the DMA finishes transferring the last character from DMA memory to the UART, the DMA interrupt fires. In this interrupt, a one-shot timer is configured to expire in approximately 1.5x a character period. That timer interrupt kills the timer and finally sets "transmit_in_progress" to false. At this point, the mainline code is free, if the number of messages that are to be transmitted does not equal the number of messages transmitted, to start the process anew.
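Boiled down, the sequence looks something like the sketch below. It is a host-runnable model, not my actual firmware: the function names are invented for illustration, the hardware steps are reduced to comments, and `gap_ticks` assumes a hypothetical timer clock and 10 bits per character (8N1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static volatile bool transmit_in_progress;
static volatile bool gap_timer_running;
static uint16_t msgs_to_transmit;   /* bumped whenever a packet is queued */
static uint16_t msgs_transmitted;   /* bumped whenever a packet is handed to the DMA */

/* Timer ticks for roughly 1.5 character periods (integer maths, rounded up). */
static uint32_t gap_ticks(uint32_t timer_hz, uint32_t baud, uint32_t bits_per_char)
{
    assert(baud > 0u);
    uint64_t num = (uint64_t)timer_hz * bits_per_char * 3u;
    uint64_t den = (uint64_t)baud * 2u;
    return (uint32_t)((num + den - 1u) / den);
}

/* Mainline: start a packet only when idle and something is pending. */
static bool mainline_poll(void)
{
    if (transmit_in_progress || msgs_to_transmit == msgs_transmitted) return false;
    transmit_in_progress = true;
    msgs_transmitted++;
    /* here: copy packet to DMA memory, arm the one-shot DMA, kick-start the UART */
    return true;
}

/* DMA interrupt: last character handed to the UART, so arm the one-shot gap timer. */
static void dma_isr(void)   { gap_timer_running = true; }

/* Timer interrupt: kill the timer and release the mainline. */
static void timer_isr(void) { gap_timer_running = false; transmit_in_progress = false; }
```

With a 1 MHz timer clock and 500 kbaud, `gap_ticks(1000000, 500000, 10)` comes out at 30 ticks, i.e. 30 us.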
What I was seeing was entire packets (not portions of packets) going missing. Given the complexity of the "dance" that the DMA and UART are involved in, I still have trouble seeing how I can get an entire packet to go missing. Never missing characters within a packet, only whole packets (random whole packets at that). |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19488
|
|
Posted: Fri Jun 13, 2014 9:08 am |
|
|
TBE triggers when the UART buffer is empty. What this 'means' depends on the configuration; in your case it should be when the buffer can accept a new block of data from the DMA. So the 'buffer is empty'.
Now INT_DMA triggers when the DMA transfer is complete, which in your case means the last of the packet has been transferred to the UART buffer.
The point is that this doesn't mean that the actual transaction has completed. There are still the characters in the UART buffer left to send.
The problem then appears if you try to start a new packet before these have been sent. Your timer needs to be set to number_of_characters_in_UART * byte_time if it is to trigger when the transaction completes. Or the code to start a new transaction needs to wait until there are no characters in the buffer (which is your current solution).
Because the buffer can't take the data, the new DMA transaction doesn't start, so the whole packet gets lost. |
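To put a number on that timer value, here is a small sketch of the arithmetic. `uart_drain_ns` is a hypothetical helper; the 10 bits per character (8N1) and the count of characters still held in the UART are assumptions you would need to match to your own setup.

```c
#include <assert.h>
#include <stdint.h>

/* Time for `chars` characters to leave the UART, in nanoseconds, rounded up. */
static uint64_t uart_drain_ns(uint32_t chars, uint32_t baud, uint32_t bits_per_char)
{
    assert(baud > 0u);
    uint64_t bits = (uint64_t)chars * bits_per_char;
    return (bits * 1000000000ull + baud - 1u) / baud;
}
```

As a worked example, if the transmit buffer holds 4 characters plus 1 in the shift register (my assumption for this family, please verify), `uart_drain_ns(5, 500000, 10)` gives 100000 ns, i.e. 100 us at 500 kbaud. A delay of 1.5 character periods at the same rate is only 30 us, so it would expire well before a full buffer had drained.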
|
|
newguy
Joined: 24 Jun 2004 Posts: 1907
|
|
Posted: Fri Jun 13, 2014 10:50 am |
|
|
The issue is that the UART buffer is only a few characters deep and my minimum packet size is about 2x the UART buffer size. I know that the DMA interrupt fires when it's done transferring its set number of characters to the UART's transmit buffer, but what I can't wrap my head around is that an entire packet - which at minimum is 2x the UART's buffer size - goes missing.
I have code that will alert me to bad packets. General packet structure is
START_CHAR
NUM_BYTES_IN_PACKET
[packet data - but START_CHAR and STOP_CHAR are not allowed mid-packet]
LUT based XORed checksum
STOP_CHAR
I can detect any malformed packets and the weird part is that I'm not seeing any on the receive side device. None. What I am seeing is 100% correctly formed packets (no bit/byte errors), but 0.25 - 0.5% loss of entire packets.
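For illustration, a receive-side check for that structure might look like the sketch below. It is not my actual code: the START_CHAR/STOP_CHAR values are placeholders, NUM_BYTES_IN_PACKET is assumed to count payload bytes only, and a plain XOR stands in for the LUT-based checksum (which is not shown in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define START_CHAR 0x02u  /* placeholder value */
#define STOP_CHAR  0x03u  /* placeholder value */

/* Plain XOR over the payload; stands in for the LUT-based checksum. */
static uint8_t xor_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0u;
    for (size_t i = 0; i < len; i++) sum ^= data[i];
    return sum;
}

/* True if buf holds exactly one well-formed packet. */
static bool packet_is_valid(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4u || buf[0] != START_CHAR || buf[len - 1u] != STOP_CHAR) return false;
    size_t n = buf[1];                       /* assumed: counts payload bytes only */
    if (len != n + 4u) return false;
    for (size_t i = 0; i < n; i++) {         /* framing bytes not allowed mid-packet */
        if (buf[2u + i] == START_CHAR || buf[2u + i] == STOP_CHAR) return false;
    }
    return buf[2u + n] == xor_checksum(&buf[2], n);
}
```

A check like this can only flag packets that actually arrive, which is why a packet that is never sent at all shows up as a count mismatch rather than as a malformed packet.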
What has to be happening is some sort of race condition in the relative timing of the TBE and DMA interrupts, so that the DMA, in machine-gun fashion, thinks it has transferred what it was supposed to transfer to the UART when it actually hasn't. It's like the DMA interrupt is firing prematurely.
And the inter-packet delay added through the use of the one-shot timer fired inside the DMA interrupt actually doesn't make a difference in the behaviour. I actually added that because an SBC that is attached to one of the UARTs had difficulty distinguishing one packet from another. I added the same delay to the other UART (which is attached to an identical device/board with the same dsPIC) in an effort to see if it had any effect on the dropped packets. I'm leaving it in because I finally have a 0% packet loss on something that has been kicking my butt for a few months and I really don't want to jinx it by monkeying around.
The rather ironic aspect to this is that I originally had rather "traditional" UART transmit routines (via the TBE interrupt) and I couldn't find out why I was dropping characters. I could see noise being an issue given the high baud rates (500 kbaud and 417 kbaud on the two UARTs), but my gut told me I shouldn't have been dropping as many as I was actually seeing. I had my UTXISEL set to 00, as I still do, but the errata said that only 01 was affected so I didn't pursue the errata's workaround. So, in an effort to come up with a more elegant solution (and free up processor time by not continually interrupting every time the TBE interrupt fires) I migrated the transmit code to the DMA. The problem followed, but changed in that now entire packets went missing, not individual characters. That's basically why I kept digging. |
|
|
|