Possible Baud Rate Tolerance Issue

MotoDan · Joined: 30 Dec 2011 Posts: 55

I'm working on a first production run problem on a pair of boards (PIC16F1516) that communicate over a 6 ft cable at 19.2kb. Both are using the internal osc running at 2 MHz.

According to the datasheet, the baud rate error at this frequency calculates to be 0.16%. The trouble we're having is that a small number of board pairs will occasionally fail due to data reception errors. The data for both PICs decodes correctly on my scope - even when the transmission fails. This could just be due to the scope's wider acceptance of baud rate tolerance errors.

The PICs are all programmed by a 3rd party. I don't have any reason to believe that they would be altering OSCAL so I assume the internal oscillators are within the +/- 8% @ 5V. I'm also not sure whether two PICs running at opposite worst-case frequencies would cause the UARTs to fail to receive reliably.
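The worst-case question can be estimated on paper. The following is a minimal sketch, assuming an 8N1 frame and an idealised receiver that resynchronises on the start-bit edge and samples each bit mid-cell, so the stop bit is sampled 9.5 receiver bit times after the start edge. Under that assumption the frame decodes only if the accumulated drift stays under half a bit. The function names are hypothetical, and a real UART has less margin than this because of its 16x sampling quantisation and slow signal edges.

```c
/* Sample-point drift at the stop bit of an 8N1 frame, measured in
   transmitter bit times. tx_err and rx_err are fractional clock errors
   (0.04 means 4% fast). Idealised model, not the real EUSART sampler. */
static double stop_bit_drift(double tx_err, double rx_err)
{
    const double sample_point = 9.5;  /* start + 8 data bits + half a stop bit */
    double ratio = (1.0 + tx_err) / (1.0 + rx_err);
    return sample_point * (ratio - 1.0);
}

/* Returns 1 if the idealised receiver still samples inside the stop bit. */
static int frame_can_decode(double tx_err, double rx_err)
{
    double drift = stop_bit_drift(tx_err, rx_err);
    if (drift < 0.0) {
        drift = -drift;
    }
    return drift < 0.5;
}
```

With this model, one board at +8% and the other at -8% drifts by well over a full bit and cannot decode, while +2% against -2% stays under 0.4 bit and does.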

The communication between the PICs is a simple ASCII fprintf/fgets affair consisting of 8 bytes (master) with a 3 byte slave reply. No CRC or other error checking is currently being used. I've looked at the failing data using a second (software) uart and PC connection and the value that is failing does not appear to be random like first suspected. Instead, the failing value is always the same.

I've noticed that the problem disappears when I change Fosc to 4 MHz on one of the PICs. The baud rate error at 4 MHz is the same 0.16% so on the surface it doesn't look like I'm improving anything by increasing Fosc.
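The identical 0.16% figure at both clock rates can be reproduced with a little integer arithmetic. This is a sketch that assumes the EUSART runs in 16-bit asynchronous mode with BRGH = 1, where the datasheet formula is baud = Fosc / (4 x (n + 1)). The helper names are illustrative only.

```c
#include <stdint.h>

/* Divider (n + 1) nearest to the ideal value for baud = Fosc / (4 * (n + 1)).
   Assumes BRG16 = 1 and BRGH = 1. */
static uint32_t brg_divider(uint32_t fosc_hz, uint32_t baud)
{
    uint32_t denom = 4u * baud;
    return (fosc_hz + denom / 2u) / denom;  /* integer round-to-nearest */
}

/* Baud rate error in hundredths of a percent (16 means 0.16%),
   truncated toward zero. Positive means the real rate is fast. */
static int32_t baud_error_centipercent(uint32_t fosc_hz, uint32_t baud)
{
    int64_t ideal_clock = (int64_t)4 * brg_divider(fosc_hz, baud) * baud;
    return (int32_t)((((int64_t)fosc_hz - ideal_clock) * 10000) / ideal_clock);
}
```

Under these assumptions 2 MHz gives a divider of 26 and 4 MHz gives 52, and both come out at 0.16% fast, so the nominal baud error alone does not explain why 4 MHz behaves better.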

I've tried heating up the PICs to see if I can change the failure rate, but so far, temperature doesn't seem to have an effect. I have also made some timing measurements on the uart data from both PICs, which look to be reasonably close to the target 19.2kb. I have rs232 'errors' enabled, but am not currently testing for framing errors, etc.
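Counting framing and overrun errors per received byte is one way to separate a timing problem from a firmware one. The sketch below only classifies a copy of the receive status register; the bit positions are FERR = bit 2 and OERR = bit 1 of RCSTA. The struct and function names are hypothetical. How the status copy is obtained is left open: with the ERRORS option, CCS is understood to keep one in the RS232_ERRORS variable, but that should be checked against the compiler manual for the version in use.

```c
#include <stdint.h>

#define RCSTA_FERR 0x04u  /* bit 2: stop bit was sampled low */
#define RCSTA_OERR 0x02u  /* bit 1: receive FIFO overran     */

typedef struct {
    uint16_t framing;  /* bytes received with a framing error */
    uint16_t overrun;  /* bytes received with an overrun      */
    uint16_t ok;       /* bytes with neither flag set         */
} rx_stats_t;

/* Update the counters from one status byte captured with a received byte. */
static void rx_stats_update(rx_stats_t *s, uint8_t rcsta_copy)
{
    if (rcsta_copy & RCSTA_FERR) {
        s->framing++;
    }
    if (rcsta_copy & RCSTA_OERR) {
        s->overrun++;
    }
    if ((rcsta_copy & (RCSTA_FERR | RCSTA_OERR)) == 0u) {
        s->ok++;
    }
}
```

A rising framing count would point toward a baud mismatch or a damaged start edge, whereas overruns would suggest the code is not reading the UART quickly enough.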

I'm looking for any suggestions on how to verify that the failing data is actually due to a baud rate issue. My main concern is that the fix I end up with will be reliable and not just show up with another set of boards which are slightly different in a way (Fosc, etc) which causes the failure to reappear. My suspicion is that the 2 MHz to 4 MHz change is solving the problem on the set of boards that I am troubleshooting, but may not work on another set of boards with oscillators running at slightly different frequencies.

Mike Walne · Joined: 19 Feb 2004 Posts: 1785 Location: Boston Spa UK

You don't tell us much about what else you're doing, so we're second guessing.

However, I encountered a similar problem many moons ago.
Two boards were communicating down a single wire.
So both boards had to turn round from TX to RX and vice versa.

The error rates were of the order of 1% so quite rare but not acceptable.
To diagnose the problem I looked at what data each board thought it was getting, rather than what my scope and other test gear saw.
What came out from the analysis was that the errors were being caused by a corruption of the leading edge of the first byte, rather than a baud-rate issue.

It's possible that your problem is being caused by some part of one board not reacting quickly enough to the incoming data.
Moving from 2MHz to 4MHz is allowing for a faster response and thus masking things.
Try experimenting with different baud & clock rates.
If your system is robust enough it should be able to tolerate timing errors of several % at both ends.

Mike

MotoDan · Joined: 30 Dec 2011 Posts: 55

Great information Mike. I failed to mention that these boards also communicate over a 1-wire connection. There is ample time (5 ms) between master/slave bus activity. You bring up a very interesting point about the start of data corruption. I'll have to take a look at what the PIC is receiving to see if I can tell if this is what's happening or not.

Thanks for the reply!

newguy · Joined: 24 Jun 2004 Posts: 1911

Can you add a small deadtime between each transmitted packet? If you already have such a thing, how about a small deadtime between each transmitted character?

Another trick you could try is to take a page from simple RF transmissions, and preface all data transmissions with a series of superfluous synchronization bytes. Instead of master sending 8 bytes to slave, send 8 + x, where the first x bytes are your unique sync bytes. On the slave, alter your data reception routine/logic to automatically discard any number of the sync bytes and instead focus on the "start of transmission" character instead, then continue as you have been. Similarly add slave sync bytes to what the slave transmits. The assumption here is that the UART of either master or slave isn't properly delineating each transmitted character....the UART may think it saw a start of character when the other UART is actually in the middle of a character.
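A receive routine along these lines could look like the sketch below. The sync value 0x55, the start character 0x02 (STX) and all names are assumptions chosen for illustration; STX was picked because it cannot appear in a printable ASCII payload. The 8-byte payload length is the master packet size mentioned earlier in the thread.

```c
#include <stdint.h>
#include <stddef.h>

#define SYNC_BYTE   0x55u  /* hypothetical preamble byte, sent x times  */
#define START_BYTE  0x02u  /* hypothetical start-of-transmission marker */
#define PAYLOAD_LEN 8u     /* master packet size from the thread        */

typedef struct {
    uint8_t buf[PAYLOAD_LEN];
    size_t  count;
    int     in_packet;
} rx_parser_t;

/* Feed one received byte. Returns 1 when a full payload is in p->buf. */
static int rx_parser_feed(rx_parser_t *p, uint8_t byte)
{
    if (!p->in_packet) {
        /* Sync bytes, and any garbage from a mis-framed first character,
           are discarded until the start marker arrives. */
        if (byte == START_BYTE) {
            p->in_packet = 1;
            p->count = 0;
        }
        return 0;
    }
    p->buf[p->count++] = byte;
    if (p->count == PAYLOAD_LEN) {
        p->in_packet = 0;
        return 1;
    }
    return 0;
}
```

A real implementation would also want a timeout that clears in_packet, so that a truncated packet cannot swallow the start of the next one.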

PCM programmer · Joined: 06 Sep 2003 Posts: 21708

temtronic · Posted: Wed Nov 19, 2014 5:36 pm

Another area to look into is the power supply for the PICs. My rule of thumb is to have a PSU good for at least 5X the max you 'think' the PCB will draw AND be sure to have proper filtering. ANY 'unguarded' pin could easily let a 'glitch' or 'gremlin' in causing no end of grief and hair pulling. Also be sure you don't have any unused pins 'floating', use a pull resistor to ensure a good high.
Sounds like you've gone the 'economical' route of no xtal and 2 caps and I understand the penny saving idea but....it'd be interesting if your problem 'goes away' if you added them back in.
I know 6 feet of wire isn't a lot BUT it can easily become an antenna, pick up some EMI (cell phone, wireless modem, etc.) and the 'fun' begins.
Be sure to have a GOOD ground between the two PCBs.
I know most of this you've probably thought of but sometimes it's the little detail you miss that comes back to bite you.

good luck
report back with what you find.

Jay

gpsmikey · Joined: 16 Nov 2010 Posts: 588 Location: Kirkland, WA

Don't forget that according to Murphy's law, tolerances will add up in the worst way possible (so one will be on the high end of the spec and the other will be on the low end of the spec). See if the ones that are giving errors are also giving errors if they are paired with a different board.

mikey

RF_Developer · Joined: 07 Feb 2011 Posts: 839

Ttelmah · Joined: 11 Mar 2010 Posts: 19589

I wonder about one thing. Still timing related.
This PIC says it uses a clock at 16* the baud rate, for its internal UART sampling. At 4 MHz, this can be generated directly by division from the oscillator, but at 2 MHz, this frequency is not achievable directly. So it'll probably have to generate a half cycle division. Depending on the symmetry of the clock, this can produce yet another slight timing error. Add this to the tolerance between the oscillators (already potentially out of spec), and things are getting worse....
Given that one doesn't actually 'care' about the real baud rate (no other devices than the PICs involved), why not think of some way of knowing if the packet has failed, and if it does, enable the ABDEN bit in the USART, and let it recalculate the best fit timing?

Mike Walne · Joined: 19 Feb 2004 Posts: 1785 Location: Boston Spa UK

There are still too many loose ends here you're not telling us about.

How are the PICs connected, directly, via ttl buffer, ttl to RS232 converters, ttl to 485 converters, whatever?
How are you achieving the turn round?
How are you measuring the actual baud/oscillator rates?
What is the accuracy of your measurements?

Mike

temtronic · Posted: Thu Nov 20, 2014 6:31 am

Just looked at the datasheet and have a few questions

1) Why 4MHz operation when it can do 16MHz? Usually most programmers go for the max.
1a) If you try at the higher clock does the problem still exist ?

2) PIC has 'autobaud' capacity. Is that turned on? Confirm by the listing and see the config bits, never assume the 'defaults' are right!

3) Any chance the PIC is in 'sleep' mode? Potentially a problem..'lazy PIC itis'.

4) How is your 'one wire' hardware and software done? If the pullup on the 'bus' is weak, the signal will be 'lazy' or slow. If you show us your 'one wire' code, someone might see 'there's the problem'...

5) WDT enabled ? Maybe a red herring but be nice to know.

That PIC has a lot of nice features but since the problem only appears to be a few boards maybe they're not clean?

Please confirm, you ARE using the hardware UART and running a good 5 volt power supply.

hth
jay

Ttelmah · Joined: 11 Mar 2010 Posts: 19589

Just a comment, the AutoBaud on that chip is not something you want on, _unless_ you are adjusting the rate. The CCS code will not turn it on.

It works by switching the internal clocks, so it 'counts' the next incoming character, and works out what count value should be loaded in the BRG to give the best result. You have to set it, have a character sent by the other end, then read the time recorded, subtract one, and write it back (it records the division count required, but the BRG divides by value+1).

It definitely would be the way to adjust the BRG to get the best results, though it too would like a higher clock rate (at the existing divisor of /26, the adjustments would be /25 or /27 -> nearly 4%, while at a higher clock the adjustment becomes finer).
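The step size can be put into numbers. This is a rough sketch assuming baud = Fosc / (4 x (n + 1)) with the nearest divider chosen, which matches the /26 figure at 2 MHz; the function names are made up for the example. It reports the larger of the two neighbouring steps, the one from divider d down to d - 1.

```c
#include <stdint.h>

/* Nearest divider (n + 1) for baud = Fosc / (4 * (n + 1)). */
static uint32_t brg_divider(uint32_t fosc_hz, uint32_t baud)
{
    uint32_t denom = 4u * baud;
    return (fosc_hz + denom / 2u) / denom;
}

/* Baud change, in hundredths of a percent (truncated), when the divider
   drops from d to d - 1. The rate rises by a factor d / (d - 1), which
   is an increase of 1 / (d - 1). */
static uint32_t brg_step_centipercent(uint32_t fosc_hz, uint32_t baud)
{
    uint32_t d = brg_divider(fosc_hz, baud);
    return 10000u / (d - 1u);
}
```

At 19200 baud this gives 4.00% per step at 2 MHz, about 1.96% at 4 MHz and about 0.48% at 16 MHz, so a trimmed divider can only land close to the target when the clock is high enough.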

MotoDan · Joined: 30 Dec 2011 Posts: 55

Thanks for all of the insightful input. Really appreciate everyone's efforts.

Here's a little more info on what's going on. Just got another set of boards from the mfgr in China and this time the UART output from the Master unit is not correct. The Remote unit is not responding as before, but now the UART decoder on my Rigol scope is not seeing the correct bytes.

Suspecting an error in Fosc, I decided to skew the '#use delay' statement to see if it had an effect on the UART output. I first went to 1.9 MHz which made the decoding worse. I then went to 2.1 MHz which fixed the problem. The scope is decoding the correct data and more importantly, the Remote is too.

To the question about whether I'm using the hardware UART - yes. UART1 is specified in the #use RS232 statement.

I'm assuming that increasing the delay statement to 2.1 MHz works because the compiler adjusts the baud rate generator values, which then allows the UART to transmit/receive at the correct rate of 19.2kb.

Another question was related to why 2 MHz instead of 16 MHz. The reason I'm using 2 MHz is to reduce current consumption as this is a battery-powered device. Also, to your question about a clean supply, these devices are running at 5V via a switching supply. The output was measured at 5V with about 25 mV ripple so it's pretty clean. The switcher also has ample current to run this circuitry.

These boards use the CLKOUT which produces Fosc/4 at an output pin. I changed the internal osc to 16 MHz (for this test) and measured the frequency with an accurate counter to be 4.15 MHz which correlates to an Fosc of 16.6 MHz. This is only about 4% high which is within the +/- 6.5% (over temp) spec. These boards are all at room temp so I would think the factory calibration would be much closer than this.
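This measurement looks sufficient to explain the '#use delay' experiment. The sketch below rests on two assumptions that are not confirmed in this thread: that the 2 MHz setting is divided down from the same 16 MHz source and is therefore fast by the same proportion (about 2.075 MHz), and that the compiler picks the nearest divider for baud = Fosc / (4 x (n + 1)). The function names are illustrative.

```c
#include <stdint.h>

/* Divider the compiler would choose for the clock it has been told about. */
static uint32_t brg_divider(uint32_t declared_fosc_hz, uint32_t baud)
{
    uint32_t denom = 4u * baud;
    return (declared_fosc_hz + denom / 2u) / denom;
}

/* Baud rate (truncated to whole bits/s) that the chip really produces
   when that divider is clocked from the real oscillator frequency. */
static uint32_t actual_baud(uint32_t real_fosc_hz, uint32_t divider)
{
    return real_fosc_hz / (4u * divider);
}
```

With a real Fosc of 2.075 MHz, declaring 2.0 MHz selects /26 and yields about 19952 baud (3.9% fast), declaring 2.1 MHz selects /27 and yields about 19213 baud (under 0.1% fast), and declaring 1.9 MHz selects /25 and yields 20750 baud (8.1% fast). That ordering matches what was seen on the scope.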

So at this point my assumption is that the internal 16 MHz osc is in tolerance, but slightly high. This makes me wonder if perhaps the factory is using PIC knock-offs. The other possibility might be that the factory calibration of the internal osc has somehow been altered. Either way, there is no way I can achieve the +/- 2.5% baud rate tolerance (that others have cited) when the Fosc is 4% high.

The possible solutions that I'm coming up with are: 1) switch to an external oscillator, preferably one whose frequency is an exact multiple of the 19.2kb rate, 2) perform a software calibration based on Fosc/4 which corrects the BRG values, or 3) select a custom baud rate that divides exactly from 2 MHz. The latter wouldn't make much difference since 19.2kb is only 0.16% off at 2 MHz.
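For option 2, the arithmetic is simple because CLKOUT is already Fosc/4, so the divider is just the measured CLKOUT frequency divided by the baud rate. A minimal sketch follows, assuming BRG16 = 1 and BRGH = 1 and assuming the CLKOUT value comes from an external reference such as a production test counter (the PIC cannot measure its own oscillator against itself). The function name is hypothetical.

```c
#include <stdint.h>

/* Value for the SPBRG register pair given a measured CLKOUT (Fosc/4)
   frequency. The BRG divides by (value + 1), hence the final subtraction. */
static uint16_t spbrg_from_clkout(uint32_t clkout_hz, uint32_t baud)
{
    uint32_t divider = (clkout_hz + baud / 2u) / baud;  /* round to nearest */
    if (divider == 0u) {
        divider = 1u;  /* guard against an implausibly low measurement */
    }
    return (uint16_t)(divider - 1u);
}
```

A nominal board (CLKOUT of 500 kHz at the 2 MHz setting) gets 25, while a board running 3.75% fast (518.75 kHz) gets 26. Given the coarse step size at 2 MHz, this only helps when the measured value happens to fall near a whole divider.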

PCM programmer · Joined: 06 Sep 2003 Posts: 21708

temtronic · Posted: Wed Dec 03, 2014 8:30 pm

Sorry you've found out the hard way that the int osc isn't too good for serial communications(BTDT). It's also not good for other 'timer' related operations.
Any chance you can add a xtal/2caps AND use a bigger battery ? Using the xtal will give you rock solid communications, and well, a bigger battery isn't that much money or size these days. Maybe add a supercap to reduce peak demands?
Another possible thing to try is to reduce from 19k2 down to say 9600. Slower speeds are more 'forgiving'. Heck I run at 24 Baud (yes, 24 bits per second) and never have 'issues'.

Not too sure what else to suggest...

Jay