View previous topic :: View next topic |
Author |
Message |
JAM2014
Joined: 24 Apr 2014 Posts: 138
|
Troubleshooting suggestions... |
Posted: Wed Apr 15, 2020 12:19 pm |
|
|
Hi All,
I have a lighting controller design based on the 18F46K22 device. It's basically a clock using the DS3231 RTC, a GPS module, and a 16x2 LCD display. The code calculates sunrise/sunset times to send On/Off lighting data via an Xbee module to remote lighting modules. The RTC time is sync'd to the GPS once per day at 0 hours UTC. The GPS sends data to the PIC at 1 Hz.
The project works great - now the big 'But' - however, it will occasionally lock up and stop running at seemingly random times. I have no idea the mechanics of 'why' it's locking up, and it happens so infrequent - possibly less than 1 time per month - that I'm not even sure how I'd troubleshoot it?
There is one area that I suspect could be done better. In my code, I constantly read the time from the RTC, and only update the LCD when the time has changed. This potentially involves a lot of processor activity that may not be necessary. I know that some folks generate a 1 Hz interrupt from the RTC, and update the display whenever this occurs. I'd implement this method if the technique I'm currently using *may* be the source of my issue.
I could also reduce the frequency that the GPS is sending data. The GPS input uses a circular buffer, but maybe I'm just overtaxing the processor between processing the incoming GPS messages & reading the RTC?
Lastly, I could add a 'band-aid' and implement a WDT to reset the code in the event that the system hangs up.
I realize this is a bit of a vague request, but I'm just looking for possible avenues to improve my code and solve this occasional lock-up problem.
Running the processor on the internal 8MHz oscillator.
Compiler is PCH 5.050
Thanks,
Jack |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1908
|
|
Posted: Wed Apr 15, 2020 2:30 pm |
|
|
Off the top of my head...
- Your RTC is I2C; if there's a hardware issue with the I2C bus, it's possible that the processor can hang in an infinite loop waiting for the bus/RTC to respond. Before each I2C transaction, "launch" a spare timer set to expire/interrupt 2x or 3x your expected maximum total transaction time. At the end of your transaction (successful interaction with the RTC), kill the timer. In the timer interrupt, kill the MSSP and do a hard reset of the RTC.
- If your RTC has a 1 pulse per second output line, use that instead to increment your clock. Only update the LCD when this happens.
- Ensure your serial code (interface to the GPS) is absolutely rock solid. Would a corrupted character throw that off and somehow cause it to lock up? Could you implement some kind of escape from the processor sitting in an infinite loop waiting on a certain character from the GPS - which might not ever arrive or if it does, the processor won't respond to because of an unforeseen path through your code? Usually with a GPS a standard $GPxx string starts with a '$' and ends with a CR LF. Launch a timer when you get a '$', and kill it when you get the CR LF. If the timer expires, have it reset your serial decode logic.
- GPS receivers will usually have a 1 pulse per second output. As with the RTC, use that to update your internal time instead of constantly reading the string the GPS sends. Actually, I'd preferentially use the GPS' 1PPS as that will be far more accurate than your external RTC chip. Only revert to its 1PPS if the GPS hangs or if the number of satellites it sees drops below a set number (which is usually reflected in a flag character in most $GPxx strings - i.e. fix valid). |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9229 Location: Greensville,Ontario
|
|
Posted: Wed Apr 15, 2020 2:59 pm |
|
|
I have to say the DS3231 is rock stable..crazy for a $2 module that also has EEPROM.... I use it's 1HZ ISR for 'main()' updating of LCD, tests, etc.
How good is the power supply ? GPS and Xbee modules can take a LOT of power (current), even though it's a short time. While it may be overkill, X5 or even X10 the 'max' COMBINED current ensures a STABLE power supply.
Is it 5 volt or 3.3 volt ? I2C needs correct pullups, typically 4k7 for 5V, 3k3 for 3v3. Any chance EMI is getting in ?? Are both VDD and VSS pins connected ? Add some .1 mfd bypass caps locally to all devices. Having a random failure is tough to isolate, usually it's a process of elimination. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19520
|
|
Posted: Thu Apr 16, 2020 12:54 am |
|
|
I have to agree though. It is much 'better' to actually handle the 'time'
in the PIC. Just program the RTC to give a 1/sec interrupt and have the
PIC update an int32 'timer' value every time this is received.
Result keeping 'time' only involves a few uSec, even with a slow PIC.
Then on 'boot' reload this 'timer' value from the RTC.
The time.c/h library has the functions to convert a 'time' to and from a
simple integer value like this.
I usually just handle 'time' like this, but perhaps at midnight, reload from
the RTC 'just in case'.
As Newguy says, if your serial stream is coming from the GPS, you can
use the arrival of this as your 'time' signal rather than using the RTC.
Now in general, the I2C shouldn't be affected by the PIC having to handle
the serial from the GPS (I am assuming you are using interrupt driven
serial with a large buffer). The possibility does exist though that if the
chip is busy handling the I2C, the buffer might get fuller than normal,
while the main code can't actually do the interpretation. So expanding
the buffer size is a good place to start. How does your serial code cope
if the buffer gets full?.
Is it possible that in fact the GPS is returning an 'ERROR' status of some
sort, and that your interpreter to handle the incoming string does not
cope with this?. It is quite common to get errors when a plane flies
overhead or a large truck goes by the building. If you are interpreting
a NMEA GPS position string the 'FIX QUALITY' field might be changing
to 0. Also does your interpreter check the checksum before reading
the data?. A simple intermittent serial error (perhaps when something
nearby turns on/off), could imply the value should not be used. |
|
|
JAM2014
Joined: 24 Apr 2014 Posts: 138
|
|
Posted: Mon Apr 20, 2020 8:53 am |
|
|
Hello All,
Thanks for the 'food for thought' on this issue!
Individual 'components' (hardware & software) for this project have been in a number of other 'mature' products, so I'm reasonably sure that the individual pieces are robust. It's the sum of the pieces that has been the problem. I just completed a rewrite of the code to enable and use the 1Hz output from the DS3231. This in itself will dramatically reduce the overhead of reading the RTC. I've also increased the size of the GPS receive buffer. We'll see if these changes make any difference.
I also really like the idea of using the RTC for the 1 second 'tick' and just incrementing an int32 variable to manage time. This makes sense as I'm converting to Unix time already - all my timekeeping 'internally' is UTC time - applying a local 'offset', and then converting back to regular time. Keeping a running counter with the time would save a number of these steps!
Thanks all for the suggestions! I'll report back in a month or two with my results!
Jack |
|
|
JAM2014
Joined: 24 Apr 2014 Posts: 138
|
|
Posted: Thu Jun 11, 2020 11:17 am |
|
|
Hi All,
I've been doing a bunch of troubleshooting to figure out why my lighting controller project will randomly lock up. After performing a lot of testing to rule out certain functions of the code (reading the RTC, reading the GPS, etc.), I've determined that the code seems to always die inside the GPS CRC calculation routine shown below. This code takes a null-terminated GPS NMEA message, and calculates the CRC. 99.9% of the time this approach seems to work fine, but occasionally it does not!
Code: |
char calculate_checksum(void)
{
// This subroutine calculates the NMEA checksum by XORing all the NMEA sentence characters together exclusive of the '$'
// and '*' characters.
int8 i;
int8 value = 0;
for(i = 0; Msg_Buffer[i] != '*'; i++)
{
if (Msg_Buffer[i] != '$')
{
value ^= Msg_Buffer[i];
}
}
//fprintf(LCD, "CRC: %u\r\n", value);
return(value);
}
|
My thinking is that the NMEA message is somehow corrupted on occasion, for example the '*' character is missing, and this causes the routine to go off the rails. Any thoughts on potential deficiencies of this routine, and how to make it more 'bomb proof'?
Jack |
|
|
bkamen
Joined: 07 Jan 2004 Posts: 1615 Location: Central Illinois, USA
|
|
Posted: Thu Jun 11, 2020 12:01 pm |
|
|
Oooo.
You definitely want to check to see if you have the char OR you've hit the end of the buffer and exit. (like return a failure code and not use the GPS string)
Without the safety of the "end of buffer" -- you go into la-la land.
ALSO,
How is your program getting the GPS string? ISR? READ?
Is there buffer overflow protection there too? _________________ Dazed and confused? I don't think so. Just "plain lost" will do. :D |
|
|
PrinceNai
Joined: 31 Oct 2016 Posts: 480 Location: Montenegro
|
|
Posted: Thu Jun 11, 2020 12:10 pm |
|
|
I'd definitely do what bkamen said. Check if i goes too far and if that happens, exit with an error code. And first check if your buffer is actually big enough to hold the longest expected string. Even though if you are using a circular buffer for GSM data that would just mean you'd hit "*" way too soon and thus getting a wrong CRC. |
|
|
JAM2014
Joined: 24 Apr 2014 Posts: 138
|
|
Posted: Thu Jun 11, 2020 3:38 pm |
|
|
Hi All,
I'm not saying this is the 'be all, end all', but I've added the following NMEA $GPRMC error checking into the CRC routine:
Code: |
//Here we test for a corrupt NMEA $GPRMC string.....
//A proper string will be less than 78 characters in length, will start with a '$', and will have a '*' four characters removed from the end...
if((strlen(Msg_Buffer) > 78) || (Msg_Buffer[0] != '$') || (Msg_Buffer[strlen(Msg_Buffer) - 4] != '*')){
fprintf(Diag, "Corrupt NMEA string!\n\r");
Return(0);
}
|
So far, things seem more stable, but I'll need to test a while before declaring victory!
Jack |
|
|
PrinceNai
Joined: 31 Oct 2016 Posts: 480 Location: Montenegro
|
|
Posted: Fri Jun 12, 2020 4:36 am |
|
|
If corrupt string is indeed your problem I'd print it out also, along with the warning. To see the guilty one :-). But you most likely do that somewhere else in your program. |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9229 Location: Greensville,Ontario
|
|
Posted: Fri Jun 12, 2020 1:53 pm |
|
|
Ok, I'm curious.....I have to ask why are you sending GPS data to the PIC every second ?
I can see at 'powerup', getting the GPS data (lat + long) to calculate sunrise/sunset ,after that...... why would you need any GPS data ?
Jay |
|
|
JAM2014
Joined: 24 Apr 2014 Posts: 138
|
|
Posted: Mon Jun 15, 2020 7:25 am |
|
|
Hi All,
As a follow-up, the changes I've made to the CRC check routine seem to have solved my problem! It's only been about 4 days now, but I've seen no lockups in continuous operation, or during many reboots. I'm not 100% ready to declare 'success', but it's starting to look very promising!
This is a lighting controller intended for long term unattended operation. There is a GPS module connected to the PIC to allow the controller to 'know' its position for sunset/sunrise calculations (this only needs to be done once!), but also to 'sync' the onboard real time clock at midnight UTC each day.
Typically, a GPS module will send a 'message' to its host every 1 second. This is configurable, but basically these type modules will be transmitting data on a continuous basis. I could disable the GPS when it's not actually needed, but it's probably not worth the effort.
Jack |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19520
|
|
Posted: Mon Jun 15, 2020 7:34 am |
|
|
Seriously, just set the GPS up to send the string once when requested.
The standard settings for GPS's have the option to enable auto transmission,
or send a single position on request. The latter is a much nicer option for
your job. Look in the Query/rate control section of the reference for your
GPS.
Then just send a request when you want the data... |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9229 Location: Greensville,Ontario
|
|
Posted: Mon Jun 15, 2020 10:17 am |
|
|
gee ,since the GPS data is only needed once per day, you'ld save 86,399 operations a day by polling the GPS once instead of every second or every minute of every hour.
Over the past 2-3 years I've used the DS3231 for my greenhouse controllers and maybe reset the time 1 or 2 times... |
|
|
|