Franck26
Joined: 29 Dec 2007 Posts: 122 Location: Ireland
|
Timer 3 IT does not trip |
Posted: Wed Jul 14, 2010 9:57 am |
|
|
Hello,
I've got a bug on an 18F2520 that I've been fighting with for a few months now.
I already asked for some help in this post (no need to read it):
http://www.ccsinfo.com/forum/viewtopic.php?t=42279&highlight=franck
At the time I thought that my problem was linked to the GIE bit, but it looks like it was not that.
I use the timer 3 overflow to generate an interrupt.
It uses the internal clock (FOSC/4); the oscillator is the internal RC at 8 MHz.
Everything works fine for a while. But after 1 day, 1 week or sometimes 1 month, the interrupt stops being triggered and I have to reset the system to get it to work again. Everything else seems to work.
I've implemented a trap in my software to check the registers that I thought would be the problem.
I'm monitoring: INTCON, RCON, T3CON, and PIE2.
Below are the bits that I think are linked with the timer 3 interrupt.
INTCON:
GIE/GIEH = 1 => Enables all unmasked interrupts
PEIE/GIEL = 1 => Enables all unmasked peripheral interrupts
RCON:
IPEN = 0 => Disable priority levels on interrupts
T3CON:
RD16 = 1 => Enables register read/write of Timer3 in one 16-bit operation
T3CCP<2:1> = 00 => Timer1 is the capture/compare clock source for the CCP modules
T3CKPS<1:0> = 00 => 1:1 Prescale value
T3SYNC = 0 => This bit is ignored. Timer3 uses the internal clock when TMR3CS = 0.
TMR3CS = 0 => Internal clock (FOSC/4)
TMR3ON = 1 => Enables Timer3
PIE2:
TMR3IE = 1 => TMR3 overflow interrupt enabled.
All those register bits seem fine; in fact they don't change between when it works and when the bug appears.
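The bit-by-bit check listed above can be sketched as a small routine that compares a captured register snapshot against the expected values. This is only an illustration: `t3_snapshot`, `t3_snapshot_faults` and the `T3_FAULT_*` flags are hypothetical names, not part of the actual firmware, and the bit masks are my reading of the PIC18F2520 register layout, so they should be verified against the data sheet.

```c
#include <stdint.h>

/* Bit masks (assumed PIC18F2520 layout; verify against the data sheet). */
#define INTCON_GIE    0x80u
#define INTCON_PEIE   0x40u
#define RCON_IPEN     0x80u
#define T3CON_TMR3ON  0x01u
#define PIE2_TMR3IE   0x02u
/* RD16, T3CCP2, T3CKPS<1:0>, T3CCP1, TMR3CS. T3SYNC (0x04) is ignored. */
#define T3CON_CFG_MASK   0xFAu
#define T3CON_CFG_EXPECT 0x80u  /* RD16 = 1, everything else 0 */

/* Hypothetical fault flags returned by the check. */
enum {
    T3_FAULT_GIE    = 1u << 0,
    T3_FAULT_PEIE   = 1u << 1,
    T3_FAULT_IPEN   = 1u << 2,
    T3_FAULT_CONFIG = 1u << 3,
    T3_FAULT_TMR3ON = 1u << 4,
    T3_FAULT_TMR3IE = 1u << 5
};

/* Hypothetical container for the four monitored registers. */
typedef struct {
    uint8_t intcon;
    uint8_t rcon;
    uint8_t t3con;
    uint8_t pie2;
} t3_snapshot;

/* Returns 0 when every listed bit has its expected value,
 * otherwise an OR of T3_FAULT_* flags. */
unsigned t3_snapshot_faults(const t3_snapshot *s)
{
    unsigned faults = 0;

    if (!(s->intcon & INTCON_GIE))  faults |= T3_FAULT_GIE;
    if (!(s->intcon & INTCON_PEIE)) faults |= T3_FAULT_PEIE;
    if (s->rcon & RCON_IPEN)        faults |= T3_FAULT_IPEN;
    if ((s->t3con & T3CON_CFG_MASK) != T3CON_CFG_EXPECT)
        faults |= T3_FAULT_CONFIG;
    if (!(s->t3con & T3CON_TMR3ON)) faults |= T3_FAULT_TMR3ON;
    if (!(s->pie2 & PIE2_TMR3IE))   faults |= T3_FAULT_TMR3IE;

    return faults;
}
```

On the target, the snapshot would be filled from the real registers inside the trap; the function only reports which of the listed bits differ from the values above.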
Are there any other registers that I should check?
Thanks for any help,
Franck.
Note: I've posted the same message on the Microchip website. |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Wed Jul 14, 2010 4:42 pm |
|
|
You know what I'm going to say. Post a very small test program
that is compilable, that shows the problem. Something like 10-20 lines.
How do you know the test program is failing ? Explain.
Also post the following:
1. Compiler version.
2. What is the Vdd voltage ? Also describe the voltage regulator circuit.
3. Do you have bypass capacitors on the Vdd pins? What type & value ?
3. Describe the external MCLR circuit, or are you using the NOMCLR fuse ?
4. Post anything unusual about the test environment, such as the
temperature, electrical noise, static discharge, etc. |
|
|
Franck26
Joined: 29 Dec 2007 Posts: 122 Location: Ireland
|
|
Posted: Thu Jul 15, 2010 2:56 am |
|
|
Hi PCM,
Thanks for your answers.
PCM programmer wrote: | You know what I'm going to say. |
Yes, it was obvious
Let's start with the easy answers:
1- Compiler version: various compiler versions. The code is 1 year old and I have recompiled it several times with the latest available compilers (except 4.108 and 4.109, due to a compiler bug...). The bug happens with every compiler version that I have tried.
2- Vdd = 5V, supplied by buck converters from 24V. The VDD is filtered and very clean when checked with an oscilloscope.
3- 100nF ceramic capacitor very close to the VDD pin. The VDD pin is 30mm away from the switching converter's tantalum output capacitor. This is a small 4-layer PCB with internal power planes.
3- The MCLR circuit is just a 47k pull-up resistor. No switch, there is no manual reset. I need to check what the NOMCLR fuse is, but I don't use it in this code.
4- The system is used in an industrial environment (big motors, etc...). I have 1 system in the lab (no noise). It doesn't look like the system fails more often in the industrial environment. There is a fan on the other side of the PCB; it is a low-power fan and it doesn't use the 5V of the micro.
Now the more complicated answers:
PCM programmer wrote: | Post a very small test program that is compilable, that show the problem. Something like 10-20 lines. |
The memory of the 18F2520 is nearly full, I don't have any clue which part of the code is causing this bug, and the bug happens rarely.
It would take me a lifetime (or a lot of luck) to break the code into small parts and check whether the bug happens...
PCM programmer wrote: | How do you know the test program is failing ? Explain. |
Timer 3 is used to synchronize an RS485 communication bus. The overflow triggers the interrupt every 0.521 ms, 1.04 ms or 1.56 ms (depending on the RS485 frame). When the bug happens, the communication stops: there is no answer from the module. To find out what the problem is, I added a line of code in the Timer 3 interrupt to toggle a pin. The pin toggles when I start the software and stops toggling when the bug happens.
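These periods map onto Timer 3 reload values through the clock settings given earlier in the thread (8 MHz internal oscillator, FOSC/4, 1:1 prescaler), which make one timer tick 0.5 µs. A minimal sketch of that arithmetic, assuming the timer is reloaded once per overflow and ignoring interrupt latency; `timer3_ticks` and `timer3_preload` are hypothetical helpers, not code from the actual firmware:

```c
#include <assert.h>
#include <stdint.h>

/* Number of timer ticks in period_us, for a timer clocked at FOSC/4
 * with a 1:1 prescaler. */
uint32_t timer3_ticks(uint32_t fosc_hz, uint32_t period_us)
{
    uint64_t tick_hz = fosc_hz / 4u;
    return (uint32_t)((tick_hz * period_us) / 1000000u);
}

/* 16-bit value to load so that the timer overflows after period_us.
 * The timer counts up and overflows on the step from 0xFFFF to 0x0000,
 * so the preload is 65536 minus the tick count. */
uint16_t timer3_preload(uint32_t fosc_hz, uint32_t period_us)
{
    uint32_t ticks = timer3_ticks(fosc_hz, period_us);

    /* The period must fit in one 16-bit overflow. */
    assert(ticks >= 1u && ticks <= 65536u);

    return (uint16_t)(65536u - ticks);
}
```

With these assumptions, 0.521 ms is 1042 ticks, giving a preload of 64494 (0xFBEE); 1.04 ms and 1.56 ms give 63456 and 62416 respectively.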
Maybe something interesting: there are several identical modules on the same RS485 bus. It is always the same module that fails (on all the systems). The only difference that I can see between this module and the others is that it uses the ECCP and Timer 2 to generate a PWM. I have set up a new test where I use another module to generate the PWM, and I'm going to monitor the CCP1CON register (PWM). Maybe the bug stays with the module which generates the PWM... I'll get the answer in maybe 1 month !!!
I'm still looking at the errata, but I haven't found anything which could cause this bug...
Thanks for your help,
Franck. |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Thu Jul 15, 2010 1:33 pm |
|
|
Quote: |
Timer 3 is used to synchronize an RS485 communication bus.
The overflow triggers the interrupt every 0.521 ms, 1.04 ms or 1.56 ms
Everything works fine for a while. But after 1 day, 1 week or sometimes
1 month, the interrupt stops being triggered and I have to reset the
system to get it to work again.
|
A few initial questions:
1. Does it fail in the lab ?
2. How often does it fail ? For example, does it fail once every 24 hours
of continuous use ?
3. Does it fail if all external equipment is disconnected ? (such as
RS-485 cables, etc).
4. Questions on Timer3:
Does the Timer3 interrupt always run at a 0.521ms period ?
Do you ever change the period, by changing the T3 clock divisor
or the preset value of Timer3, with the set_timer3() function ?
Do you ever disable interrupts (INT_TIMER3 or INT_GLOBAL) ?
Quick solution:
If Timer3 always runs at a 0.521ms period, and if you are currently
manually detecting a failure and then manually resetting the board,
this could be done automatically for you by the Watchdog Timer.
Enable the WDT. Set it for a time-out period longer than 0.521 ms.
The Watchdog Timer on this PIC can be set for any period from 4 ms
to 131 seconds.
The Watchdog method would fix the problem in the field, so that you
could continue to study it in your lab without so much worry.
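The 4 ms to 131 second range quoted above comes from a nominal 4 ms base period multiplied by a power-of-two postscaler from 1:1 to 1:32768. A minimal sketch of picking a postscaler, assuming the nominal base period (the real period varies with temperature and supply voltage, so some margin is needed); `wdt_timeout_ms` and `wdt_postscaler_for` are hypothetical helpers:

```c
#include <stdint.h>

#define WDT_BASE_MS        4u      /* nominal base period (assumption) */
#define WDT_MAX_POSTSCALER 32768u  /* largest postscaler, 1:32768 */

/* Nominal watchdog timeout in milliseconds for a given postscaler. */
uint32_t wdt_timeout_ms(uint32_t postscaler)
{
    return postscaler * WDT_BASE_MS;
}

/* Smallest power-of-two postscaler whose nominal timeout is at least
 * min_timeout_ms, or 0 if even 1:32768 is too short. */
uint32_t wdt_postscaler_for(uint32_t min_timeout_ms)
{
    uint32_t ps;

    for (ps = 1u; ps <= WDT_MAX_POSTSCALER; ps <<= 1) {
        if (wdt_timeout_ms(ps) >= min_timeout_ms) {
            return ps;
        }
    }
    return 0u;
}
```

Even the 1:1 setting (4 ms nominal) is longer than the 0.521 ms interrupt period, so the choice mainly depends on how much slack the rest of the code needs.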
More possibilities:
The 18F2620 is a very similar PIC with 2x the Flash ROM size. It also
has more than 2x the RAM. The pin labels are identical to the 18F2520.
This PIC would be a good upgrade. If there is a problem in your using
95% of the ROM, perhaps caused by a compiler bug or something, this
would be a good test.
Basically you need several boards in your lab that fail regularly with
your existing code and PIC. Then take some of them and start doing
experiments. Test the watchdog. Test using a different PIC, etc. |
|
|
Franck26
Joined: 29 Dec 2007 Posts: 122 Location: Ireland
|
|
Posted: Fri Jul 16, 2010 12:32 pm |
|
|
Hi,
1. Yes, it fails in the lab, the same way as in the field.
2. The time before failure is not constant (that would be too good). The system runs 24/7; sometimes it fails after several hours, sometimes after several weeks, and the bug stays until a manual reset. So you can imagine how frustrating it is to run a test on this system. The worst is when someone unplugs it before the weekend !!!
3. That's a very good point, I haven't tested the bug without RS485 comm. I'll set up the test on Monday.
4. I change the timer overflow period during RS485 comm to leave some dead time between 2 RS485 frames.
The clock divisor is always the same: div=1.
I reload the timer 3 preset value at each interrupt. I load the registers directly, I don't use set_timer3().
I don't have the code in front of me, but from what I remember TMR3H is loaded first, followed by TMR3L, which should load the 16 bits at once.
I can post this part of the code if you think that it can help.
INT_TIMER3 is never disabled. It is enabled during init and I don't touch it afterwards.
I often disable INT_GLOBAL to protect some variables used both in the interrupt and in normal operation. I use the "while loop" that you suggested to me a while ago to disable and restore GIE. At the time I thought that it was the bug...
Solution:
The watchdog is already used as normal.
Using the watchdog on the timer 3 interrupt means resetting the system when the bug happens. This is not acceptable for this application (camera inspection), as it would mean stopping the production. Right now, when the bug happens the system keeps running; the only problem is that there is no communication, and if an operator wants to change a setting he has to stop the production and reset.
I'm going to order some 18F2620s and run some tests.
Thanks for your help, this is much appreciated !!!
Franck. |
|
|
pad
Joined: 29 Nov 2007 Posts: 15
|
|
Posted: Wed Nov 03, 2010 4:22 am |
|
|
hey Franck26,
Restarting your old topic. I am currently facing the same problem with timer3. It has to interrupt every 1 ms, but after 1 day, 2 days or 30 days its value changes and it starts giving interrupts every 400 to 600 us. My controller is a PIC18F85J90.
Did you find any solution to this?
regards
Pad |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19480
|
|
Posted: Wed Nov 03, 2010 10:05 am |
|
|
I'm wondering if an _occasional_ mistiming could result from the way that the timer3 value is updated.
Franck26 was doing the load himself. When you load the upper byte, it is effectively 'latched', waiting for the lower byte to be loaded, which then triggers the transfer of both bytes into the register. He talks as if he is loading the registers both in the RS485 code and in the interrupt. Now, what would happen if he loaded the first byte in the RS485 code, and then an interrupt occurred? The upper byte is latched, but will then be overwritten in the interrupt code when it loads the upper byte. The lower byte is then written, and the interrupt exits, and the main code then loads the lower byte left over from the first transaction. Result: 'wrong value'....
However I can only see this giving 'one' wrong time, very rarely.
I would suggest, though, disabling interrupts round the write in code outside the interrupt.
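The sequence described above can be reproduced on a PC with a small model of the latched high byte. This is only a sketch under stated assumptions: `sim_timer3` and the functions below are hypothetical and model nothing beyond the latch behaviour, the interrupt is forced to land at the worst possible moment, and the two reload values are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of Timer3 in 16-bit read/write mode (RD16 = 1). */
typedef struct {
    uint8_t  high_latch;  /* buffered high byte, not yet in the timer */
    uint16_t counter;     /* the actual 16-bit timer value */
} sim_timer3;

/* Writing the high byte only fills the latch. */
static void write_tmr3h(sim_timer3 *t, uint8_t value)
{
    t->high_latch = value;
}

/* Writing the low byte transfers latch + low byte into the timer. */
static void write_tmr3l(sim_timer3 *t, uint8_t value)
{
    t->counter = (uint16_t)(((uint16_t)t->high_latch << 8) | value);
}

/* Complete two-byte load, as done inside the interrupt handler. */
static void isr_reload(sim_timer3 *t, uint16_t value)
{
    write_tmr3h(t, (uint8_t)(value >> 8));
    write_tmr3l(t, (uint8_t)(value & 0xFFu));
}

/* Main-line load of main_value while a Timer3 interrupt is pending.
 * With interrupts unmasked, the handler runs between the two byte
 * writes (worst case). With interrupts masked, it is deferred until
 * after the load and would then do its own complete reload.
 * Returns the timer value right after the main-line low-byte write. */
uint16_t simulate_mainline_load(uint16_t main_value, uint16_t isr_value,
                                bool interrupts_masked)
{
    sim_timer3 t = { 0u, 0u };

    write_tmr3h(&t, (uint8_t)(main_value >> 8));
    if (!interrupts_masked) {
        isr_reload(&t, isr_value);  /* interrupt lands mid-update */
    }
    write_tmr3l(&t, (uint8_t)(main_value & 0xFFu));

    return t.counter;
}
```

For example, with a main-line value of 0xFBEE and an interrupt value of 0xF7E0, the unmasked case leaves 0xF7EE in the timer (high byte from the interrupt, low byte from the main code), while the masked case leaves the intended 0xFBEE.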
Best Wishes |
|
|
Douglas Kennedy
Joined: 07 Sep 2003 Posts: 755 Location: Florida
|
|
Posted: Thu Nov 04, 2010 1:44 am |
|
|
I had similar issues with the 2620. It would hang on the timer even with the watchdog. It rarely ran 24 hours and often froze within 5 to 6 hours. The code sequenced a water purification system and the main code was heavily interrupted by all three timers plus rda and tba and button presses.
Troubleshooting this was very unpleasant.
Protecting the 16-bit timer sets and resets from interrupts was the workaround. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19480
|
|
Posted: Thu Nov 04, 2010 4:05 am |
|
|
Interesting.
I wonder if there is something 'deeper' in the timer logic when performing 16-bit updates, so that an interrupted update actually causes a problem in the timer circuitry.
I must admit, I'd 'automatically' disable interrupts around anything like this, just to avoid the timing problem if an interrupt does occur, but it does sound as if something worse is actually happening....
Best Wishes |
|
|