guy
Joined: 21 Oct 2005 Posts: 297
INT_RDA4 stops working after a day or two
Posted: Wed Apr 22, 2015 8:48 am

PIC24FJ64GA308, CCS PCD C Compiler, Version 5.040
The code works well, but after a day or two the interrupt routine stops being called, even though characters are still arriving. A software reset fixes the problem.
The most obvious fix would be to add ERRORS to the #use RS232 declaration, but it's already there. As far as I can see, I never disable interrupts without re-enabling them, or anything like that. The line that clears RXIF is a workaround for an old compiler bug (I'm not sure it's still required, but it shouldn't cause this problem).
Any other ideas why the interrupt is no longer called?
Any 'elegant patch'?
Thanks!
Code:
#use RS232(STREAM=R232, BAUD=9600, UART4, ERRORS)
byte tmp;
...
#INT_RDA4
void rs232isr() {
   #BIT u4rxif = 0x8E.8   // U4RX interrupt flag
   tmp=fgetc(R232);
   ...
   u4rxif=0;
}

The call to fgetc() is at 0x13AC, and fgetc() itself starts at 0x138E:
Line Address Opcode Disassembly
2504 138E AE02B2 btss.b 0x02b2,#0
2505 1390 37FFFE bra 0x00138e
2506 1392 F802B2 push.w 0x02b2
2507 1394 F90800 pop.w 0x0800
2508 1396 8015B0 mov.w 0x02b6,0x0000
2509 1398 A922B2 bclr.b 0x02b2,#1
2510 139A 060000 return
2511 139C F80042 push.w 0x0042
2512 139E F80036 push.w 0x0036
2513 13A0 F80054 push.w 0x0054
2514 13A2 781F80 mov.w 0x0000,[0x001e++]
2515 13A4 200020 mov.w #0x2,0x0000
2516 13A6 09000C repeat #12
2517 13A8 781FB0 mov.w [0x0000++],[0x001e++]
2518 13AA EC31BA inc.w 0x11ba
2519 13AC 02138E call 0x00138e
2520 13AE 000000 nop
2521 13B0 B7E902 mov.b 0x0000,0x0902
2522 13B2 808DC4 mov.w 0x11b8,0x0008
2523 13B4 2012C3 mov.w #0x12c,0x0006
2524 13B6 E11804 cp.w 0x0006,0x0008
2525 13B8 340007 bra les, 0x0013c8
2526 13BA BF91B8 mov.w 0x11b8,0x0000
2527 13BC EC31B8 inc.w 0x11b8
2528 13BE 780280 mov.w 0x0000,0x000a
2529 13C0 2108C4 mov.w #0x108c,0x0008
2530 13C2 428304 add.w 0x000a,0x0008,0x000c
2531 13C4 BF8902 mov.w 0x0902,0x0000
2532 13C6 984300 mov.b 0x0000,[0x000c+0]
2533 13C8 A9008F bclr.b 0x008f,#0
2534 13CA A9008F bclr.b 0x008f,#0
2535 13CC 2001A0 mov.w #0x1a,0x0000
2536 13CE 09000C repeat #12
2537 13D0 78104F mov.w [--0x001e],[0x0000--]
2538 13D2 78004F mov.w [--0x001e],0x0000
2539 13D4 F90054 pop.w 0x0054
2540 13D6 F90036 pop.w 0x0036
2541 13D8 F90042 pop.w 0x0042
2542 13DA 064000 retfie
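Read as C, the listing above seems to do roughly the following. This is one interpretation, written as portable C so it can be compiled and stepped through on a PC. It assumes that 0x02B2 is U4STA (bit 0 = URXDA, bit 1 = OERR), 0x02B6 is U4RXREG and 0x0800 holds RS232_ERRORS. The names isr_count, buf_idx and buf for the words at 0x11BA, 0x11B8 and 0x108C are hypothetical, since that part of the ISR is elided in the post. Note the two identical clears of byte 0x8F bit 0 (word 0x8E bit 8) at 0x13C8 and 0x13CA: one from the user code, one added by the compiler.

```c
#include <stdint.h>

#define URXDA      (1u << 0)  /* assumed U4STA bit 0: receive data available */
#define OERR       (1u << 1)  /* assumed U4STA bit 1: receive overrun */
#define U4RXIF_BIT (1u << 8)  /* 0x8E.8 in the original #BIT */
#define BUF_LEN    300u       /* 0x12C in the listing */

/* Stand-ins for the UART4 registers (hypothetical struct, not a real API). */
typedef struct {
    volatile uint16_t sta;    /* 0x02B2 */
    volatile uint16_t rxreg;  /* 0x02B6 */
} uart_regs;

static uint16_t rs232_errors;  /* 0x0800 */
static uint16_t ifs5;          /* word holding U4RXIF */
static uint16_t isr_count;     /* 0x11BA, hypothetical name */
static uint16_t buf_idx;       /* 0x11B8, hypothetical name */
static uint8_t  buf[BUF_LEN];  /* 0x108C, hypothetical name */
static uint8_t  tmp;           /* 0x0902 */

/* Routine at 0x138E: wait for data, save the status word, read the
   character, then clear OERR (what the ERRORS option adds). */
static uint8_t model_fgetc(uart_regs *u)
{
    while (!(u->sta & URXDA))
        ;                                  /* btss.b 0x02b2,#0 / bra 0x138e */
    rs232_errors = u->sta;                 /* push.w 0x02b2 / pop.w 0x0800 */
    uint8_t c = (uint8_t)u->rxreg;         /* mov.w 0x02b6,w0 */
    u->sta = (uint16_t)(u->sta & ~OERR);   /* bclr.b 0x02b2,#1 */
    return c;
}

/* ISR body between the register saves and the retfie. */
static void model_isr(uart_regs *u)
{
    isr_count++;                           /* inc.w 0x11ba */
    tmp = model_fgetc(u);                  /* call 0x138e, mov.b w0,0x0902 */
    if (buf_idx < BUF_LEN)                 /* cp.w / bra les */
        buf[buf_idx++] = tmp;
    ifs5 = (uint16_t)(ifs5 & ~U4RXIF_BIT); /* bclr.b 0x008f,#0 */
    ifs5 = (uint16_t)(ifs5 & ~U4RXIF_BIT); /* the same clear, again */
}
```

As on the chip, the spin loop in model_fgetc() never exits while URXDA is clear, so the routine must only be entered when data is waiting.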

Mike Walne
Joined: 19 Feb 2004 Posts: 1785 Location: Boston Spa UK
Posted: Wed Apr 22, 2015 10:28 am

Is main() still running, and at the correct speed?
Mike

guy
Joined: 21 Oct 2005 Posts: 297
Posted: Wed Apr 22, 2015 11:41 am

Quote: Is main() still running and at the correct speed?
Yes, the whole program is running well, just no interrupts.

PCM programmer
Joined: 06 Sep 2003 Posts: 21708
Posted: Wed Apr 22, 2015 11:49 am

No interrupts of any kind?
Can you add code to debug this? Perhaps poll an I/O pin in a loop in main(), with a pull-up and a switch on the pin. When the switch is pressed, use printf to display the contents of all registers associated with interrupts: specifically, the global interrupt enable and the interrupt enable bits for the peripherals.
This test could be done in Release mode.
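A minimal sketch of the register dump suggested above, assuming the register values have already been read from the chip. It formats name/value pairs into a buffer instead of calling printf directly, so the formatting can be tested on a PC; on the target the resulting string could be sent out of a spare UART. The register names used in the usage (IEC5, IFS5, and others such as INTCON1, INTCON2, U4STA, U4MODE) are PIC24-style names for illustration; check the datasheet for which IECx/IFSx registers actually hold the UART4 bits.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* One register to report: its name and the value read from the chip. */
typedef struct {
    const char *name;
    uint16_t    value;
} reg_entry;

/* Write "NAME=0xVVVV\n" for each register into out.
   Returns the number of characters written (excluding the terminator),
   or -1 if out is too small to hold the whole dump. */
static int format_reg_dump(char *out, size_t len,
                           const reg_entry *regs, size_t count)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + used, len - used, "%s=0x%04X\n",
                         regs[i].name, (unsigned)regs[i].value);
        if (n < 0 || (size_t)n >= len - used)
            return -1;          /* truncated: report failure */
        used += (size_t)n;
    }
    return (int)used;
}
```

For example, dumping IEC5 = 0x0100 and IFS5 = 0x0000 produces two 12-character lines, 24 characters in total.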

guy
Joined: 21 Oct 2005 Posts: 297
Posted: Wed Apr 22, 2015 11:32 pm

Other interrupts are OK. Only INT_RDA4 is not called.
Debugging is tough, since this happens at a remote location and only after a day or two (sometimes only after a week).
I was hoping someone would share an 'Oh, THAT!' insight, or spot something in the disassembled code.
If not, I'll add a patch that reconfigures the UART and re-enables the interrupt if no data is coming in. Not very glorious, and it leaves the hardware/compiler bug hidden away...
If anyone else has ideas, please let me know.
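The recovery patch described above depends on knowing when the line has gone quiet. A minimal sketch, assuming data normally arrives at least once every RX_SILENCE_TICKS ticks of a free-running timer (the constant and the tick source are hypothetical): the ISR records when the last character arrived, and main() checks the gap using unsigned subtraction, which stays correct when the counter wraps. Re-initialising UART4 and re-enabling INT_RDA4 is left to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical limit: longest normal gap between characters, in ticks. */
#define RX_SILENCE_TICKS 5000u

/* Time of the last received character, written by the RX ISR.
   On a 16-bit PIC a 32-bit read is not atomic, so main() should copy it
   with interrupts briefly disabled (or use a 16-bit tick instead). */
static volatile uint32_t last_rx_tick;

/* Call from the RX ISR each time a character is taken. */
static void note_rx(uint32_t now)
{
    last_rx_tick = now;
}

/* Call periodically from main(). Returns true when the line has been
   silent for RX_SILENCE_TICKS or more; the caller then re-initialises
   the UART. The silence window restarts so recovery is not repeated on
   every call. */
static bool rx_stalled(uint32_t now)
{
    if ((uint32_t)(now - last_rx_tick) < RX_SILENCE_TICKS)
        return false;
    last_rx_tick = now;
    return true;
}
```

This only works if the remote end really does send data regularly; on a line that can legitimately stay quiet for a long time, the timeout would fire during normal idle periods.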

Ttelmah
Joined: 11 Mar 2010 Posts: 19504
Posted: Thu Apr 23, 2015 12:51 am

Get rid of this:
u4rxif=0;
The compiler does this automatically, unless you declare the interrupt with 'NOCLEAR'.
So the flag is being cleared twice. I wonder what happens if the interrupt fires _between_ the two clears? With one clear, the flag would not be cleared, so the routine would be called again. With two?.....

guy
Joined: 21 Oct 2005 Posts: 297
Posted: Thu Apr 23, 2015 2:29 am

Thanks, Ttelmah. As I said, this was a workaround for an old compiler bug. Since I'm out of better ideas, I'll remove it and see what happens.
Q: Does anyone have experience INVOKING an interrupt by setting the xxIF bit manually? Would it work (assuming all other conditions are met)?

Ttelmah
Joined: 11 Mar 2010 Posts: 19504
Posted: Thu Apr 23, 2015 4:50 am

Yes. I've done this.
I had a particular interface chip that, during setup, could end up needing service without the interrupt flag being set. So after setup I polled it and, if handling was required, set the interrupt flag by hand.
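This trick works because the CPU takes an interrupt whenever a source's flag and enable bit are both set, regardless of whether the hardware or the software set the flag. A toy model of that rule; it ignores priority levels and the global enable (the 'other terms' in the question), and needs_service is a hypothetical stand-in for whatever status the interface chip reports. On the target, setting the flag would be a write to the flag bit, for example through a #BIT variable like u4rxif.

```c
#include <stdbool.h>

/* One interrupt source, reduced to its flag and enable bits. */
typedef struct {
    bool flag;      /* e.g. U4RXIF */
    bool enable;    /* e.g. U4RXIE */
    int  serviced;  /* how many times the handler has run */
} irq_source;

/* The handler does its work and clears the flag as usual. */
static void handler(irq_source *s)
{
    s->serviced++;
    s->flag = false;
}

/* What the hardware checks between instructions, for one source. */
static void dispatch(irq_source *s)
{
    if (s->flag && s->enable)
        handler(s);
}

/* After setup: if the device already needs service, raise the flag by
   hand so the normal handler runs on the next dispatch. */
static void after_setup(irq_source *s, bool needs_service)
{
    if (needs_service)
        s->flag = true;
}
```

With the enable bit cleared, a manually set flag simply stays pending until the interrupt is enabled.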
A question on your original problem: did the code have the same hang without the fix? Presumably you had an old version where the compiler didn't clear this interrupt automatically? If it still locked up then, this isn't the problem, but I just wondered. I remember finding on one PIC that disabling interrupts on two successive instructions caused problems.
I'd probably leave your clear in place and add the 'NOCLEAR' directive, which ensures the compiler doesn't clear the flag as well. That way it should work with both old and new compilers.

guy
Joined: 21 Oct 2005 Posts: 297
Posted: Thu Apr 23, 2015 7:49 am

Quote: did the code have the same hang without the fix?
In the old compiler version, the interrupt flag wasn't cleared, which resulted in only a single character being received. So it was a different behavior. Fortunately I don't need to support older compilers, so I'll just comment out the line and see if it solves anything.
My theory was that the flag could be cleared at any point during the interrupt routine (possibly multiple times, for whatever reason), and that additional interrupt requests would only be taken after the RETFIE.
Let's see what happens.

PCM programmer
Joined: 06 Sep 2003 Posts: 21708
Posted: Thu Apr 23, 2015 3:00 pm

Quote: Debugging is tough since this is happening in a remote location and only after a day or two (sometimes only after a week).
I'm wondering about this. Can't you re-create the remote environmental conditions on your lab bench? If so, does the problem occur only in the remote environment? A remote environment implies, possibly, lightning strikes, extreme humidity changes, air pressure, temperature swings, cosmic rays, power supply issues, human manipulation, poor connections to the board, poor-quality input signals, etc.
If the board works on your lab bench without all those variables, then I would try to duplicate them until I got a failure. If I could not get a failure in the lab, then I would add some facilities to record data about the failure in the field.

guy
Joined: 21 Oct 2005 Posts: 297
Posted: Thu Apr 23, 2015 11:06 pm

PCM, in this case it's easier to follow Ttelmah's advice and give it a try than to recreate the whole setup. The specific symptom (one particular interrupt stops working, but normal operation returns after a software reset) suggests that it's probably not a hardware issue.
Cosmic rays??? Sounds like there's a good story there.

Ttelmah
Joined: 11 Mar 2010 Posts: 19504
Posted: Fri Apr 24, 2015 12:46 am

Cosmic-ray effects on electronics are one of the few 'real' things that can genuinely require a watchdog to fix. They are very rare, but spend enough time and you will see the odd memory cell flip state without any normal cause. They are normally so rare that they are lost in the 'noise' caused by other effects, but they do happen....
Now, on your problem. I only raised the double clear because, a while ago, I had a problem on another PIC where it got into an odd state if you did two clears on successive instructions: characters received in the instant between the two clears were effectively lost. This is now an erratum for the chip concerned.
A fault that can't be reproduced on the bench is always the worst kind to track down. However, I wonder if you could add some diagnostics to actually 'know' when the fault has occurred? Does data arrive at reasonably regular intervals on UART4? If so, do you have a system clock? If the answer to both is 'yes', I'd be tempted to add a diagnostic routine that, if no data is seen within the interval, 'snapshots' everything associated with this UART: the actual state of the input pin, all the UART registers, etc. If you don't have EEPROM available to store the data, reserve a page of program memory and do a single write to it. Then have the chip stop and indicate that the event has happened, so you can read the data and have a hope of finding out what is happening.
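A sketch of the write-once snapshot described above, assuming a separate check has already decided that no data arrived within the expected interval. The fields (mode, status, flag and enable registers, raw RX pin level) follow the post; the struct layout and names are hypothetical, and on the target `snap` would be written to a reserved flash page or EEPROM rather than kept in RAM. Only the first capture is kept, so the record reflects the moment of failure rather than a later state.

```c
#include <stdint.h>
#include <stdbool.h>

/* Everything worth recording about UART4 when reception stalls
   (hypothetical layout). */
typedef struct {
    uint32_t tick;    /* when the stall was detected */
    uint16_t u4mode;
    uint16_t u4sta;
    uint16_t ifs5;    /* word holding U4RXIF */
    uint16_t iec5;    /* word holding U4RXIE */
    uint8_t  rx_pin;  /* raw level of the RX input pin */
    uint8_t  valid;   /* nonzero once a snapshot has been taken */
} uart_snapshot;

/* On the target: a reserved flash page or EEPROM, written once. */
static uart_snapshot snap;

/* Store the first snapshot only; later calls leave it untouched.
   Returns true if this call stored the snapshot. */
static bool take_snapshot(const uart_snapshot *now_state)
{
    if (snap.valid)
        return false;
    snap = *now_state;
    snap.valid = 1;
    return true;
}
```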
Is this happening on multiple chips? If it's only one, consider that it may simply be a faulty chip.
Beyond this, one (nasty) thought is that something is actually changing the PPS register, so the pin is no longer connected to the UART. Reception then stops, and (of course) a reset reprograms the PPS data. This would fit the observed behaviour well. This is one of the reasons the chip has a lock (IOL1WAY) to stop these bits being changed after boot. If you are not changing the PPS data after the program starts and you have NOIOL1WAY selected, try changing this.
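If the PPS theory is right, periodically comparing the U4RX input mapping against the value written at boot would catch the change in the field. A minimal sketch, assuming the mapping occupies a 6-bit field of one RPINRx register; which register and which bits is a hypothetical choice here and must be taken from the datasheet. In the usage, the register is faked with an ordinary variable.

```c
#include <stdint.h>
#include <stdbool.h>

/* Watches one field of a PPS input-mapping register (hypothetical). */
typedef struct {
    volatile uint16_t *reg;  /* register holding the U4RX mapping */
    uint16_t mask;           /* bits belonging to that mapping */
    uint16_t expected;       /* value captured right after boot */
} pps_guard;

/* Call once, after the PPS setup code has run. */
static void pps_guard_init(pps_guard *g, volatile uint16_t *reg, uint16_t mask)
{
    g->reg = reg;
    g->mask = mask;
    g->expected = (uint16_t)(*reg & mask);
}

/* Returns true if the mapping no longer matches the boot-time value. */
static bool pps_changed(const pps_guard *g)
{
    return (uint16_t)(*g->reg & g->mask) != g->expected;
}
```

Setting IOL1WAY prevents the change in the first place; a check like this is only useful as a diagnostic to confirm or rule out the theory.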
When you say 'reset', do you mean just a reset (with MCLR) or a power-on reset? This makes a big difference. The latter will also clear things like CMOS inputs that have latched up after a voltage spike, while the former will not. This affects where the problem may actually be.
You could also read the RS232_ERRORS byte after fetching the character in the routine. If error bits have been set in it, that could again help diagnose what is going on.
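The fgetc() disassembly in the first post suggests that, with ERRORS, the compiler copies the UART status word into RS232_ERRORS (push.w 0x02b2 / pop.w 0x0800) just before reading the character. Assuming that, the error bits follow the PIC24 UxSTA layout (bit 1 OERR, bit 2 FERR, bit 3 PERR), and a small routine called after each fgetc() could count how often each error occurs. The counter struct is a hypothetical addition.

```c
#include <stdint.h>

/* UxSTA error bits, assumed to appear unchanged in RS232_ERRORS. */
#define STA_OERR (1u << 1)  /* receive overrun */
#define STA_FERR (1u << 2)  /* framing error */
#define STA_PERR (1u << 3)  /* parity error */

typedef struct {
    uint32_t overrun;
    uint32_t framing;
    uint32_t parity;
} rx_error_counts;

/* Call right after each fgetc() with the current RS232_ERRORS value. */
static void count_rx_errors(rx_error_counts *c, uint16_t errors)
{
    if (errors & STA_OERR)
        c->overrun++;
    if (errors & STA_FERR)
        c->framing++;
    if (errors & STA_PERR)
        c->parity++;
}
```

A growing overrun count just before the stall would point towards the ISR falling behind the incoming data rather than towards a configuration change.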

guy
Joined: 21 Oct 2005 Posts: 297
Posted: Fri Apr 24, 2015 8:24 am

Thanks for the huge support, everyone. I will follow my hunches and let you know if I find anything substantial.

gpsmikey
Joined: 16 Nov 2010 Posts: 588 Location: Kirkland, WA
Posted: Fri Apr 24, 2015 10:25 am

Sounds like some sort of collision or race condition. One thing that comes to mind is that, while you are reading the incoming character and clearing its interrupt, another one arrives. You now have a character waiting without the interrupt flag being set, since it was just cleared. It might be worth checking for another character before exiting the ISR. Probably not likely at low baud rates, but at higher baud rates, with ISR latency depending on what else is going on, it is possible.
mikey
_________________
mikey
-- you can't have too many gadgets or too much disk space!
old engineering saying: 1+1 = 3 for sufficiently large values of 1 or small values of 3
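The race gpsmikey describes can be reproduced in a simplified model. The sketch assumes a 4-deep receive FIFO, a flag that is set once per received character, and a receiver that accepts no more data, and raises no flag, while an overrun is pending. Under those assumptions, each character that lands between the read and the clear leaves one stale character in the FIFO. With a single read per interrupt that backlog never shrinks, and after four such races the FIFO overruns and reception stops for good. The ERRORS option would clear OERR inside fgetc(), but fgetc() is only called from the ISR, which is never entered again. A draining ISR picks up the stale character on the next interrupt instead.

```c
#include <stdint.h>
#include <stdbool.h>

#define FIFO_DEPTH 4  /* assumed depth of the receive FIFO */

typedef struct {
    uint8_t fifo[FIFO_DEPTH];
    int     head;   /* index of the oldest character */
    int     count;  /* characters waiting */
    bool    rxif;   /* receive interrupt flag */
    bool    oerr;   /* overrun: nothing more is accepted while set */
} uart_model;

/* A character finishes arriving. */
static void uart_receive(uart_model *u, uint8_t c)
{
    if (u->oerr)
        return;                        /* receiver halted */
    if (u->count == FIFO_DEPTH) {
        u->oerr = true;                /* overrun, no flag raised */
        return;
    }
    u->fifo[(u->head + u->count) % FIFO_DEPTH] = c;
    u->count++;
    u->rxif = true;                    /* flag set for every character */
}

static uint8_t uart_read(uart_model *u)
{
    if (u->count == 0)
        return 0;
    uint8_t c = u->fifo[u->head];
    u->head = (u->head + 1) % FIFO_DEPTH;
    u->count--;
    return c;
}

/* Original ISR: one read, then clear. 'late' is a character arriving
   between the read and the clear, or -1 for none. */
static void isr_single(uart_model *u, int late)
{
    (void)uart_read(u);
    if (late >= 0)
        uart_receive(u, (uint8_t)late);
    u->rxif = false;
}

/* Draining ISR: read while data is available, then clear. */
static void isr_drain(uart_model *u, int late)
{
    do {
        (void)uart_read(u);
    } while (u->count > 0);
    if (late >= 0)
        uart_receive(u, (uint8_t)late);
    u->rxif = false;
}

/* Deliver n characters, calling the ISR whenever the flag is set.
   Every race_every-th character is followed by one that lands in the
   ISR's race window. Returns the number of ISR calls. */
static int simulate(uart_model *u, void (*isr)(uart_model *, int),
                    int n, int race_every)
{
    int calls = 0;
    for (int i = 1; i <= n; i++) {
        uart_receive(u, (uint8_t)i);
        if (u->rxif) {
            calls++;
            isr(u, (i % race_every == 0) ? 0x55 : -1);
        }
    }
    return calls;
}
```

In this model, simulate() with isr_single, 40 characters and a race every 5th character stops after 20 interrupts with OERR set, while isr_drain services all 40 and leaves at most one character waiting. Clearing the flag before draining (possible with NOCLEAR) would close the remaining window, because a character arriving during the drain sets the flag again and the interrupt is taken again after RETFIE.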

Ttelmah
Joined: 11 Mar 2010 Posts: 19504
Posted: Fri Apr 24, 2015 10:40 am

Yes. Given the four-character receive buffer:
Code:
#use RS232(STREAM=R232, BAUD=9600, UART4, ERRORS)
byte tmp;
...
#INT_RDA4 NOCLEAR
void rs232isr() {
   #BIT u4rxif = getenv("BIT:U4RXIF")
   do
   {
      tmp=fgetc(R232); //really need a circular buffer.....
   } while (kbhit(R232));
   u4rxif=0;
}

As a further comment, given the lack of a proper buffer, what happens if a second character is received before the first has been read from tmp? Could this cause a problem in your code?
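The 'really need a circular buffer' comment and the question about tmp being overwritten both point to the usual fix: the ISR stores every character in a ring buffer, and main() consumes them at its own pace. A minimal single-producer, single-consumer sketch; the size, the names and the dropped counter are illustrative assumptions. It relies on 16-bit index writes being atomic, which holds on a 16-bit PIC24 but should be checked on other targets.

```c
#include <stdint.h>
#include <stdbool.h>

/* Power of two, so the index wrap is a cheap mask. Holds SIZE - 1 chars. */
#define RX_BUF_SIZE 64u

typedef struct {
    volatile uint8_t  data[RX_BUF_SIZE];
    volatile uint16_t head;     /* written only by the ISR */
    volatile uint16_t tail;     /* written only by main() */
    volatile uint16_t dropped;  /* characters lost because the buffer was full */
} rx_ring;

/* ISR side. In a CCS ISR this would be used as, e.g.:
   while (kbhit(R232)) ring_put(&rx, fgetc(R232)); */
static void ring_put(rx_ring *r, uint8_t c)
{
    uint16_t next = (uint16_t)((r->head + 1u) & (RX_BUF_SIZE - 1u));
    if (next == r->tail) {
        r->dropped++;           /* full: keep the old data, count the loss */
        return;
    }
    r->data[r->head] = c;
    r->head = next;             /* publish only after the data is stored */
}

/* main() side: fetch one character; returns false when empty. */
static bool ring_get(rx_ring *r, uint8_t *c)
{
    if (r->tail == r->head)
        return false;
    *c = r->data[r->tail];
    r->tail = (uint16_t)((r->tail + 1u) & (RX_BUF_SIZE - 1u));
    return true;
}
```

Counting dropped characters instead of overwriting makes buffer overflow visible, which fits the diagnostic approach discussed earlier in the thread.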