Direct stack manipulation?

guy · Joined: 21 Oct 2005 Posts: 297

MCU: PIC24FJ64GA308
I have code that works with a cellular modem and uploads a file via FTP. If for some reason (and there could be a dozen of those) I am stuck waiting for a response string, a watchdog timer will reset the chip. It works but it's not cool.
The code is divided into several functions, and error handling is hard and time consuming.
Is there a way to create a mechanism similar to an operating system, in which if a process hangs (does not restart a watchdog timer) the code returns to a point in the Main routine REGARDLESS of the stack? Of course, the stack would have to be sorted out or cleaned somehow so I can later go into Retry.

*Please don't suggest old-school structural programming techniques of error handling. I am interested in learning something new...

temtronic · Posted: Sun Aug 20, 2017 7:07 am

re: ...
If for some reason (and there could be a dozen of those) I am stuck waiting for a response string,

If this is a 'serial' response, CCS does show a 'timed receive' function, I think in the FAQ section of the manual. I've used a version of it to allow the PIC to know when the host PC 'dies'. It's also applicable to, say, GSM modems or RFID modules where 'something' should be sent from them but, alas, it takes too long. You should set the 'timeout' time to, say, 2X or 3X the known response time.
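
For reference, a minimal sketch of that kind of timed receive, written as plain C so it can be tried on a PC. The three hardware stand-ins (uart_byte_ready, uart_read_byte, wait_10us) and the sim_ variables are placeholders of mine; on the PIC they would map to kbhit(), getc() and delay_us(10). max_polls sets the timeout in 10 us steps, so pick it as 2X or 3X the known response time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the hardware so the sketch runs on a PC (hypothetical).
 * On the PIC, replace them with kbhit(), getc() and delay_us(10). */
uint32_t sim_ready_after = 0;   /* number of waits before a byte "arrives" */
uint32_t sim_polls = 0;         /* waits performed so far */
uint8_t  sim_byte = 'K';        /* the byte the simulated modem sends */

bool    uart_byte_ready(void) { return sim_polls >= sim_ready_after; }
uint8_t uart_read_byte(void)  { return sim_byte; }
void    wait_10us(void)       { sim_polls++; }

/* Set when timed_getc() gave up without receiving anything. */
bool timeout_error = false;

/* Wait for one byte, but give up after max_polls polls.
 * Returns the byte, or 0 with timeout_error set. */
uint8_t timed_getc(uint32_t max_polls)
{
    uint32_t polls = 0;

    timeout_error = false;
    while (!uart_byte_ready() && (++polls < max_polls)) {
        wait_10us();
    }
    if (uart_byte_ready()) {
        return uart_read_byte();
    }
    timeout_error = true;
    return 0;
}
```

The caller checks timeout_error after every call and decides what to do; nothing ever waits forever.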

Jay

Ttelmah · Joined: 11 Mar 2010 Posts: 19506

Realistically the only place you can go back to with the stack sorted out, is a restart.

This is where 'restart_cause' comes in.

Design your code so the main variables are static, and not initialised by the compiler (so no 'initialisation' values), and then test restart_cause. If it is the normal power on reset, initialise the variables yourself. If not, you are back into the 'main' without the variables being changed. This is how you can program for a watchdog, but equally you can test for a software reset, and use the reset_cpu instruction.
Similarly, you can also not initialise peripherals if required (use NO_INIT in the #use declarations), and only physically initialise these when you require. So you could (for instance) have two tests on restart cause, and if it is a reset_cpu, initialise nothing, but if it is a watchdog, initialise the hardware.
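
A minimal sketch of that start-up logic, in plain C so it can be run on a PC. The CAUSE_xxx names, get_reset_cause() and the example variables are placeholders; on the target you would call restart_cause() and compare against the RESTART_xxx values in your device header. It assumes RAM is not cleared at start-up, as described above.

```c
#include <stdint.h>

/* Placeholder reset causes (hypothetical names). */
typedef enum { CAUSE_POWER_UP, CAUSE_WATCHDOG, CAUSE_SOFTWARE } reset_cause_t;

/* PC stand-in for restart_cause(). */
reset_cause_t sim_cause = CAUSE_POWER_UP;
reset_cause_t get_reset_cause(void) { return sim_cause; }

/* State that must survive a warm restart. Deliberately no initialisers;
 * on the target this relies on RAM not being cleared at start-up. */
uint16_t packets_stored;
uint8_t  upload_retries;

int hardware_inits;                       /* call counter, demo only */
void hardware_init(void) { hardware_inits++; }

void startup(void)
{
    switch (get_reset_cause()) {
    case CAUSE_SOFTWARE:      /* deliberate reset: initialise nothing */
        break;
    case CAUSE_WATCHDOG:      /* hang: keep the data, redo the hardware */
        hardware_init();
        break;
    default:                  /* power-on or anything else: cold start */
        packets_stored = 0;
        upload_retries = 0;
        hardware_init();
        break;
    }
}
```

Treating every unlisted cause as a cold start is the cautious choice: the variables are only trusted after a reset you recognise.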

However, it does come down to why your code itself does not exit tidily. For instance, if you are calling things that read (like serial input), these can either be tested before reading, or have a timeout (as Temtronic says).

newguy · Joined: 24 Jun 2004 Posts: 1907

I also must put forth a timer based "graceful exit". Very general code flow would be:
- I'm expecting a response within x seconds
- either start a new timer or add a "looking_for_response" flag to an existing timer's routine that will throw another flag "comms_timed_out". If the link times out, do whatever you need to do to gracefully exit/abandon looking for a response.
- in your serial comm routine, if you get "x" response (the one you're looking for), then set "looking_for_response" flag to FALSE. If you started a new timer, stop it and disable its interrupt.
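
The flow above can be sketched roughly as follows, in plain C. The flag names are the ones from the list; expect_response(), timer_tick(), response_received() and the tick counter are names I made up for the sketch, and timer_tick() stands for whatever goes inside your existing timer interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

volatile bool     looking_for_response = false;
volatile bool     comms_timed_out      = false;
volatile uint16_t response_ticks_left  = 0;

/* Call right after sending a command: a reply is due within `ticks`. */
void expect_response(uint16_t ticks)
{
    comms_timed_out      = false;
    response_ticks_left  = ticks;   /* set the count before arming the flag */
    looking_for_response = true;
}

/* Body of the periodic timer interrupt (one call per tick). */
void timer_tick(void)
{
    if (looking_for_response) {
        if (response_ticks_left > 0) {
            response_ticks_left--;
        }
        if (response_ticks_left == 0) {
            looking_for_response = false;
            comms_timed_out      = true;   /* main code sees this and exits */
        }
    }
}

/* Call from the serial routine when the awaited reply has arrived. */
void response_received(void)
{
    looking_for_response = false;
}
```

The main code only ever tests comms_timed_out, so abandoning the wait is an ordinary branch rather than a reset.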

guy · Joined: 21 Oct 2005 Posts: 297

Thank you guys.
Ttelmah, your idea to restart is very creative. It's like making the whole main() into the main loop and at the beginning check restart_cause to initialize registers & peripherals. Nice!

I am not talking about timeouts in comm. Imagine you are waiting for an OK or ERROR string from the modem, with a timeout. No problem so far. But now imagine you are waiting for a dozen of those in different parts of the code, since the code includes several different commands. Each time the command and its parsing are different, each test can lead to an error, and structured programming is not really built for that. In C# there is try/catch for that.
For PICs, goto is one option, but it only works inside the function (in other words, when the call stack is not involved).

Ttelmah · Joined: 11 Mar 2010 Posts: 19506

I'd probably suggest it is tidier to make the main a 'wrapper'.

So create your own 'main_code' routine, and 'software_init'/'hardware_init' routines, and then just have the normal 'main' decide what to call.

It's important to understand 'why' this does not exist. It is fundamentally not part of C. C inherently does not have an ability to do this. You can _inside a routine_, generate a try/catch type mechanism (using setjmp and longjmp), but to make this come out multiple layers, requires you to completely control the stack. This will get very complex (you could generate a stack buffer table, and save W15 for particular depths of jump/restore), but the odds of getting it to work reliably are low....

The alternative, is to work the other way.

The hardware 'reset_cpu' instruction, explicitly resets the stack to the boot state. So treat this as the external 'master' call, which always takes you to the 'wrapper' function. Then split this function up:
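
A minimal sketch of such a split, in plain C so it can be run on a PC. The routine names are the ones suggested above; the CAUSE_xxx values, the counters and the wrapper() name are placeholders. On the target, wrapper() would be main() itself, the cause would come from restart_cause(), and main_code() would never return but would call reset_cpu() wherever it wants to bail out.

```c
/* Placeholder reset causes (hypothetical names). */
typedef enum { CAUSE_POWER_UP, CAUSE_WATCHDOG, CAUSE_SOFTWARE } reset_cause_t;

/* Counters so the sketch can be checked on a PC; not needed on the target. */
int sw_inits, hw_inits, main_runs;

void software_init(void) { sw_inits++; }   /* variables to boot values   */
void hardware_init(void) { hw_inits++; }   /* UART, timers, pins, ...    */
void main_code(void)     { main_runs++; }  /* the real application       */

/* On the target this is main(); it only decides what to call. */
void wrapper(reset_cause_t cause)
{
    if (cause == CAUSE_POWER_UP) {
        software_init();
        hardware_init();
    } else if (cause == CAUSE_WATCHDOG) {
        hardware_init();   /* a hang may mean a peripheral is stuck */
    }
    /* CAUSE_SOFTWARE: deliberate bail-out, everything is left as it was. */
    main_code();
}
```

Because the reset puts the stack back to its boot state, it does not matter how deep in the call tree the bail-out happened.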

guy · Joined: 21 Oct 2005 Posts: 297

Excellent example for future generations to come!

In my case it is a gateway that most of the time waits for wireless packets and once every 24h uploads the data to a server. Since the main loop is very simple, a reset will not be hard to handle. I will just make sure not to lose the data and time after a software/WDT reset.
Thanks!

RF_Developer · Joined: 07 Feb 2011 Posts: 839

I think an important takeaway here is that when you are contemplating something like stack manipulation, then something's gone way too far, and there has to be another way.

My personal approach would be like Newguy's: use timers or a clock tick to implement timeouts, separating the sending of messages from dealing with responses, timeouts being handled in mainline code. I'm not so keen on the watchdog/restart approach. Either way, what you don't want to be doing is waiting, e.g. with delay_ms() inside routines.
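
A minimal sketch of what I mean, in plain C. All the names here (tick_count, reply_ready, send_command, poll_link and the LINK_xxx states) are made up for the illustration; tick_count would be incremented by a timer interrupt and reply_ready set by the serial receive code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LINK_IDLE, LINK_WAITING, LINK_GOT_REPLY, LINK_TIMED_OUT } link_state_t;

volatile uint16_t tick_count  = 0;      /* bumped by a periodic timer interrupt */
volatile bool     reply_ready = false;  /* set by the serial receive code       */

link_state_t link_state = LINK_IDLE;
uint16_t     sent_at;                   /* tick at which the command went out   */
uint16_t     timeout_ticks;

/* Sending only starts the exchange; it does not wait for anything. */
void send_command(uint16_t timeout)
{
    /* real code would start transmitting the command here */
    reply_ready   = false;
    sent_at       = tick_count;
    timeout_ticks = timeout;
    link_state    = LINK_WAITING;
}

/* Called on every pass of the main loop; never blocks. */
link_state_t poll_link(void)
{
    if (link_state == LINK_WAITING) {
        if (reply_ready) {
            link_state = LINK_GOT_REPLY;
        } else if ((uint16_t)(tick_count - sent_at) >= timeout_ticks) {
            /* unsigned subtraction stays correct when tick_count wraps */
            link_state = LINK_TIMED_OUT;
        }
    }
    return link_state;
}
```

The main loop acts on the returned state, so a timeout is just one more case to handle there and no routine ever sits in a delay.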

I have used a setjmp-based approach for try-catch type code with some success, but that was with a one-shot main where each run was independent. It was for a battery-powered Go/No Go test box which ran a series of short tests when it was powered up by a push-button, displayed the results on LEDs and then switched itself off. The try-catch was entirely in main() and there was no loop. It worked great, and the boxes are still on the first set of batteries (4 x AA) after a couple of years, but it wasn't implementing timeouts and there wasn't any attempt at multithreading, i.e. a timeout that runs in another context, such as an interrupt.
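
To give the flavour, here is a minimal sketch of try-catch built on setjmp/longjmp in standard C. It is not the code from the test box; the step names and run_tests() are invented for the example. It runs as written on a PC compiler, but whether longjmp may be called from a nested routine with a given CCS compiler and chip is something to check before relying on it.

```c
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf on_failure;   /* where "throw" jumps back to */
static int     failed_step;  /* which step raised the failure */

/* "throw": abandon the rest of the test sequence. */
static void fail(int step)
{
    failed_step = step;
    longjmp(on_failure, 1);
}

/* Hypothetical test steps; each one bails out on failure. */
static void check_supply(bool ok) { if (!ok) fail(1); }
static void check_sensor(bool ok) { if (!ok) fail(2); }

/* Returns 0 if every step passed, otherwise the failing step's number. */
int run_tests(bool supply_ok, bool sensor_ok)
{
    failed_step = 0;
    if (setjmp(on_failure) == 0) {   /* "try" */
        check_supply(supply_ok);
        check_sensor(sensor_ok);
        return 0;                    /* all passed */
    }
    return failed_step;              /* "catch" */
}
```

The steps themselves contain no error plumbing; the first failure skips everything after it and lands in the single catch branch.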

guy · Joined: 21 Oct 2005 Posts: 297

Ttelmah · Joined: 11 Mar 2010 Posts: 19506

Yes. It is an important distinction.

Internally every routine and data layout should if possible follow structured procedures, but the presence of external 'trap' capabilities, can in some cases be much cleaner. Interrupt programming in particular is better done without getting hooked on structured programming.

This is of course why languages like C# have the try/catch abilities, and in a very real sense we already have a master trap ability in the watchdog. The 'restart from go' ability is inherent in the PIC instruction set, and using this carefully can in some cases save a lot of complexity. But, keyword, 'carefully'...

guy · Joined: 21 Oct 2005 Posts: 297

I just found out that one of the newer PICs, the PIC16F18855 (and others, I suppose), has a special bit to indicate a reset caused by a Reset instruction.
PCON0 register,

temtronic · Posted: Fri Sep 08, 2017 8:08 am

Something not mentioned here is the probability that variables in RAM may also be corrupted when the PIC visits 'Lala' land.
It's quite possible that stuff other than the stack will be 'modified', perhaps even pin directions, so doing a partial 'warm boot', so to speak, maybe isn't a good idea. A full 'reset' or 'hard' reboot ensures variables get set to KNOWN values.

just something to ponder...

Jay

guy · Joined: 21 Oct 2005 Posts: 297

Are you basing this on experience? IMHO, if there is a special command for Reset, a defined VDD for RAM retention, etc., the whole scenario should be stable regarding RAM and SFRs. This is all documented.

temtronic · Posted: Fri Sep 08, 2017 9:07 am

Yes, the Real World isn't always nice... I had some bad crosstalk/EMI on an early project (20 years ago) and the PIC went to 'LALA' land, got 'hung up', and several variables in RAM were corrupted, so I am leery about 'soft reboots' where not all SFRs, RAM, etc. get reinitialized to known values.

Maybe the new PICs are better but 'once bitten, twice shy'.

Jay

Ttelmah · Joined: 11 Mar 2010 Posts: 19506

The restart_cause function already tests that bit and will tell you that the system has been software restarted.
As others have said, the caveat is that you have to ensure that the startup sets up what needs to be set up.

I have a system that can have its configuration changed from a file loaded from a server, triggered by a text message. Once this is received, it does reset_cpu. The code resets all the things like counters etc. to their 'boot' values, but does not re-initialise the other peripherals. However, a reset caused by an error very carefully does reset the peripherals, in case one of these is what is causing the error.