Stack depth is unexpectedly high

RossJ · Joined: 25 Aug 2004 Posts: 66

Hello,

I am looking into why my program compiles very differently for debug and release.

ckielstra · Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands

The Debug and Release versions differ because in the Release version the compiler applies more aggressive optimization for both speed and size. See the compiler settings: a higher level means more optimization. Too bad CCS doesn't explain the differences between the optimization levels. Level 9 is often chosen.

Aren't you looking into a problem that isn't a problem at all?

Stack depth isn't in the critical zone yet. Your debug version is using 24 of the 31 available levels. Considering that you are already using 81% of the available ROM space, the stack depth is unlikely to grow much higher.

Ttelmah · Joined: 11 Mar 2010 Posts: 19513

The latest compilers primarily optimise for speed on the release code, which is why V5 compilers often give larger ROM sizes than the older compilers.
There is a new optimisation command #OPT COMPRESS, which instead makes it try to optimise for minimum size (very aggressive...).
I'd worry about the stack if you were within a couple of levels, not when it is half empty. As Ckielstra says, worry more about the ROM size. Look for any 'low priority' functions (things where you don't care about speed) which are being inlined, and then declare these as #separate. This will increase your stack usage, but decrease the ROM.
Remember that 'simple' things like arithmetic will often involve several call levels. I don't think the compiler shows these in the .tre file. That may change if you remove the #nolist option?
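
For anyone reading along, the #separate suggestion above could look something like the sketch below. The function name checksum8 is made up for illustration and is not from the original program. The __PCH__ guard is only there so the same file also builds with an ordinary host C compiler, which does not know the #separate directive.

```c
#include <stdint.h>

/* Hypothetical low-priority helper: we do not care how fast it runs. */
#if defined(__PCH__)
/* CCS only: force a real CALL here instead of letting the compiler inline it. */
#separate
#endif
uint8_t checksum8(uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0;
    uint8_t i;

    /* 8-bit additive checksum; wraps modulo 256. */
    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}
```

The trade-off is the one described above: the function body exists once in ROM, at the cost of one extra stack level wherever it is called.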

As a comment, I think you may be getting misled about what you are seeing. The 'main' figure given by the compiler, _includes_ the potential interrupt calls inside the main.
So the 19 levels shown is only half way up the stack.
Look at the figure at the top of the lst file, which is the most informative one.

Best Wishes

RossJ · Joined: 25 Aug 2004 Posts: 66

Hi guys,

Thanks for your comments and my apologies for taking so long to respond. I have since done further investigation on this and believe I have the issue sorted. I have also realised that I omitted a couple of details from my original post which may have influenced your responses. So below are some additional points of relevance and some of my conclusions.

1. I am not employing any CCS feature when building explicitly for debug or release, except for using #fuses DEBUG which doesn't impact the build other than setting the appropriate fuse bit. There are a couple of small code differences controlled through pre-processor conditions. This seems to be what's triggering the differences in compiler behavior as the release code introduces a function which uses several additional stack levels.

2. The aforementioned 'release only' function is actually part of an error handler which always leads to a restart. I had attempted to remove this code from the call tree by locating it at a fixed address, resetting the STKPTR on entry and using goto_address() to reach it. The problem with this is that the compiler treats the isolated function in the same way as it does interrupt routines. Thus all of its stack levels become an overhead across the entire program, and not just from the point at which they are needed. This is why 'Stack used (ints)' is so high (8).

3. The same optimisation level is used for debug and release in the table quoted earlier (#opt 8). I normally use 9, but had changed to 8 due to a bug in 5.013. I'm now back to using 9.

4. The debug build quoted above uses 24 + 6 = 30 of 31 stack levels. The release build added another 5 or so levels, which forced the compiler to inline many functions. That led to the increased ROM size (as expected) and a massive increase in compile time (from 30 sec to 3 min).

5. I have now refactored the error handling code so that all work is done following the restart (instead of prior to it). My program now compiles with 68% ROM and 24+3 stack levels. I also tried the #opt compress option mentioned and that reduces ROM usage to 63% but stack rises to 26+3. The compiler is not performing inlining to free up stack in either case. The code is 'reasonably' mature so I am not too concerned about this ROM usage. The next PIC up is a PIC24.
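
The stack arithmetic in points 4 and 5 can be written out as a small check. This is only a sketch: it assumes the 'main' and 'ints' figures simply add, as in point 4, and that 31 levels are available, as quoted in this thread. The function names are made up for illustration.

```c
/* Hardware return-stack depth quoted in this thread. */
#define STACK_LEVELS_AVAILABLE 31

/* Worst case: deepest main-line chain plus the levels reserved for interrupts. */
int stack_levels_needed(int main_levels, int int_levels)
{
    return main_levels + int_levels;
}

/* Levels left over; a negative result means the program does not fit. */
int stack_headroom(int main_levels, int int_levels)
{
    return STACK_LEVELS_AVAILABLE - stack_levels_needed(main_levels, int_levels);
}

/* Returns 1 if the program fits in the hardware stack, 0 otherwise. */
int stack_fits(int main_levels, int int_levels)
{
    return stack_headroom(main_levels, int_levels) >= 0;
}
```

With the figures above, the debug build (24 + 6) leaves 1 level spare, the refactored build (24 + 3) leaves 4, and the #opt compress build (26 + 3) leaves 2.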

FINALLY SOME OBSERVATIONS ABOUT THE COMPILER...

1. The .tre file is actually compressed. Any function which is called multiple times is only included once, with subsequent occurrences being replaced with a *. This misled me when I was trying to determine what parts of my code were at peak stack depth. I wrote a small Java utility to expand the .tre file and that produced a tree consistent with the summary at the top of the .lst file. I don't think the compiler always did this. Maybe it should be optional...

2. Optimisation doesn't appear to affect the .tre file. Compiler generated functions (arithmetic, delay, sprintf etc.) are shown in the .tre file.

3. When the compiler inlines a function, it is still shown in the .tre file but marked as such. Inlined functions are not counted toward the stack usage.

4. When the compiler replaces CALL with GOTO/BRA to enter a function, as it does when a function is only called once, this still counts as a stack level!!! The compiler counts it towards the summary at the top of the .lst file and may begin inlining functions early due to a perceived shortage of stack. In fact I was able to create a test program in which the compiler inlined many functions because it 'ran out' of stack, and yet there were no CALLs at all in the compiled code. Given that the main purpose for replacing CALL with GOTO/BRA has to be to save on stack, this must be a bug?

5. When a program is using too many stack levels (main + ints > available), the compiler automatically inlines functions to reduce stack usage to within hardware constraints. This step occurs after the individual files and main are compiled, and can take considerable time. The PCH GUI window is also non-responsive during this activity.
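
To illustrate observations 1 and 3, here is a rough model of how peak depth falls out of a call tree. It is a sketch only: it does not read the real .tre format, and the struct, the function names and the example tree are all made up. Each function is stored once (like the * entries in the .tre file) but is walked again at every place it is referenced, which is what expanding the tree amounts to. Inlined functions add no level. It also ignores the GOTO/BRA accounting described in observation 4.

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical call-tree node; unused callee slots are NULL. */
struct fn {
    const char *name;
    int inlined;                      /* 1 = inlined, costs no stack level */
    struct fn *callees[MAX_CALLEES];
};

/* Number of stack levels used by the deepest call chain below f. */
int levels_below(const struct fn *f)
{
    int deepest = 0;
    int i;

    for (i = 0; i < MAX_CALLEES && f->callees[i] != NULL; i++) {
        const struct fn *c = f->callees[i];
        /* A real call costs one level; an inlined callee costs none. */
        int d = (c->inlined ? 0 : 1) + levels_below(c);
        if (d > deepest) {
            deepest = d;
        }
    }
    return deepest;
}

/* Made-up example tree: delay_fn is shared by three callers. */
struct fn delay_fn  = { "delay",       0, { NULL } };
struct fn format_fn = { "format",      0, { &delay_fn } };
struct fn report_fn = { "report",      0, { &format_fn } };
struct fn read_fn   = { "read_sensor", 0, { &delay_fn } };
struct fn init_fn   = { "init",        0, { &delay_fn } };
struct fn loop_fn   = { "loop",        0, { &read_fn, &report_fn } };
struct fn main_fn   = { "main",        0, { &init_fn, &loop_fn } };
```

In this example levels_below(&main_fn) gives 4 (loop, report, format, delay). Setting format_fn.inlined to 1 drops it to 3, which is the effect the compiler is after when it inlines functions to get back within the hardware limit.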

Cheers, Ross.