Thursday, January 05, 2006

The ksh script segmentation fault

This is what got me started writing this blog: the segmentation fault you never want to see. My sympathies if you have made it here searching for information on how to deal with seg faults.

The script was not mine, but the author is better than me at shell scripting (I prefer Perl), so I was more of a victim trying to help out.

The idea of the script was to search recursively through a list of directories, looking for certain files and concatenating strings from those files. It all worked well until code was added to speed things up by skipping areas that had already been searched.

The odd thing was that some people got segmentation faults and others did not. Some ways of running the script would fail and others would not. Finally, we even noticed that running the script from a particular location would affect the results.

Through the use of echo we managed to figure out that the variable containing a big long string was getting screwed up... but it was fine the last time we touched it. At some point it just got corrupted. Well, I guess that's part of the deal with segmentation faults: memory allocation problems will do things that logically (or at least within the realm of the script) should not happen.
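For anyone chasing the same kind of corruption, here is a minimal sketch of the echo approach. It is not what we actually ran: the helper name checkpoint and the variable BIG are made up for illustration. The idea is to print the length and a checksum of the suspect variable at several points, so you can see between which two points it changes without anyone touching it.

```shell
# checkpoint LABEL VALUE
# Hypothetical helper: writes the label, the value's length, and a
# checksum of the value to stderr so it does not pollute script output.
checkpoint() {
    printf '%s: len=%s cksum=%s\n' \
        "$1" "${#2}" "$(printf '%s' "$2" | cksum | cut -d' ' -f1)" >&2
}

# Illustrative stand-in for the big concatenated string.
BIG="some long concatenated string"

checkpoint "after build" "${BIG}"
# ... code that should not modify BIG ...
checkpoint "before use" "${BIG}"
```

If the two lines differ in length or checksum, the variable changed somewhere in between, even if nothing in the script assigned to it.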

The segmentation fault produced a core dump, but gdb (which I am far from proficient in) couldn't seem to do anything with it at all.

Finally we found the problem, but why the script mostly worked and only failed on the odd (but repeatable) occasion is still beyond me.

The problem (we think - it went away anyway) was the clearing of a variable inside an until loop whose condition tested that same variable. The variable was, however, being set again before the loop body ended.

Say for example:
until [ -z "${VAR}" ] ; do
# some code
VAR=
# some more code
VAR=${SOME_OTHER_VAR}
done
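
If you want to avoid the pattern altogether, one option is to never clear the loop variable mid-loop: build the next value in a scratch variable and assign the loop variable once, at the end of each pass. This is only a sketch under that assumption, and I can't promise it cures every ksh seg fault. The function name walk and the variables NEXT and SEEN are made up for illustration.

```shell
# walk ITEM...
# Hypothetical example: consumes its arguments one per iteration and
# prints them concatenated. VAR is only ever assigned at the end of
# the loop body, never cleared in the middle.
walk() {
    VAR=start
    SEEN=""
    until [ -z "${VAR}" ] ; do
        # Work out the next value in a scratch variable.
        NEXT=""
        if [ $# -gt 0 ] ; then
            NEXT=$1
            shift
        fi
        SEEN="${SEEN}${NEXT}"
        # Single assignment to the variable the until test reads.
        VAR=${NEXT}
    done
    echo "${SEEN}"
}

walk a b c
```

The loop still ends when VAR becomes empty, exactly as before; the only change is that VAR is never empty halfway through an iteration.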

It would seem to be one of two things.
1. The until loop holds a pointer to the variable VAR, and that pointer changes when VAR is set to null (VAR=). Memory problems follow and everything goes strange... but I'm not sure why ksh would be written that way.

or

2. I came across lots of references to segmentation faults when using unset. Maybe setting a variable to null uses the same code as unset? But I doubt it.

Anyway, it seems to work now.
