[testing-discuss] abnormal termination of a test case in CTI-TET

Vladimir Kotal Mon, 02 Mar 2009 21:58:20 +0100

Hi all,

I seem to have run into another unpleasant behavior of CTI-TET. While 
working on better handling of potential mktemp(1) failures, I also 
wanted to remove the already created temporary files upon abnormal test 
case/purpose termination so that they are not left lying around, e.g. when 
the user sends SIGTERM to the run_test script.


Browsing through the code of STF-based test suites, one often finds the 
following lines of code (ksh):

      # Make sure we cleanup
      trap cleanup 1 2 15


The function cleanup() will then do whatever cleanup it can upon 
receiving one of the specified signals (1, 2 and 15, i.e. SIGHUP, SIGINT 
and SIGTERM).
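
As a rough sketch of that pattern, the temporary files could be tracked 
in a list so that cleanup() knows what to remove. The names TMPFILES, 
MADE_TMP, make_tmp and the tc_sketch prefix are invented here for 
illustration and are not taken from any STF suite:

```shell
# Sketch only: TMPFILES, MADE_TMP, make_tmp and the tc_sketch prefix are
# illustrative names, not taken from any STF suite.
TMPFILES=""

# Create a temporary file and record it for later removal.
# The path comes back in MADE_TMP rather than on stdout, because calling
# this function inside $(...) would update TMPFILES only in a subshell.
make_tmp() {
	MADE_TMP=$(mktemp "${TMPDIR:-/tmp}/tc_sketch.XXXXXX") || return 1
	TMPFILES="$TMPFILES $MADE_TMP"
}

# Remove everything recorded so far; harmless if called more than once.
cleanup() {
	if [ -n "$TMPFILES" ]; then
		# Unquoted on purpose: the list is split on whitespace, so this
		# assumes the temporary paths contain no spaces.
		rm -f $TMPFILES
	fi
	TMPFILES=""
}

# Signals 1, 2 and 15 are HUP, INT and TERM.
trap 'cleanup; exit 1' 1 2 15
```

A test purpose would call make_tmp, use $MADE_TMP, and call cleanup 
itself on the normal path. The trap only helps if the signal actually 
reaches the process that owns the files.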

I did a little experiment with a test case which consists of multiple 
test purposes and hit Ctrl-C in the middle of the test run. To my 
surprise there were no leftover temporary files. So I modified one of 
the test purposes to create a temporary file, sleep for a long period of 
time, then remove the temporary file and exit. This is how it went:

1. we are happily running:

18269 -sh
   18273 bash
     24075 /bin/bash
       1071  /usr/bin/ksh /opt/SUNWstc-tetlite/contrib/ctitools/bin/run_test nc
         1168  tcc -p -e -a /var/tmp/testzone_1071/CONFIG -j /var/tmp/results.10
           1169  /usr/bin/ksh -p tc_hflag all
             1193  /usr/bin/ksh -p tc_hflag all
               1196  /usr/bin/ksh93 /bin/sleep 30

2. here I hit Ctrl-C:

18269 -sh
   18273 bash
     24075 /bin/bash
       1071  /usr/bin/ksh /opt/SUNWstc-tetlite/contrib/ctitools/bin/run_test nc
         1168  tcc -p -e -a /var/tmp/testzone_1071/CONFIG -j /var/tmp/results.10
           1169  <defunct>
1193  /usr/bin/ksh -p tc_hflag all
   1196  /usr/bin/ksh93 /bin/sleep 30

3. run_test has exited but the test purpose is still running:

1193  /usr/bin/ksh -p tc_hflag all
   1196  /usr/bin/ksh93 /bin/sleep 30


The test purpose happily carried on and removed the temporary file at 
the end. This means that one cannot be sure all test case processing 
has really finished once run_test exits, which is a bit shocking to me. 
Running run_test again while a test purpose is still running could 
easily produce unpredictable behavior and confusion with more 
complicated test suites.
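
One possible guard against starting a second run too early is sketched 
below. It assumes each test purpose writes its own PID to a file when it 
starts; purpose_running and that PID-file convention are hypothetical, 
and neither run_test nor TET provides them as far as I know:

```shell
# Sketch only: purpose_running and the PID-file convention are hypothetical;
# neither run_test nor TET is known to provide them.

# Succeed if the PID stored in file $1 refers to a process that still exists.
purpose_running() {
	[ -s "$1" ] || return 1
	pid=$(cat "$1")
	# kill -0 delivers no signal; it only checks that the PID could be
	# signalled. PID reuse and processes owned by other users are ignored.
	kill -0 "$pid" 2>/dev/null
}
```

A test purpose would record itself with something like 
"echo $$ > /var/tmp/tp.pid" (path invented for the example), and a 
wrapper could refuse to start run_test while purpose_running succeeds 
on that file.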

Looking into run_test.ksh, SIGTERM is handled by the cleanup() function in 
stcnv-clone/usr/src/tools/tet/contrib/ctitools/src/utils/run_test.ksh, 
which sends SIGTERM to the tcc program:

      if [[ ! -z $TCC_PID ]]
      then
              echo kill -TERM $TCC_PID
              kill -TERM $TCC_PID
              wait $TCC_PID
              TCC_PID=
      fi
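
For comparison, a cleanup that also reaches the processes below tcc 
would have to collect them before signalling tcc, since orphans are 
re-parented afterwards. The following is only a sketch: descendants and 
kill_tree are hypothetical helpers, not part of run_test.ksh, and it 
assumes a ps(1) that accepts the POSIX -A and -o options. Whether the 
test cases react to SIGTERM at all depends on how TET sets up signal 
handling, which is not verified here.

```shell
# Sketch only: descendants and kill_tree are hypothetical helpers, not part
# of run_test.ksh or TET.

# Print the PID of every descendant of process $1, one per line, based on a
# single ps snapshot.
descendants() {
	ps -A -o pid= -o ppid= | awk -v root="$1" '
		{ parent[$1] = $2 }
		END {
			for (pid in parent) {
				p = parent[pid]; hops = 0
				# Climb towards the top of the tree; the hop limit
				# guards against a process listed as its own parent.
				while (p != root && (p in parent) && hops++ < 1000)
					p = parent[p]
				if (p == root)
					print pid
			}
		}'
}

# Send SIGTERM to process $1 and to everything below it.
kill_tree() {
	# The list is taken before any signal is sent, because orphans are
	# re-parented and can no longer be found through $1 afterwards.
	victims=$(descendants "$1")
	kill -TERM "$1" $victims 2>/dev/null
}
```

In the cleanup() function above, "kill_tree $TCC_PID" would take the 
place of the plain kill. Do not call descendants on the current shell 
itself, as the ps and awk helpers would then show up in the list.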

However, tcc does not seem to do much after receiving the signal. Its 
SIGTERM handler, initial_sigtrap() in 
stcnv-clone/usr/src/tools/tet/src/tet3/tcc/sigtrap.c, is:

      static void initial_sigtrap(sig)
      int sig;
      {
              static char text[] = "TCC shutdown on signal";

              if (jnl_usable())
                      (void) fprintf(stderr, "%s %d\n", text, sig);

              fatal(0, text, tet_i2a(sig));
      }

This means that my effort to make the test suite more robust against 
abnormal termination is probably futile, since fatal() seems to be 
only a slightly better version of exit(). Has anyone had a different 
experience?


v.
