Re: [Simple-evcorr-users] SEC_SHUTDOWN actions do not complete consistently

Risto Vaarandi Wed, 21 Oct 2015 13:21:08 -0700

If you look more closely at what killproc() in the CentOS 6 init script
actually does, you will probably spot this:


               # TERM first, then KILL if not dead
               kill -TERM $pid >/dev/null 2>&1
               usleep 100000
               if checkpid $pid ; then
                   try=0
                   while [ $try -lt $delay ] ; do
                       checkpid $pid || break
                       sleep 1
                       let try+=1
                   done
                   if checkpid $pid ; then
                       kill -KILL $pid >/dev/null 2>&1
                       usleep 100000
                   fi
               fi
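
To see the effect in isolation, here is a minimal sketch of the same
TERM-then-KILL pattern. It is not the real killproc(): term_then_kill and
slow_daemon are made-up names, and slow_daemon only imitates a process that
needs some time to save its state after receiving TERM.

```shell
# Hypothetical stand-in for the TERM-then-KILL part of killproc().
# term_then_kill PID DELAY: send TERM, wait up to DELAY seconds, then send KILL.
# Returns 0 if the process exited on its own, 1 if KILL had to be sent.
term_then_kill() {
    local pid=$1 delay=$2 try=0
    kill -TERM "$pid" 2>/dev/null
    while [ "$try" -lt "$delay" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # already gone
        sleep 1
        try=$((try + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null            # cleanup is cut short here
        return 1
    fi
    return 0
}

# Hypothetical slow process: on TERM it "saves" for SAVE_TIME seconds and
# only then writes MARKER, so a missing MARKER means the save was interrupted.
# slow_daemon SAVE_TIME MARKER
slow_daemon() {
    local save_time=$1 marker=$2
    trap "sleep $save_time; echo saved > '$marker'; exit 0" TERM
    while :; do sleep 1; done
}

# Example (a 5 second save against a 3 second delay):
#   slow_daemon 5 /tmp/saved.marker & sleep 1; term_then_kill $! 3
```

With a save time longer than the delay, the marker file is never written,
which matches the log where neither the success nor the failure message
shows up.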

I think this pretty much explains what is happening, and it confirms my
fears about TERM followed by KILL. Since the two signals are separated by
an interval of about 3 seconds (see the 'delay' variable in killproc()),
sec is killed if the database saving procedure takes longer than that. The
simplest way to resolve this issue is to edit the init script to use
'kill -TERM <sec_pid>' instead of killproc(). If possible, you could also
try to make the database saving procedure more efficient, since 3 seconds
is a lot of time.
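
As a rough sketch of the TERM-only approach, something like the function
below could be called from stop() in place of killproc. The name
term_and_wait and the 60 second default are my own choices for
illustration, not part of the CentOS init functions; the function never
sends KILL, it only reports whether the process has gone away.

```shell
# Hypothetical helper, not part of /etc/rc.d/init.d/functions.
# term_and_wait PID [TIMEOUT]: send TERM only, then poll until the process exits.
# Returns 0 once the process is gone, 1 if it is still running after TIMEOUT
# seconds. In the latter case the process is left alone (no KILL is sent).
term_and_wait() {
    local pid=$1 timeout=${2:-60} waited=0
    # If the signal cannot be delivered, treat the process as already stopped.
    kill -TERM "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The pid would have to come from wherever the init script records it, for
example a pid file if sec is started with one. Note that the quoted
killproc() also parses a '-d delay' option, but that only lengthens the
window before KILL rather than removing it.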

kind regards,
risto


2015-10-21 23:00 GMT+03:00 Bond Masuda <[email protected]>:

>
>
> On 10/21/2015 12:54 PM, Risto Vaarandi wrote:
>
> hi Bond,
>
> there is no time limit for the shutdown procedure. In fact, since sec is a
> single-threaded tool, it would be impossible to impose such a timeout. In
> your rule example, the execution of the 'action' field prevents sec from
> doing anything else, and since your 'action' field does not seem to contain
> any actions that would fork background processes, the entire action list is
> executed before sec can continue with other activities.
>
> The only timeout that sec applies is for child processes which are running
> at the moment of termination. The logic works as follows -- firstly, sec
> processes the SEC_SHUTDOWN event (that would also include your rule), then
> the sec process will sleep for 3 seconds, and finally the TERM signal will
> be sent to all child processes and sec will call exit(0). However, since
> the database disconnect is not done in a child process, the 3 second
> timeout has no effect on your rule.
>
> What I am suspecting is one of the following:
> 1) the SEC_SHUTDOWN event does not reach your rule under some
> circumstances (there might be a preceding rule in your rule sequence which
> produces occasional matches),
>
>
> Ok, thank you for the suggestion. I'm going to look into this.
>
> 2) the TERM signal is sent by a script or application which then delivers
> the KILL signal to the sec process once it discovers, after a couple of
> seconds, that the sec process is still running.
>
> If you want to check scenario 1, you can start the action list with a
> 'logonly' statement and see whether this produces a message about the
> start of execution.
> Just out of curiosity -- how exactly is the TERM signal delivered to the
> sec process?
>
>
> I'm on CentOS 6, and I run 'service sec stop', which has this content in
> the init script:
>
> stop() {
>     echo -n $"Stopping $prog: "
>     killproc $prog
>     RETVAL=$?
>     echo
>     [ $RETVAL -eq 0 ] && rm -f $lockfile
>     return $RETVAL
> }
>
> where killproc is defined in /etc/rc.d/init.d/functions:
>
> killproc() {
>     local RC killlevel= base pid pid_file= delay try binary=
>
>     RC=0; delay=3; try=0
>     # Test syntax.
>     if [ "$#" -eq 0 ]; then
>         echo $"Usage: killproc [-p pidfile] [ -d delay] {program} [-signal]"
>         return 1
>     fi
>     if [ "$1" = "-p" ]; then
>         pid_file=$2
>         shift 2
>     fi
>     if [ "$1" = "-b" ]; then
>         if [ -z $pid_file ]; then
>             echo $"-b option can be used only with -p"
>             echo $"Usage: killproc -p pidfile -b binary program"
>             return 1
>         fi
>         binary=$2
>         shift 2
>     fi
>     if [ "$1" = "-d" ]; then
>         delay=$(echo $2 | awk -v RS=' ' -v IGNORECASE=1 '{if($1!~/^[0-9.]+[smhd]?$/) exit 1;d=$1~/s$|^[0-9.]*$/?1:$1~/m$/?60:$1~/h$/?60*60:$1~/d$/?24*60*60:-1;if(d==-1) exit 1;delay+=d*$1} END {printf("%d",delay+0.5)}')
>         if [ "$?" -eq 1 ]; then
>             echo $"Usage: killproc [-p pidfile] [ -d delay] {program} [-signal]"
>             return 1
>         fi
>         shift 2
>     fi
>
>
>     # check for second arg to be kill level
>     [ -n "${2:-}" ] && killlevel=$2
>
>         # Save basename.
>         base=${1##*/}
>
>         # Find pid.
>     __pids_var_run "$1" "$pid_file" "$binary"
>     RC=$?
>     if [ -z "$pid" ]; then
>         if [ -z "$pid_file" ]; then
>             pid="$(__pids_pidof "$1")"
>         else
>             [ "$RC" = "4" ] && { failure $"$base shutdown" ; return $RC ;}
>         fi
>     fi
>
>         # Kill it.
>         if [ -n "$pid" ] ; then
>                 [ "$BOOTUP" = "verbose" -a -z "${LSB:-}" ] && echo -n "$base "
>         if [ -z "$killlevel" ] ; then
>                if checkpid $pid 2>&1; then
>                # TERM first, then KILL if not dead
>                kill -TERM $pid >/dev/null 2>&1
>                usleep 100000
>                if checkpid $pid ; then
>                 try=0
>                 while [ $try -lt $delay ] ; do
>                     checkpid $pid || break
>                     sleep 1
>                     let try+=1
>                 done
>                 if checkpid $pid ; then
>                     kill -KILL $pid >/dev/null 2>&1
>                     usleep 100000
>                 fi
>                fi
>                 fi
>             checkpid $pid
>             RC=$?
>             [ "$RC" -eq 0 ] && failure $"$base shutdown" || success $"$base shutdown"
>             RC=$((! $RC))
>         # use specified level only
>         else
>                 if checkpid $pid; then
>                         kill $killlevel $pid >/dev/null 2>&1
>                 RC=$?
>                 [ "$RC" -eq 0 ] && success $"$base $killlevel" || failure $"$base $killlevel"
>             elif [ -n "${LSB:-}" ]; then
>                 RC=7 # Program is not running
>             fi
>         fi
>     else
>         if [ -n "${LSB:-}" -a -n "$killlevel" ]; then
>             RC=7 # Program is not running
>         else
>             failure $"$base shutdown"
>             RC=0
>         fi
>     fi
>
>         # Remove pid file if any.
>     if [ -z "$killlevel" ]; then
>             rm -f "${pid_file:-/var/run/$base.pid}"
>     fi
>     return $RC
> }
>
> Thank you Risto.
> Bond
>
>
> kind regards,
> risto
>
>
> 2015-10-21 22:23 GMT+03:00 Bond Masuda <[email protected]>:
>
>> In my SEC rule set, I am using an SQLite in-memory database to cache
>> data. When I shut down SEC, I save this SQLite database to disk and
>> reload it into memory when SEC starts.
>>
>> I've now observed several times, and it seems to be when the database is
>> large, that the save to disk procedure during SEC_SHUTDOWN doesn't
>> complete. In fact, I try to log messages so I have an idea of success or
>> failure of the $dbh->sqlite_backup_to_file() call; and I sometimes get
>> neither success nor failure log messages; SEC just shuts down. Here is
>> the log when this happens:
>>
>> Wed Oct 21 15:00:50 2015: SIGTERM received: shutting down SEC
>>
>> This is what I expect, and when it works normally:
>>
>> Tue Oct 20 22:28:24 2015: SIGTERM received: shutting down SEC
>> Tue Oct 20 22:28:26 2015: INFO: database saved to disk on attempt 1.
>> Tue Oct 20 22:28:26 2015: INFO: database disconnect successful.
>>
>> This is my rule during SEC_SHUTDOWN:
>>
>> # save database to disk
>> type=Single
>> ptype=SubStr
>> pattern=SEC_SHUTDOWN
>> context=[SEC_INTERNAL_EVENT]
>> continue=TakeNext
>> desc=Save database to disk
>> action= lcall %ret -> ( sub{ \
>>             my $db_backup = '/var/lib/sec/cache.sqlite3'; \
>>             my $tries = 0; \
>>             my $ret; \
>>             my $msg; \
>>             my @return; \
>>             do{ \
>>                 $ret = $dbh->sqlite_backup_to_file($db_backup); \
>>                 $tries++; \
>>             } until ( $ret || ($tries >= 5) ); \
>>             push(@return,$ret); \
>>             if( $ret == 1 ){ \
>>                 $msg = "database saved to disk on attempt $tries."; \
>>             } else { \
>>                 $msg = $DBI::errstr; \
>>             } \
>>             push(@return,$msg); \
>>             return @return; \
>>         } ); \
>>         lcall %is_success %ret -> ( sub{ \
>>             my ($rc, $msg) = split(/\n/,$_[0]); \
>>             return $rc; \
>>         } ); \
>>         lcall %msg %ret -> ( sub{ \
>>             my ($rc, $msg) = split(/\n/,$_[0]); \
>>             return $msg; \
>>         } ); \
>>         if %is_success ( logonly INFO: %msg ) \
>>         else ( logonly CRIT: database failed to save to disk ); \
>>         lcall %ret -> ( sub{ \
>>             my $ret = $dbh->disconnect(); \
>>             return $ret; \
>>         } ); \
>>         if %ret ( logonly INFO: database disconnect successful. ) \
>>         else ( logonly CRIT: database disconnect failed. )
>>
>> As you can see above, either success or failure should log a message,
>> but when this problem occurs, I get nothing. So I'm wondering whether,
>> during SEC shutdown, there is a time limit on how long the shutdown
>> procedure has before SEC just exits completely. Could it be that when the
>> database is large, the save to disk procedure takes too long and SEC
>> exits without allowing it to complete?
>>
>> Thanks
>> Bond
>>
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Simple-evcorr-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
>>
>
>
>
