Hi Emilio,

thanks for having a look at this.

I did some more tests and research in the meantime.

I was able to reproduce the problem using a simple shell script as the
check command. The script does nothing but sleep for longer than the
timeout.
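While reproducing this, it can help to count the zombies directly instead of scanning `ps axjf` output by eye. The following is a minimal Perl sketch that assumes a Linux host with /proc. The helper name `count_zombie_children` is hypothetical and is not part of farmguardian. It counts the children of a given parent PID that are in state `Z`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of farmguardian): count the zombie children
# of $ppid, i.e. the "<defunct>" rows that `ps axjf` shows under that parent.
# Linux-only: it reads /proc/<pid>/stat for every process.
sub count_zombie_children {
    my ($ppid) = @_;
    my $count = 0;
    opendir(my $dh, '/proc') or die "cannot open /proc: $!\n";
    for my $pid (grep { /^\d+$/ } readdir $dh) {
        # The process may exit between readdir and open; just skip it.
        open(my $fh, '<', "/proc/$pid/stat") or next;
        my $stat = <$fh>;
        close $fh;
        next unless defined $stat;
        # Format: "pid (comm) state ppid ...". comm may itself contain
        # spaces or ')', so match after the LAST ')' (greedy .*).
        my ($state, $parent) = $stat =~ /.*\)\s+(\S)\s+(\d+)/s;
        next unless defined $state;
        $count++ if $state eq 'Z' && $parent == $ppid;
    }
    closedir $dh;
    return $count;
}
```

For example, if you pass the PID of a farmguardian process, such as 28342 for t-o-80 in the process list quoted further down, the helper should return the number of `[sh] <defunct>` entries listed under it.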

I then tried to understand how farmguardian works. It uses $SIG{ALRM}
to handle timeouts when executing the check command.
http://perldoc.perl.org/perlipc.html#Signals says "If the operation
being timed out is system() or qx(), this technique is liable to
generate zombies". I also read many forum postings from people having
problems with zombies in Perl. I am not an expert, but this is how I
understand it now. If the child process (the check command) times out
and the parent (farmguardian) simply continues, the child becomes a
zombie once it exits, because the parent never reads its exit status.
As a quick fix, I modified the farmguardian script to call a
non-blocking waitpid when the check command times out, so that it reads
the child's status. For whatever reason, this still leaves me with one
zombie per farm that has failing farmguardian checks. More importantly,
though, the zombies no longer keep adding up.
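One plausible explanation for the single remaining zombie is timing. The non-blocking waitpid runs immediately after the timeout, while the check command is usually still running. At that point WNOHANG returns 0 and nothing is reaped. The child only turns into a zombie when it exits later, and it is then reaped at the next timeout. The self-contained Perl sketch below reproduces that sequence under a few assumptions. `sleep 3` stands in for a hung check command. `run_with_alarm` and `reap_finished_children` are hypothetical helpers modelled on the perlipc alarm idiom, not farmguardian's actual code.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper modelled on the perlipc alarm idiom: run $cmd with
# qx// and give up after $timeout seconds. On timeout, qx// never waits
# for its child, so the child becomes a zombie as soon as it exits.
sub run_with_alarm {
    my ($cmd, $timeout) = @_;
    my $output = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $timeout;
        my $out = qx($cmd);
        alarm 0;
        $out;
    };
    alarm 0;    # clear the alarm even if the eval died
    return (defined $output ? $output : '', $@ ? 1 : 0);
}

# Reap every child that has ALREADY exited, without blocking.
# waitpid(-1, WNOHANG) returns 0 while children are still running and
# -1 when there are none left, so the loop stops in both cases.
sub reap_finished_children {
    my $reaped = 0;
    $reaped++ while waitpid(-1, WNOHANG) > 0;
    return $reaped;
}
```

If you call `reap_finished_children()` right after a timeout, it returns 0 because the child is still running. If you call it again after the child has exited, it returns 1.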

This is how I modified farmguardian:

use Proc::Daemon;

changed to

use Proc::Daemon;
use POSIX qw(WNOHANG);

and

      if ($@){
           warn "$command timed out.\n";
           $timedout = 1;
      }

changed to

      if ($@){
           warn "$command timed out.\n";
           $timedout = 1;
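           # Reap children that have already exited, without blocking.
           # $kid must be declared outside the do block so the while
           # condition sees the value set inside it.
           my $kid;
           do {
             $kid = waitpid(-1, WNOHANG);
             #print "pid $kid exited\n";
           }
           while $kid > 0;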
      }
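A more thorough variant is also possible. It assumes the check command can be passed as a list, so no intermediate `sh` is involved. The command is opened with a piped `open` to learn its PID, killed on timeout, and then `close` waits for it. `run_check` is a hypothetical helper and is not part of farmguardian. Because `close` on a pipe waits for the child process to exit, this sketch should not leave any zombie behind, not even the single one per farm.

```perl
use strict;
use warnings;

# Hypothetical helper (not farmguardian's code): run a check command given
# as a list, so no shell sits in between, enforce a timeout, and always
# reap the child. Returns (output, timed-out flag, wait status in $?).
sub run_check {
    my ($timeout, @cmd) = @_;
    # A piped open returns the child's PID, which we need to kill it later.
    my $pid = open(my $fh, '-|', @cmd)
        or die "cannot run $cmd[0]: $!\n";
    my ($output, $timedout) = ('', 0);
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        local $/;                   # slurp mode: read all output at once
        $output = <$fh> // '';
        alarm 0;
    };
    alarm 0;                        # clear the alarm on every path
    if ($@) {
        $timedout = 1;
        kill 'TERM', $pid;          # stop the hung check command
    }
    close $fh;                      # waits for the child, so no zombie remains
    return ($output, $timedout, $?);
}
```

The farmguardian check strings contain quoted arguments and are therefore likely run through the shell. In that case, killing the `sh` PID may leave its own child running. Signalling the whole process group would be one way around that, but this sketch does not attempt it.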


I don't know whether this would work for others, but it seems to work
for me for now, and I hope it helps with your lab study. I will post an
update if anything worth mentioning changes.

Kind Regards,
Stefan


On Thu, Feb 13, 2014 at 5:07 PM, Emilio Campos
<[email protected]> wrote:
> Hi Stefan, we have to set up a lab to study this behaviour; the
> information you reported will help us.
>
> Thanks.
>
>
> 2014-02-13 15:37 GMT+01:00 Stefan <[email protected]>:
>>
>> Please find a farmguardian log excerpt and additional info below.
>>
>> I have noticed that stopping the target service (e.g. Apache) does not
>> seem to result in defunct processes, but they certainly appear when I
>> disconnect the real server's virtual NIC, for example.
>>
>> Kind Regards,
>> Stefan
>>
>> Farmguardian Config:
>> t-c-443: check_tcp -H HOST -p PORT -S -w 5 -c 5 -t 10
>> t-n-80:  check_http -I HOST -p PORT -H 't-n.somedomain' -u '/abc/app' -w 5 -c 5 -t 10 -e '302 Found'
>> t-n-443: check_http -I HOST -p PORT -S -H 't-n.somedomain' -u '/abc/app' -w 5 -c 5 -t 10 -e '200 OK'
>> t-o-80:  check_http -I HOST -p PORT -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>> t-o-443: check_http -I HOST -p PORT -S -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>>
>> Process list:
>> ...
>>     1  1342  1342  1342 ?           -1 Ss       0   0:00 /usr/local/zenloadbalancer/app/mini_httpd/mini_httpd -C /usr/local/zenloadbalancer/app/mini_httpd/mini_httpd.conf
>>     1 27924 27924 27924 tty1      7057 Ss       0   0:00 /bin/login --
>> 27924 21781 21781 27924 tty1      7057 S        0   0:00  \_ -bash
>> 21781  7057  7057 27924 tty1      7057 R+       0   0:00      \_ ps axjf
>>     1 28308 28085 28085 ?           -1 S        0   0:05 /usr/bin/perl /usr/local/zenloadbalancer/app/farmguardian/bin/farmguardian t-c-443 -l
>>     1 28342 28085 28085 ?           -1 S        0   0:04 /usr/bin/perl /usr/local/zenloadbalancer/app/farmguardian/bin/farmguardian t-o-80 -l
>> 28342  6507 28085 28085 ?           -1 Z        0   0:00  \_ [sh] <defunct>
>>     1 28377 28085 28085 ?           -1 S        0   0:04 /usr/bin/perl /usr/local/zenloadbalancer/app/farmguardian/bin/farmguardian t-n-443 -l
>> 28377  6512 28085 28085 ?           -1 Z        0   0:00  \_ [sh] <defunct>
>>     1 28412 28085 28085 ?           -1 S        0   0:04 /usr/bin/perl /usr/local/zenloadbalancer/app/farmguardian/bin/farmguardian t-n-80 -l
>> 28412  6506 28085 28085 ?           -1 Z        0   0:00  \_ [sh] <defunct>
>>     1 28450 28085 28085 ?           -1 S        0   0:04 /usr/bin/perl /usr/local/zenloadbalancer/app/farmguardian/bin/farmguardian t-o-443 -l
>> 28450  6500 28085 28085 ?           -1 Z        0   0:00  \_ [sh] <defunct>
>>
>> Farmguardian log excerpt t-o-80:
>> The servers timeout is: 10
>>     checking:
>>         farmname: t-o-80
>>         timeout: 10
>>         blacklist:
>>         timetocheck: 10
>>         portadmin: -1
>>         server[0]: a.b.234.20:80
>>         server[1]: a.b.235.20:80
>>         check: check_http -I HOST -p PORT -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>>
>> execution in Tue Feb  4 10:21:34 2014 ::
>>         server[0]: a.b.234.20:80
>> Backend status 0: up
>> command: /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.234.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>> timedout: 0
>> errorcode: 0
>> No state changed for the backend.
>>         server[1]: a.b.235.20:80
>> Backend status 1: up
>> command: /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.235.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>> timedout: 0
>> errorcode: 0
>> No state changed for the backend.
>> The servers timeout is: 10
>>     checking:
>>         farmname: t-o-80
>>         timeout: 10
>>         blacklist:
>>         timetocheck: 10
>>         portadmin: -1
>>         server[0]: a.b.234.20:80
>>         server[1]: a.b.235.20:80
>>         check: check_http -I HOST -p PORT -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>>
>> execution in Tue Feb  4 10:21:44 2014 ::
>>         server[0]: a.b.234.20:80
>> Backend status 0: up
>> command: /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.234.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>> timedout: 0
>> errorcode: 0
>> No state changed for the backend.
>>         server[1]: a.b.235.20:80
>> Backend status 1: up
>> command: /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.235.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>> /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.235.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found' timed out.
>> timedout: 1
>> errorcode: 0
>> **execution error in ' /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.235.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found' ', output::**
>> The servers timeout is: 10
>>     checking:
>>         farmname: t-o-80
>>         timeout: 10
>>         blacklist:
>>         timetocheck: 10
>>         portadmin: -1
>>         server[0]: a.b.234.20:80
>>         server[1]: a.b.235.20:80
>>         check: check_http -I HOST -p PORT -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>>
>> execution in Tue Feb  4 10:22:04 2014 ::
>>         server[0]: a.b.234.20:80
>> Backend status 0: up
>> command: /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.234.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>> timedout: 0
>> errorcode: 0
>> No state changed for the backend.
>>         server[1]: a.b.235.20:80
>> Backend status 1: fgDOWN
>> command: /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.235.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found'
>> /usr/local/zenloadbalancer/app/libexec/check_http -I a.b.235.20 -p 80 -H 't-o.somedomain' -w 5 -c 5 -t 10 -e '302 Found' timed out.
>> timedout: 1
>> errorcode: 0
>> No state changed for the backend.
>>
>>
>> _______________________________________________
>> Zenloadbalancer-support mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/zenloadbalancer-support
>
>
>
>
> --
> Load balancer distribution - Open Source Project
> http://www.zenloadbalancer.com
> Distribution list (subscribe): [email protected]
>
