(I had a problem and solved it during the course of writing this up, but
decided to share it nonetheless.
The short answer - skip to the bottom - is that the process called by
supervisor needs to test for the IP and sleep until it is ready.)

======
I'm using supervisor to launch some gunicorn_django processes.

I'd like these to bind to a VPN IP.

However, I've found that upon boot, this will fail because supervisor is
trying to launch them before the VPN is up and its IP established.

Let's say I'm trying to bind to a VPN IP of 10.15.20.10 (this IP is assigned
to the local host on which everything is running).

The relevant part of my launcher process will look like this:

exec /usr/local/bin/gunicorn_django -w $NUM_WORKERS \
    --user=$USER --group=$GROUP --log-level=debug \
    --log-file=$LOGFILE 2>>$LOGFILE \
    -b 10.15.20.10:9999

This will work fine if I do it manually once the system is running, and the
VPN IP is up:

# supervisorctl start example
example: started
# supervisorctl status example
example                          RUNNING    pid 6578, uptime 0:00:13
# cat /var/log/gunicorn/example.log
2012-12-26 13:14:41 [6578] [INFO] Starting gunicorn 0.16.1
2012-12-26 13:14:41 [6578] [DEBUG] Arbiter booted
2012-12-26 13:14:41 [6578] [INFO] Listening at: http://10.15.20.10:9999(6578)
2012-12-26 13:14:41 [6578] [INFO] Using worker: sync
2012-12-26 13:14:41 [6584] [INFO] Booting worker with pid: 6584
2012-12-26 13:14:41 [6585] [INFO] Booting worker with pid: 6585
2012-12-26 13:14:41 [6586] [INFO] Booting worker with pid: 6586

So, now I'll reboot the system and show the difference:

# supervisorctl status example
example                          FATAL      Exited too quickly (process log may have details)
# cat /var/log/gunicorn/example.log
2012-12-26 13:20:48 [2823] [INFO] Starting gunicorn 0.16.1
2012-12-26 13:20:48 [2823] [ERROR] Invalid address: ('10.15.20.10', 9999)
2012-12-26 13:20:50 [3247] [INFO] Starting gunicorn 0.16.1
2012-12-26 13:20:50 [3247] [ERROR] Invalid address: ('10.15.20.10', 9999)
2012-12-26 13:20:52 [3782] [INFO] Starting gunicorn 0.16.1
2012-12-26 13:20:52 [3782] [ERROR] Invalid address: ('10.15.20.10', 9999)
2012-12-26 13:20:56 [4049] [INFO] Starting gunicorn 0.16.1
2012-12-26 13:20:56 [4049] [ERROR] Invalid address: ('10.15.20.10', 9999)

When I look at the openvpn.log file I see:

# more openvpn.log
Wed Dec 26 13:20:47 2012 OpenVPN 2.2.1 x86_64-linux-gnu [SSL] [LZO2] [EPOLL] [PKCS11] [eurephia] [MH] [PF_INET6] [IPv6 payload 20110424-2 (2.2RC2)] built on Mar 23 2012
...
...
Wed Dec 26 13:21:21 2012 /sbin/ifconfig tun0 10.15.20.10 pointopoint 10.15.20.1 mtu 1500
Wed Dec 26 13:21:21 2012 /sbin/route add -net 10.15.20.0 netmask 255.255.255.0 gw 10.15.20.1
Wed Dec 26 13:21:21 2012 Initialization Sequence Completed

Looking at the timestamps, the problem is pretty obvious: the VPN
interface wasn't established until 25 seconds after supervisor's last
failed attempt to launch the gunicorn process.
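
To make the arithmetic explicit, here is a small sketch that computes the gap between two timestamps taken from the logs above. The helper names to_seconds and elapsed are made up for this illustration, and it assumes both timestamps fall on the same day and use two-digit fields.

```shell
# Convert HH:MM:SS to seconds since midnight.
# A single leading zero is stripped so that "08" is not parsed as octal;
# this assumes two-digit fields, as in the log timestamps.
to_seconds() {
    h=${1%%:*}
    rest=${1#*:}
    m=${rest%%:*}
    s=${rest#*:}
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Seconds elapsed between two same-day timestamps: elapsed START END
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 13:20:48 13:21:21   # first failed gunicorn start -> VPN up
elapsed 13:20:56 13:21:21   # last failed gunicorn start  -> VPN up
```

This prints 33 and then 25: the VPN IP appeared 33 seconds after the first failed start and 25 seconds after the last one.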

So, I take a look at /etc/rc2.d:

# ls /etc/rc2.d | egrep 'supervisor|openvpn'
S17openvpn
S18supervisor

Actually, supervisor was originally started as S16supervisor; I renamed it
so that it starts after openvpn.
But that isn't the real problem. Looking at the openvpn log, it started at
13:20:47, while supervisor started a second later, at 13:20:48.
However, it took until 13:21:21 for the VPN IP to come up, by which time,
of course, the gunicorn_launcher process had failed and supervisor had
given up, back at 13:20:56.

I think to myself, 'So, what I really need is for supervisor's init script
to test for the VPN IP and wait until it's up before proceeding to try to
launch anything.'

I stick a 'sleep 60' into supervisor's init.d script, and that does solve
the problem.
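
A fixed sleep is only a guess at how long the VPN takes to come up. For reference, a test for the address itself might look roughly like the sketch below. It assumes the iproute2 `ip` command is installed and that `ip -o -4 addr show` prints one line per address with the address in the fourth field; the function names addr_in_listing and ip_is_up are invented for this example.

```shell
# Succeeds if the IPv4 address $1 appears as a local address in
# `ip -o -4 addr show` style output read from stdin.
# The 4th field holds the address, with a /prefix suffix unless a
# point-to-point peer is listed, so the part before any "/" is compared.
addr_in_listing() {
    awk -v want="$1" '
        $3 == "inet" { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Succeeds if the address is currently assigned to a local interface.
# Assumes the iproute2 "ip" command is available on the system.
ip_is_up() {
    ip -o -4 addr show | addr_in_listing "$1"
}

# Possible use in an init script (10.15.20.10 is the VPN IP from above):
#   until ip_is_up 10.15.20.10; do sleep 5; done
```

Comparing the whole field rather than grepping for the string avoids a false match on a longer address such as 10.15.20.100.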

However, having done that, I realize it's not really a supervisor issue
and should be dealt with in the gunicorn launcher script.

So I modify the script to include a loop to ping the VPN IP and not proceed
to launch the gunicorn process until the IP is up:

  # Don't run the gunicorn process until the VPN IP is up:
  # ping it once every 5 seconds until a ping succeeds.
  # (ping's output is discarded; only its exit status matters.)
  until ping -c 1 10.15.20.10 >/dev/null 2>&1
  do
    sleep 5
  done
  exec /usr/local/bin/gunicorn_django -w $NUM_WORKERS \
    --user=$USER --group=$GROUP --log-level=debug \
    --log-file=$LOGFILE 2>>$LOGFILE \
    -b 10.15.20.10:9999


# reboot
# supervisorctl status example
example                          RUNNING    pid 5019, uptime 0:00:14
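
One caveat with an open-ended wait is that the launcher hangs forever if the VPN never comes up. A bounded variant could be sketched as follows; the wait_until helper and the 120-second limit are assumptions made for illustration, not something taken from the setup above.

```shell
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds.
# Returns 0 on success, or 1 once roughly TIMEOUT seconds of sleeping
# have passed (time spent inside COMMAND itself is not counted).
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Hypothetical use in the launcher, just before the exec line:
#   wait_until 120 5 ping -c 1 10.15.20.10 || exit 1
```

With this, the launcher exits non-zero after about two minutes instead of waiting indefinitely, so the failure shows up in supervisorctl status.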


Having written this up, and essentially solved the problem myself along
the way, I decided to go ahead and post it anyway.

I should probably share it with the gunicorn folks.



-S
_______________________________________________
Supervisor-users mailing list
https://lists.agendaless.com/mailman/listinfo/supervisor-users
