Re: terminating a failed post-start stanza

Steve Langasek Tue, 16 Apr 2013 15:31:46 -0700

Unfortunately the original mail doesn't seem to have made it to my mailbox,
so I'm tacking my answer on to this reply.

On Wed, Apr 17, 2013 at 01:31:00AM +0400, Alexander Petrov wrote:
> Hello Michael, hello devs
> Don't know why upstart's finite machine ignores errors in post-start

What's happening here is that the main process is started, upstart considers
it "running", and then launches the post-start script which then blocks
indefinitely; because the post-start script is blocking, upstart never has a
chance to handle the exit of the main process and trigger a respawn.

The best way to avoid this problem is to avoid polling in the post-start
script /at all/.

> 2) upstart kills all processes in process group so you can run postgres in
> background and check its status

This definitely doesn't solve the problem of the post-start script never
finishing.

> 2013/4/16 Michael Barrett <[email protected]>

> > Hi, I'm working on converting my postgresql package to using upstart.
> >  While doing so, I found that postgres takes a few moments after starting
> > to actually start accepting connections.  I decided to use a post-start job
> > to test whether the postgres server was up and available before letting
> > upstart believe the service was ready.

> > Currently my upstart job looks like this: http://dpaste.com/1060864/

Essentially, your post-start script is a workaround for postgres not using
any of the "standard" ways of indicating service readiness that upstart
supports natively.  That's unfortunate; it would be nice if postgresql
supported daemonization, but we probably don't want to patch upstream to
implement this.

So a post-start script is a reasonable workaround - the main problem is that
the script you're using here never terminates if the main process exits
without ever successfully answering on the socket.  Alexander's proposal
suffers from the same problem.  What you want to do here is make sure you
detect when the upstart job's goal has changed from start to stop.

So this should do the trick (untested):

post-start script
    while ! su -c "psql -c 'select 1'" postgres
    do
        status | grep -q 'stop/' && exit 1
        sleep 1
    done
end script

Note that if the main process "starts" and continues running but never
accepts connections, the post-start script will still hang.  This is
probably ok; at least you'll be able to manually stop the job if necessary.
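
If hanging in that last case is not acceptable, the same loop can be given
an upper bound.  What follows is only a sketch, just as untested against a
real upstart job as the snippet above.  The helper name wait_ready, its
argument order, the return codes and the limit of 30 tries are all my own
assumptions and not anything upstart provides.  The probe and the
"job is stopping" check are passed in as command strings, so the function
can also be tried outside of upstart.

```shell
# wait_ready PROBE ABORT MAX_TRIES [DELAY]   (hypothetical helper)
#   PROBE      command string that succeeds once the service answers
#   ABORT      command string that succeeds when waiting is pointless,
#              e.g. because the job's goal has changed to stop
#   MAX_TRIES  give up after this many failed probes
#   DELAY      seconds to sleep between probes (default 1)
# Returns 0 if PROBE succeeded, 1 if ABORT fired, 2 if the limit was hit.
wait_ready() {
    probe=$1 abort=$2 max=$3 delay=${4:-1}
    tries=0
    while ! sh -c "$probe" >/dev/null 2>&1
    do
        sh -c "$abort" >/dev/null 2>&1 && return 1
        tries=$((tries + 1))
        [ "$tries" -ge "$max" ] && return 2
        sleep "$delay"
    done
    return 0
}

# Possible use inside the post-start script (assumed, untested):
#   wait_ready "su -c \"psql -c 'select 1'\" postgres" \
#              "status | grep -q 'stop/'" 30 || exit 1
```

This only guarantees that the post-start script terminates.  As reported
further down in the original mail, a non-zero exit from post-start did not
make 'start' itself report failure, so the exit status here is mainly
useful for logs.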

> > The issue I'm running into now is that whenever there is an issue with
> > the postgres server that causes it to crash, the post-start job hangs
> > indefinitely (as you'd expect from the loop).  I can't stop the job or
> > restart it once I fix the issue.  The only way I've been able to fix it
> > is to bring up postgres manually, which allows the post-start to finish.

> > I've tried putting a maximum # of retries in the post-start script, then
> > doing an exit 1 when it exceeds that amount, but that results in the
> > 'start postgresql' command exiting successfully, which causes issues in
> > other places in my application (because postgres isn't actually
> > running).

The 'start' command is defined as returning success if the *dbus request*
succeeds.  For a tool that provides the interface you're after, see
'service': i.e., 'service postgresql start' will return 0 if the service
is started successfully, and non-zero if not.

(FWIW, 'service' is not part of upstart but is a standard interface provided
on Ubuntu, with an interface derived from a tool of the same name on Red
Hat.)
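
For the calling side, a small wrapper along these lines would let the rest
of the application react to that exit status.  It is a sketch only: the
function name start_or_report and its messages are made up for
illustration, and it relies on nothing beyond 'service NAME start'
returning 0 on success and non-zero otherwise, as described above.

```shell
# start_or_report NAME   (hypothetical helper)
# Runs 'service NAME start' and passes its exit status on to the caller.
start_or_report() {
    name=$1
    if service "$name" start; then
        echo "$name started"
    else
        rc=$?   # exit status of the failed 'service' call
        echo "$name failed to start (exit status $rc)" >&2
        return "$rc"
    fi
}
```

A deployment script could then use 'start_or_report postgresql || exit 1'
before running anything that needs the database.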

Hope that helps,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
[email protected]                                     [email protected]
