[ANN] new crashed process handling

Adam Wiggins Fri, 27 Aug 2010 15:59:12 -0700

We've just rolled out a new version of our process manager, with streamlined
handling of crashed processes.  A lot of you have been frustrated when your
workers crash (libXML segfaults, for example), because the restart behavior
was somewhat unpredictable.

The new version follows a very simple algorithm: any process that crashes
will be restarted immediately, with a cooldown time of ten minutes.  This
applies to all process types including web processes (dynos).
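
As a rough illustration only, here is one way to read that rule, sketched in
Python.  This is not the process manager's actual code.  It assumes the
cooldown means "restart at once, unless the previous restart was less than
ten minutes ago, in which case wait out the remainder".  The names
next_restart_time and COOLDOWN_SECONDS are made up for this sketch.

```python
from typing import Optional

COOLDOWN_SECONDS = 10 * 60  # ten-minute cooldown, as stated in the announcement


def next_restart_time(crash_time: float, last_restart: Optional[float]) -> float:
    """Return the time (in seconds) at which a crashed process is restarted.

    Assumed reading of the policy: restart immediately, unless the previous
    restart happened less than COOLDOWN_SECONDS ago; in that case wait until
    the cooldown has elapsed.
    """
    if last_restart is None:
        # No earlier restart on record: restart immediately.
        return crash_time
    # Earliest moment the cooldown allows another restart.
    earliest_allowed = last_restart + COOLDOWN_SECONDS
    return max(crash_time, earliest_allowed)
```

Under this reading, a process that keeps crashing right after boot is
restarted at most once every ten minutes, while a process that crashes
rarely is always restarted right away.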

We've also unified the behavior of crashed dynos with crashed workers.  Now
if you push up a bad deploy (missing a gem, etc.) you won't be shown the
stack trace in the browser; instead, you can find the stack trace and
debugging info by running "heroku logs".  This makes the workflow for
handling a crashed process the same as handling an exception during the
request cycle (HTTP 500): your users are shown a generic error page, and
you, the developer, can look up the problem in your logs.
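
The same split (generic page for the user, full trace in the logs) can be
sketched in a few lines of Python.  This is only an illustration of the
idea, not how our routing layer is implemented; handle_request,
GENERIC_ERROR_PAGE and the "app" logger name are hypothetical.

```python
import logging
import traceback

logger = logging.getLogger("app")  # hypothetical logger name

# Placeholder body; the real generic page is not described in this message.
GENERIC_ERROR_PAGE = "<h1>Application Error</h1>"


def handle_request(app, request):
    """Call app(request) and return a (status, body) pair.

    On any exception, the stack trace goes to the log and the caller gets
    a generic 500 page with no debugging details in it.
    """
    try:
        return 200, app(request)
    except Exception:
        # The developer finds the full trace in the logs, not in the browser.
        logger.error("request failed:\n%s", traceback.format_exc())
        return 500, GENERIC_ERROR_PAGE
```

The point of the design is that the response body never carries the trace,
so there is one place to look when something breaks: the logs.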

A description of the new crash policy, along with documentation for the
"heroku ps" command (great for inspecting the state of your processes),
can be found here:

http://docs.heroku.com/ps

We welcome your feedback.  Please reply (here on the mailing list, or to me
privately) if you have any thoughts, especially if you've seen it in action
on your own crashed processes.
