[Bug 72366] HHVM fcgi restart during scap runs cause 503s (and failed tests)

bugzilla-daemon Thu, 30 Oct 2014 08:46:36 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=72366


--- Comment #20 from Bryan Davis <bda...@wikimedia.org> ---
When we talked with Emir from Facebook this spring, he outlined how they handle
restarts for each node:

* Load balancer polls /status.php on each server to determine if it should
receive requests
* /status.php checks for a /hhvm/var/stopping file and if it exists returns a
status to tell the LB not to send more requests
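The depool decision above is a single file-existence test. The real endpoint is PHP; the sketch below mirrors the same logic in shell so the later steps can be read against it. The 503/200 status codes are an assumption (the comment only says "a status to tell the LB not to send more requests"), and `status_code` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Shell mirror of the /status.php depool check (the real check is PHP).
# STOPPING_FILE defaults to the path from the comment; override to test.
STOPPING_FILE="${STOPPING_FILE:-/hhvm/var/stopping}"

# Print the HTTP status the health check would return.
# Assumption: 503 tells the LB to depool; 200 keeps the node in rotation.
status_code() {
  if [ -e "$STOPPING_FILE" ]; then
    echo 503
  else
    echo 200
  fi
}
```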

== Stop ==
* touch /hhvm/var/stopping to trigger depool response to LB
* wait up to 30s for requests to drain
** poll the built-in admin server to find current request load (/check-load)
** if >0: sleep 5s
* send stop signal to server via built-in admin server's /stop command
* send SIGTERM to hhvm processes via pkill
* poll /status.php once a second for up to 30s looking for a connection failure
* send SIGKILL to hhvm processes via pkill
* remove /hhvm/var/stopping trigger file
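A minimal bash sketch of the Stop sequence, assuming HHVM's admin server answers `/check-load` with a bare integer and `/stop` as described. `ADMIN_URL`, `STATUS_URL` and their ports are hypothetical placeholders, as are the function names; this is not the linked 2restart.sh.

```shell
#!/usr/bin/env bash
# Sketch of the "Stop" steps. URLs and ports are hypothetical.
STOPPING_FILE="${STOPPING_FILE:-/hhvm/var/stopping}"
ADMIN_URL="${ADMIN_URL:-http://localhost:9001}"          # hypothetical admin port
STATUS_URL="${STATUS_URL:-http://localhost/status.php}"  # hypothetical

# In-flight request count; assumes /check-load returns a bare integer.
# An unreachable admin server is treated as "nothing left to drain".
current_load() {
  curl -fs --max-time 2 "$ADMIN_URL/check-load" 2>/dev/null || echo 0
}

# Poll the load, sleeping 5s while it is non-zero, for at most ~30s.
drain_requests() {
  local deadline=$((SECONDS + 30)) load
  while [ "$SECONDS" -lt "$deadline" ]; do
    load=$(current_load | tr -dc '0-9')
    [ "${load:-0}" -gt 0 ] || return 0
    sleep 5
  done
  return 1
}

# Poll status.php once a second for up to 30s until curl cannot connect.
wait_for_down() {
  local i
  for ((i = 0; i < 30; i++)); do
    curl -s -o /dev/null --max-time 1 "$STATUS_URL" || return 0
    sleep 1
  done
  return 1
}

stop_hhvm() {
  touch "$STOPPING_FILE"          # status.php now tells the LB to depool us
  drain_requests || echo "warning: requests still in flight after 30s" >&2
  curl -s -o /dev/null --max-time 2 "$ADMIN_URL/stop" || true
  pkill -TERM -x hhvm || true     # pkill exits 1 when nothing matched
  wait_for_down || echo "warning: status.php still answering" >&2
  pkill -KILL -x hhvm || true     # kill anything that ignored SIGTERM
  rm -f "$STOPPING_FILE"
}
```

The drain and shutdown waits only warn on timeout and carry on, matching the "up to 30s" wording: a stuck request delays the restart but does not block it.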

== Start ==
* remove /hhvm/var/stopping trigger file
* warm OS cache of hhvm binary and hhbc cache via `cat >/dev/null`
* start hhvm process
* poll admin server once per second for up to 10s to see if it started
** `exit 1` if not seen
* poll /status.php
** `exit 1` if you don't get a valid response
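The Start steps can be sketched the same way. Every path, the admin port, and the launch command are hypothetical placeholders (`hhvm --mode daemon --config` is one plausible invocation; a real host would more likely go through its init system). "Valid response" is assumed to mean any non-error HTTP status. The function returns 1 instead of calling `exit 1` so it can be reused; a standalone script would call `start_hhvm || exit 1`.

```shell
#!/usr/bin/env bash
# Sketch of the "Start" steps. All paths/ports below are hypothetical.
STOPPING_FILE="${STOPPING_FILE:-/hhvm/var/stopping}"
ADMIN_URL="${ADMIN_URL:-http://localhost:9001}"           # hypothetical
STATUS_URL="${STATUS_URL:-http://localhost/status.php}"   # hypothetical
HHVM_BIN="${HHVM_BIN:-/usr/bin/hhvm}"                      # hypothetical
HHBC_CACHE="${HHBC_CACHE:-/var/cache/hhvm/fcgi.hhbc.sq3}"  # hypothetical
HHVM_CONFIG="${HHVM_CONFIG:-/etc/hhvm/server.ini}"         # hypothetical

# Assumed launch command.
launch_hhvm() {
  "$HHVM_BIN" --mode daemon --config "$HHVM_CONFIG"
}

start_hhvm() {
  local deadline
  rm -f "$STOPPING_FILE"
  # Read both files once so they land in the OS page cache; ignore missing ones.
  cat "$HHVM_BIN" "$HHBC_CACHE" >/dev/null 2>&1 || true
  launch_hhvm || return 1

  # Any HTTP answer from the admin server counts as "started".
  deadline=$((SECONDS + 10))
  until curl -s -o /dev/null --max-time 1 "$ADMIN_URL/"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "admin server not seen after 10s" >&2
      return 1
    fi
    sleep 1
  done

  # curl -f fails on HTTP 4xx/5xx, which we treat as an invalid response.
  if ! curl -fs -o /dev/null --max-time 2 "$STATUS_URL"; then
    echo "status.php did not return a valid response" >&2
    return 1
  fi
}
```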

== Restart ==
* stop
* start

See also
<https://github.com/emiraga/hhvm-deploy-ideas/blob/master/2restart/2restart.sh>

-- 
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
