https://bugzilla.wikimedia.org/show_bug.cgi?id=72366
--- Comment #20 from Bryan Davis <bda...@wikimedia.org> ---
When we talked with Emir from Facebook this spring he outlined how they handle restarts for each node:

* Load balancer polls /status.php on each server to determine if it should receive requests
* /status.php checks for a /hhvm/var/stopping file and if it exists returns a status to tell the LB not to send more requests

== Stop ==
* touch /hhvm/var/stopping to trigger depool response to LB
* wait up to 30s for requests to drain
** poll the built-in admin server to find current request load (/check-load)
** if >0: sleep 5s
* send stop signal to server via built-in admin server's /stop command
* send SIGTERM to hhvm processes via pkill
* poll /status.php once a second for up to 30s looking for a connection failure
* send SIGKILL to hhvm processes via pkill
* remove /hhvm/var/stopping trigger file

== Start ==
* remove /hhvm/var/stopping trigger file
* warm OS cache of hhvm binary and hhbc cache via `cat >/dev/null`
* start hhvm process
* poll admin server once per second for up to 10s to see if it started
** `exit 1` if not seen
* poll /status.php
** `exit 1` if you don't get a valid response

== Restart ==
* stop
* start

See also <https://github.com/emiraga/hhvm-deploy-ideas/blob/master/2restart/2restart.sh>
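
The depool and drain steps of the stop sequence could be sketched roughly as follows. This is an illustration only, not the script Facebook uses (that one is the shell script linked above). The function names, the injected `get_load` callable, and the `sleep`/`clock` parameters are all hypothetical. Only the /hhvm/var/stopping path, the 30s limit, and the 5s sleep come from the description above.

```python
import os
import time
from typing import Callable

# Trigger file path as given in the comment.
STOPPING_FILE = "/hhvm/var/stopping"


def should_receive_requests(stopping_file: str = STOPPING_FILE) -> bool:
    """Decision /status.php is described as making: pooled unless the file exists."""
    return not os.path.exists(stopping_file)


def mark_stopping(stopping_file: str = STOPPING_FILE) -> None:
    """Equivalent of `touch`: create the trigger file or refresh its mtime."""
    with open(stopping_file, "a"):
        os.utime(stopping_file, None)


def clear_stopping(stopping_file: str = STOPPING_FILE) -> None:
    """Remove the trigger file; a missing file is not an error."""
    try:
        os.remove(stopping_file)
    except FileNotFoundError:
        pass


def wait_for_drain(
    get_load: Callable[[], int],
    timeout: float = 30.0,   # "wait up to 30s for requests to drain"
    interval: float = 5.0,   # "if >0: sleep 5s"
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the request load until it reaches zero or the timeout expires.

    `get_load` is a hypothetical stand-in for a query against the admin
    server's /check-load; how to reach that server is not specified here.
    Returns True if the node drained, False if the timeout was hit first.
    """
    deadline = clock() + timeout
    while True:
        if get_load() <= 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run `mark_stopping()`, then `wait_for_drain(...)`, and continue with the /stop, SIGTERM, and SIGKILL steps whether or not the drain finished, since the description treats 30s as an upper bound rather than a precondition. Passing `sleep` and `clock` as parameters is only there so the loop can be exercised without real waiting.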