Re: MESOS-695 / Automated self-healing and coordinated repair to Mesos

2014-05-19 Thread Vinod Kone
Monit usage for Mesos @Twitter is very basic. For both master and slave, monit pings a known endpoint (/health or /stats.json) and restarts the process if it fails to respond within a timeout (with retries). The motivation for self-healing is that it is co-ordinated via master (as you alluded to).
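As a rough illustration of the check-and-restart loop described above, here is a minimal Python sketch. It is not monit's implementation, and monit is normally configured declaratively rather than scripted. The names `probe`, `is_healthy` and `watchdog_step`, the retry count, the delay, and the rule that any HTTP 2xx response counts as healthy are all hypothetical choices made for this example. The restart action is passed in as a callable so that nothing here assumes how the master or slave process is actually managed.

```python
import time
import urllib.request
from typing import Callable


def probe(url: str, timeout: float) -> bool:
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds.

    Hypothetical helper: treats connection errors, timeouts and non-2xx
    responses (urlopen raises HTTPError for those) all as failures.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def is_healthy(check: Callable[[], bool], retries: int, delay: float = 0.0) -> bool:
    """Run `check` up to `retries` times; healthy as soon as one attempt succeeds."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # back off before the next retry
    return False


def watchdog_step(check: Callable[[], bool], restart: Callable[[], None],
                  retries: int = 3, delay: float = 1.0) -> bool:
    """One monitoring cycle: call `restart` only if every attempt failed.

    Returns True if a restart was triggered.
    """
    if is_healthy(check, retries, delay):
        return False
    restart()
    return True
```

In use, `check` could be something like `lambda: probe("http://localhost:5050/health", timeout=5.0)`, where the host and port are assumptions for a local master, and `restart` would wrap whatever service manager runs the process. Note that each host heals itself independently here, which is the limitation the thread contrasts with repair coordinated through the master.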

MESOS-695 / Automated self-healing and coordinated repair to Mesos

2014-05-16 Thread Tom Arnfeld
Hi all, Wasn’t sure if it was right to start this thread on the JIRA issue. I just came across MESOS-695 (and what seems to be something almost finished!) about implementing some kind of self-healing mechanism in Mesos, and also picked up on mentions of monit. From what I could tell based on t

Re: MESOS-695 / Automated self-healing and coordinated repair to Mesos

2014-05-16 Thread Jeff Currier
+Charlie Tom, Charlie is heading up this work now, so he can likely speak better than I can at the moment to what's taking place on this ticket. --Jeff-- On Fri, May 16, 2014 at 9:05 AM, Tom Arnfeld wrote: > Hi all, > > Wasn’t sure if it was right to start this thread on the JIRA issue.. I > j