Thank you, Benjamin. So, I could periodically request the metrics endpoint, or stream the logs (maybe via mesos.cli; or SSH)? What, roughly, does the "agent removed" message look like in the logs?
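A minimal sketch of the polling option, assuming the master serves its metrics as JSON at `/metrics/snapshot` on port 5050 and exposes a `master/slave_removals` counter. The host, port, counter name, and helper names here are assumptions for illustration, not details confirmed in this thread. Because it only watches a counter, it can report that agents were removed but not which ones.

```python
import json
import time
import urllib.request

# Assumed values, not taken from this thread: master address and counter name.
MASTER_URL = "http://localhost:5050/metrics/snapshot"
REMOVAL_COUNTER = "master/slave_removals"


def fetch_snapshot(url=MASTER_URL, timeout=5.0):
    """Fetch the metrics snapshot and return it as a dict of name -> value."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def new_removals(previous, current, key=REMOVAL_COUNTER):
    """Return how much the removal counter grew between two snapshots.

    A missing key counts as 0. If the counter went backwards (for example
    after a master restart), the current value is treated as the delta.
    """
    before = previous.get(key, 0)
    after = current.get(key, 0)
    if after < before:
        return int(after)
    return int(after - before)


def watch(interval=10.0):
    """Poll forever and print a line whenever the counter increases."""
    previous = fetch_snapshot()
    while True:
        time.sleep(interval)
        current = fetch_snapshot()
        delta = new_removals(previous, current)
        if delta > 0:
            print("%d agent(s) removed since last poll" % delta)
        previous = current
```

Calling `watch()` starts the loop. Comparing successive snapshots rather than alerting on the raw value keeps the check meaningful across long uptimes, since the counter only ever grows while the master is running.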
Are there plans to offer a mechanism for event subscription?

Cordially,

Paul

On Wed, Sep 16, 2015 at 1:30 PM, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:

> You can detect when we remove an agent due to health check failures via
> the metrics endpoint, but these are counters that are better used for
> alerting / dashboards for visibility. If you need to know which agents, you
> can also consume the logs as a stop-gap solution, until we offer a
> mechanism for subscribing to cluster events.
>
> On Wed, Sep 16, 2015 at 10:11 AM, Paul Bell <arach...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am led to believe that, unlike Marathon, Mesos doesn't (yet?) offer a
>> subscribable event bus.
>>
>> So I am wondering if there's a best practices way of determining if a
>> slave node has crashed. By "crashed" I mean something like the power plug
>> got yanked, or anything that would cause Mesos to stop talking to the slave
>> node.
>>
>> I suppose such information would be recorded in /var/log/mesos.
>>
>> Interested to learn how best to detect this.
>>
>> Thank you.
>>
>> -Paul