unsubscribe

-------- Original Message --------
From: Jeff Schroeder <jeffschroe...@computer.org>
Apparently from: user-return-2791-pinktie=safe-mail....@mesos.apache.org
To: Mesos Users <user@mesos.apache.org>
Subject: Question on Monitoring a Mesos Cluster
Date: Sat, 7 Mar 2015 12:02:00 -0600

> I wrote a Python collectd plugin which pulls both master (only if
> master/elected == 1) and slave stats from the REST API under
> /metrics/snapshot and /slave(1)/stats.json respectively and throws those into
> graphite.
>
> After getting everything working, I built a few dashboards, one of which
> displays these stats from http://master:5051/metrics/snapshot:
>
> master/disk_percent
> master/cpus_percent
> master/mem_percent
>
> I had assumed that this was something like aggregate cluster utilization, but
> this seems incorrect in practice. I have a small cluster with ~1T of memory,
> ~25T of disks, and ~540 CPU cores. I had a dozen or so small tasks running,
> and launched 500 tasks with 1G of memory and 1 CPU each.
>
> Now I'd expect to see the disk/cpu/mem percentage metrics above go up
> considerably. I did notice that cpus_percent went to around 0.94.
>
> What is the correct way to measure overall cluster utilization for capacity
> planning? We can have the NOC watch this and simply add more hardware when
> the number starts getting low.
>
> Thanks
>
> --
> Jeff Schroeder
>
> Don't drink and derive, alcohol and analysis don't mix.
> http://www.digitalprognosis.com
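
For illustration only, the polling logic described in the quoted message might look roughly like the sketch below. It is not the author's plugin. The function names, the base URL argument, and the sample values are hypothetical, and the `master/<resource>_used` and `master/<resource>_total` keys are an assumption that should be checked against the snapshot your Mesos version actually returns. As a rough sanity check on the reported figure, 500 one-CPU tasks on about 540 cores comes to roughly 0.93, which would fit `cpus_percent` being a 0 to 1 fraction of allocated CPUs. That reading is an inference, not something the thread confirms.

```python
import json
from urllib.request import urlopen

# The three metrics named in the message above.
PERCENT_KEYS = (
    "master/cpus_percent",
    "master/mem_percent",
    "master/disk_percent",
)


def fetch_snapshot(base_url, timeout=5.0):
    """Fetch and decode /metrics/snapshot; base_url is a placeholder
    such as "http://master:<port>"."""
    url = base_url.rstrip("/") + "/metrics/snapshot"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def leader_percentages(snapshot):
    """Return the *_percent metrics, or None if this master is not elected."""
    if snapshot.get("master/elected") != 1:
        return None
    return {key: snapshot[key] for key in PERCENT_KEYS if key in snapshot}


def used_over_total(snapshot, resource):
    """Cross-check a *_percent value as used / total.

    ASSUMPTION: the snapshot exposes master/<resource>_used and
    master/<resource>_total; these keys are not named in the thread.
    """
    used = snapshot.get("master/%s_used" % resource)
    total = snapshot.get("master/%s_total" % resource)
    if used is None or not total:
        return None
    return used / total


# Made-up sample shaped like a snapshot; the numbers are illustrative only.
SAMPLE = {
    "master/elected": 1,
    "master/cpus_percent": 0.94,
    "master/cpus_used": 512,
    "master/cpus_total": 540,
}
```

Keeping the parsing separate from the HTTP call lets the same functions be exercised against a saved snapshot before wiring them into a collectd read callback.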