unsubscribe

pinktie Sat, 07 Mar 2015 10:31:49 -0800

unsubscribe

-------- Original Message --------
From: Jeff Schroeder <jeffschroe...@computer.org>
Apparently from: user-return-2791-pinktie=safe-mail....@mesos.apache.org
To: Mesos Users <user@mesos.apache.org>
Subject: Question on Monitoring a Mesos Cluster
Date: Sat, 7 Mar 2015 12:02:00 -0600


> I wrote a Python collectd plugin which pulls both master (only if 
> master/elected == 1) and slave stats from the REST API under 
> /metrics/snapshot and /slave(1)/stats.json respectively and throws those into 
> Graphite.
> 
> After getting everything working, I built a few dashboards, one of which 
> displays these stats from http://master:5051/metrics/snapshot:
> 
> master/disk_percent
> master/cpus_percent
> master/mem_percent 
>  
> I had assumed that this was something like aggregate cluster utilization, but 
> this seems incorrect in practice. I have a small cluster with ~1T of memory, 
> ~25T of disk, and ~540 CPU cores. I had a dozen or so small tasks running, 
> and launched 500 tasks with 1G of memory and 1 CPU each.
> 
> Now I'd expect to see the disk/cpu/mem percentage metrics above go up 
> considerably. I did notice that cpus_percent went to around 0.94.
>  
> What is the correct way to measure overall cluster utilization for capacity 
> planning? We can have the NOC watch this and simply add more hardware when 
> the number starts getting low.
> 
> Thanks
>  
> -- 
> Jeff Schroeder
> 
> Don't drink and derive, alcohol and analysis don't mix.
> http://www.digitalprognosis.com
> 
>
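
For reference, here is a minimal sketch of how the numbers in the quoted question could be cross-checked from the same /metrics/snapshot output. It assumes the snapshot is a flat JSON object that also carries master/<resource>_total and master/<resource>_used gauges next to the *_percent ones, and that the *_used values count resources allocated to tasks rather than measured usage. Both points are assumptions to verify against your Mesos version. The function names (fetch_snapshot, utilization, is_leader) are made up for this example and are not part of any Mesos or collectd API.

```python
import json
from urllib.request import urlopen

# Resource names as they appear in the metric keys quoted above
# (master/cpus_percent, master/mem_percent, master/disk_percent).
RESOURCES = ("cpus", "mem", "disk")


def fetch_snapshot(base_url, timeout=5.0):
    """Fetch and decode the JSON served at <base_url>/metrics/snapshot."""
    url = base_url.rstrip("/") + "/metrics/snapshot"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_leader(snapshot):
    """True when the snapshot comes from the elected master."""
    return snapshot.get("master/elected") == 1


def utilization(snapshot, prefix="master"):
    """Return used/total as a fraction (0.0 to 1.0) per resource.

    Assumes keys such as 'master/cpus_total' and 'master/cpus_used'.
    A resource with a missing or zero total maps to None.
    """
    result = {}
    for name in RESOURCES:
        total = snapshot.get(f"{prefix}/{name}_total", 0)
        used = snapshot.get(f"{prefix}/{name}_used", 0)
        result[name] = used / total if total else None
    return result
```

With a snapshot dict already in hand (for example from the collectd plugin), calling utilization(snapshot) returns one fraction per resource. If the assumption about allocation holds, roughly 510 CPUs handed out of 540 would come out near 0.94, which lines up with the cpus_percent value mentioned in the question.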
