We use a combination of New Relic for application-level monitoring and a custom 
Python script that scrapes container stats from the Docker socket and sends 
them to Elasticsearch so we can graph them in Kibana.
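
The script itself is not shown in this thread, so the following is only a 
minimal sketch of the kind of transformation such a script might do. It 
assumes one JSON sample in the shape returned by the Docker Engine stats 
endpoint; the function name and the output field names are hypothetical, and 
reading from the socket and indexing into Elasticsearch are left out.

```python
from datetime import datetime, timezone


def stats_to_document(container_name, stats):
    """Flatten one Docker stats sample into a flat dict ready for indexing.

    `stats` is assumed to be the decoded JSON of a single stats sample.
    The output field names are illustrative, not from the original script.
    """
    cpu = stats["cpu_stats"]
    precpu = stats["precpu_stats"]

    # CPU usage is a delta between this sample and the previous one.
    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - precpu.get("system_cpu_usage", 0)
    # Assumption: fall back to a single CPU when the field is absent.
    online_cpus = cpu.get("online_cpus", 1)

    cpu_percent = 0.0
    if system_delta > 0 and cpu_delta > 0:
        cpu_percent = cpu_delta / system_delta * online_cpus * 100.0

    mem = stats["memory_stats"]
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "container": container_name,
        "cpu_percent": round(cpu_percent, 2),
        "memory_usage_bytes": mem["usage"],
        "memory_limit_bytes": mem["limit"],
    }
```

Keying the document on a container or service name rather than a container ID 
keeps the same series across restarts, which is the concern raised further 
down in the thread.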

From: Gregory Durham [mailto:gregory.dur...@gmail.com]
Sent: Thursday, July 07, 2016 4:58 PM
To: user@mesos.apache.org
Cc: krishnan.k.i...@gmail.com; Michał Łowicki
Subject: Re: Monitoring at container level

I have been using Datadog to monitor my infrastructure. Its integration with 
service discovery has been really helpful for these environments.

On Thu, Jul 7, 2016 at 1:37 PM, Steven Schlansker 
<sschlans...@opentable.com> wrote:
We use Graphite and ran into similar problems with huge metric namespaces.
We use the Singularity framework, which provides both the task "request id" 
(name) and "instance number" (0..N) to the task.

So we set our Graphite namespace to be "request-number", e.g. "myservice-3".
This has the downside of discontinuous data when you deploy a new release,
but we haven't had too many issues due to that in practice.
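
As an illustration only, that naming scheme could be sketched as below. The 
function name and the dot-escaping rule are assumptions, not part of 
Singularity or of the setup described above; the output line follows 
Graphite's plaintext format of "path value timestamp".

```python
import time


def graphite_line(request_id, instance_no, metric, value, timestamp=None):
    """Build one Graphite plaintext-protocol line under "<request>-<instance>"."""
    # Dots separate path components in Graphite, so escape them in the
    # request id (an assumed convention for this sketch).
    safe_request = request_id.replace(".", "_")
    path = "{}-{}.{}".format(safe_request, instance_no, metric)
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "{} {} {}\n".format(path, value, ts)


if __name__ == "__main__":
    # Instance 3 of the request "myservice" reporting a request counter.
    print(graphite_line("myservice", 3, "requests.count", 42), end="")
```

Because the instance number is reused across deploys, the number of distinct 
metric paths stays bounded by the instance count rather than growing with 
every new container.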


> On Jul 7, 2016, at 1:26 PM, Krish 
> <krishnan.k.i...@gmail.com> wrote:
>
> I have had a good experience so far with Bosun and scollector with cAdvisor.
> Check it out at http://bosun.org.
>
>
> On Friday 8 July 2016, Pradeep Chhetri 
> <pradeep.chhetr...@gmail.com> wrote:
> Hi Michal,
>
> Do have a look at sysdig (http://www.sysdig.org). It is basically an 
> open-source tool which provides container insights. Maybe you will find 
> something helpful over there.
>
> To tackle the case of new metrics for new containers, maybe you should tag 
> metrics by service name instead of container ID. (Graphite doesn't have a 
> concept of tags, but something like OpenTSDB or InfluxDB does. I don't see 
> a reason to replace Graphite for that, though. You can use your service name 
> (which the container represents) instead of the hostname in the metric name.)
>
> On Fri, Jul 8, 2016 at 1:18 AM, Michał Łowicki 
> <mlowi...@gmail.com> wrote:
> Hi,
>
> Before introducing Mesos we were mainly using Graphite / Grafana. Ideally we 
> would like to have metrics per container as an easy way to detect whether a 
> problem affects a single container, a subset of containers, or is global.
>
> Unfortunately, using Graphite for that is far from perfect. Having the 
> container identifier as part of the metric name has many negative 
> implications, such as tons of new metrics with every release on Marathon 
> (new containers = new identifiers).
>
> We have investigated InfluxDB so far, but the project isn't mature enough 
> yet, as components like 
> https://github.com/influxdata/telegraf/blob/master/plugins/inputs/statsd/README.md#influx-statsd
> still have major blockers:
>
> COMING SOON: there will be a way to specify multiple fields.
>
> What do you use to monitor your Mesos clusters and, for example, to detect 
> that some containers are having issues?
>
> --
> BR,
> Michał Łowicki
>
>
>
> --
> Regards,
> Pradeep Chhetri
>
>
> --
>
> Thumb typed mail
>
