Small plug for snap (https://github.com/intelsdi-x/snap). It's a telemetry framework with a lot of useful plugins for collecting, processing and publishing metrics. There's a Go API (with more languages coming soon) for writing your own plugins. Plugin catalog: https://github.com/intelsdi-x/snap/blob/master/docs/PLUGIN_CATALOG.md

> On Jul 7, 2016, at 17:34, Guangya Liu <gyliu...@gmail.com> wrote:
>
> Have you ever tried prometheus + Grafana? Please take a look at
> https://prometheus.io/docs/visualization/grafana/ to see if it helps.
>
>> On Fri, Jul 8, 2016 at 5:51 AM, David Kesler <dkes...@yodle.com> wrote:
>> We use a combination of new relic for application level monitoring and a
>> custom python script that scrapes a bunch of stats from the docker socket
>> file and throws them into elastic so we can use kibana to make graphs.
>>
>> From: Gregory Durham [mailto:gregory.dur...@gmail.com]
>> Sent: Thursday, July 07, 2016 4:58 PM
>> To: user@mesos.apache.org
>> Cc: krishnan.k.i...@gmail.com; Michał Łowicki
>> Subject: Re: Monitoring at container level
>>
>> I have been using datadog to monitor my infrastructure. The integration into
>> service discovery has been really helpful for these environments.
>>
>> On Thu, Jul 7, 2016 at 1:37 PM, Steven Schlansker
>> <sschlans...@opentable.com> wrote:
>>
>> We use Graphite and ran into similar problems with huge metric namespaces.
>> We use the Singularity framework which provides both the task "request id"
>> (name) and "instance number" (0..N) to the task.
>>
>> So we set our Graphite namespace to be "request-number" e.g. "myservice-3"
>> This has the downside of discontinuous data when you deploy a new release
>> but we haven't had too many issues due to that in practice.
>>
>> > On Jul 7, 2016, at 1:26 PM, Krish <krishnan.k.i...@gmail.com> wrote:
>> >
>> > I have had a good experience so far with bosun and scollector with
>> > cadvisor.
>> > Check it out at bosun.org.
>> >
>> > On Friday 8 July 2016, Pradeep Chhetri <pradeep.chhetr...@gmail.com> wrote:
>> > Hi Michal,
>> >
>> > Do have a look at sysdig (http://www.sysdig.org). It is basically an
>> > open-source tool which provides container insights. Maybe your will find
>> > something helpful over there.
>> >
>> > To tackle the case of new metrics for new containers, maybe you should tag
>> > metrics by service-name instead of container id. (Graphite doesn't have
>> > concept of tags but something like opentsdb and influxdb do have. I don't
>> > see a reason to replace graphite for that. You can use your service-name
>> > (which the container is representing) instead of hostname in the metrics
>> > name)
>> >
>> > On Fri, Jul 8, 2016 at 1:18 AM, Michał Łowicki <mlowi...@gmail.com> wrote:
>> > Hi,
>> >
>> > Before introducing Mesos we're using mainly Graphite / Grafana. Ideally we
>> > would like to have metrics per container as an easy way to detect if
>> > problem touches only single, subset of containers or it's global.
>> >
>> > Unfortunately using Graphite for that is far from being perfect. Having
>> > container identifier as a part of metric has many negative implications
>> > like having tons of new metrics every release on Marathon (new containers
>> > = new identifiers).
>> >
>> > Investigated InfluxDB so far but project isn't mature enough as still
>> > components like
>> > https://github.com/influxdata/telegraf/blob/master/plugins/inputs/statsd/README.md#influx-statsd
>> > have major blockers:
>> >
>> > COMING SOON: there will be a way to specify multiple fields.
>> >
>> > What do you use to monitor your Mesos clusters and f.ex. to detect that
>> > some containers are having issues?
>> >
>> > --
>> > BR,
>> > Michał Łowicki
>> >
>> > --
>> > Regards,
>> > Pradeep Chhetri
>> >
>> > --
>> > Thumb typed mail
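
For reference, the naming scheme described in the quoted thread (a stable "request-number" name such as "myservice-3" instead of a container id) could be sketched roughly as below. This is only an illustration, not anyone's actual setup: the function names, the "mesos" prefix and the sanitizing rule are assumptions made for the example. The only external fact relied on is Graphite's plaintext line format, "<path> <value> <timestamp>".

```python
import re

# Characters outside this set are replaced so they cannot split a path segment.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def _sanitize(segment: str) -> str:
    """Replace dots, spaces, and other unsafe characters in one path segment."""
    return _UNSAFE.sub("_", segment)


def graphite_path(service: str, instance: int, metric: str,
                  prefix: str = "mesos") -> str:
    """Build a metric path keyed by service name and instance number.

    The container id is deliberately left out, so a redeploy reuses the
    same path instead of creating a new metric. `prefix` is a hypothetical
    namespace root chosen for this example.
    """
    task = f"{_sanitize(service)}-{instance}"
    # The metric name may itself be dotted (e.g. "cpu.user"); keep that hierarchy.
    metric_parts = [_sanitize(part) for part in metric.split(".")]
    return ".".join([_sanitize(prefix), task, *metric_parts])


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one sample in Graphite's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


if __name__ == "__main__":
    path = graphite_path("myservice", 3, "cpu.user")
    print(graphite_line(path, 0.42, 1467936000), end="")
```

A line produced this way can be written to Carbon's plaintext listener (port 2003 by default). As noted in the thread, the trade-off is that data for a given instance number is discontinuous across releases.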