Re: Monitoring at container level

connor . p . d Thu, 07 Jul 2016 20:21:07 -0700

Small plug for snap (https://github.com/intelsdi-x/snap). It's a telemetry 
framework with a lot of useful plugins for collecting, processing and 
publishing metrics. There's a Go API (and soon more languages) for writing 
your own plugins. Plugin catalog: 
https://github.com/intelsdi-x/snap/blob/master/docs/PLUGIN_CATALOG.md


> On Jul 7, 2016, at 17:34, Guangya Liu <gyliu...@gmail.com> wrote:
> 
> Have you ever tried Prometheus + Grafana? Please take a look at 
> https://prometheus.io/docs/visualization/grafana/ to see if it helps.
> 
>> On Fri, Jul 8, 2016 at 5:51 AM, David Kesler <dkes...@yodle.com> wrote:
>> We use a combination of New Relic for application-level monitoring and a 
>> custom Python script that scrapes a bunch of stats from the Docker socket 
>> file and throws them into Elasticsearch so we can use Kibana to make graphs. 
>> 
>> From: Gregory Durham [mailto:gregory.dur...@gmail.com] 
>> Sent: Thursday, July 07, 2016 4:58 PM
>> To: user@mesos.apache.org
>> Cc: krishnan.k.i...@gmail.com; Michał Łowicki
>> Subject: Re: Monitoring at container level
>> 
>> I have been using Datadog to monitor my infrastructure. The integration into 
>> service discovery has been really helpful for these environments. 
>> 
>> On Thu, Jul 7, 2016 at 1:37 PM, Steven Schlansker 
>> <sschlans...@opentable.com> wrote:
>> 
>> We use Graphite and ran into similar problems with huge metric namespaces.
>> We use the Singularity framework, which provides both the task "request id" 
>> (name) and "instance number" (0..N) to the task.
>> 
>> So we set our Graphite namespace to "request-number", e.g. "myservice-3".
>> This has the downside of discontinuous data when you deploy a new release,
>> but we haven't had too many issues with that in practice.
>> 
>> 
>> 
>> > On Jul 7, 2016, at 1:26 PM, Krish <krishnan.k.i...@gmail.com> wrote:
>> >
>> > I have had a good experience so far with Bosun and scollector with 
>> > cAdvisor.
>> > Check it out at bosun.org.
>> >
>> >
>> > On Friday 8 July 2016, Pradeep Chhetri <pradeep.chhetr...@gmail.com> wrote:
>> > Hi Michal,
>> >
>> > Do have a look at sysdig (http://www.sysdig.org). It is basically an 
>> > open-source tool which provides container insights. Maybe you will find 
>> > something helpful over there.
>> >
>> > To tackle the case of new metrics for new containers, maybe you should tag 
>> > metrics by service name instead of container ID. (Graphite doesn't have a 
>> > concept of tags, but something like OpenTSDB or InfluxDB does. I don't 
>> > see a reason to replace Graphite for that. You can use your service name 
>> > (which the container represents) instead of the hostname in the metric 
>> > name.)
>> >
>> > On Fri, Jul 8, 2016 at 1:18 AM, Michał Łowicki <mlowi...@gmail.com> wrote:
>> > Hi,
>> >
>> > Before introducing Mesos we were mainly using Graphite / Grafana. Ideally we 
>> > would like to have metrics per container as an easy way to detect whether a 
>> > problem affects a single container, a subset of containers, or is global.
>> >
>> > Unfortunately, using Graphite for that is far from perfect. Having the 
>> > container identifier as part of the metric name has many negative 
>> > implications, like tons of new metrics with every release on Marathon (new 
>> > containers = new identifiers).
>> >
>> > We've investigated InfluxDB so far, but the project isn't mature enough yet, 
>> > as components like 
>> > https://github.com/influxdata/telegraf/blob/master/plugins/inputs/statsd/README.md#influx-statsd
>> > still have major blockers:
>> >
>> > COMING SOON: there will be a way to specify multiple fields.
>> >
>> > What do you use to monitor your Mesos clusters and, for example, to detect 
>> > that some containers are having issues?
>> >
>> > --
>> > BR,
>> > Michał Łowicki
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Pradeep Chhetri
>> >
>> >
>> > --
>> >
>> > Thumb typed mail
>> >
>> 
>
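
The Graphite naming scheme Steven describes in the quoted thread (request id plus instance number, e.g. "myservice-3", instead of a container id) could be sketched roughly like this. The function names and the sanitizing rule are assumptions for illustration only, and how the task obtains its request id and instance number is not shown. The output line follows Graphite's plaintext protocol ("path value timestamp").

```python
import re
import time
from typing import Optional


def sanitize(segment: str) -> str:
    """Replace characters that would split or break a Graphite path segment."""
    # Hypothetical rule: keep letters, digits, '_' and '-'; everything else becomes '_'.
    return re.sub(r"[^A-Za-z0-9_-]", "_", segment)


def metric_path(request_id: str, instance_no: int, metric: str) -> str:
    """Build '<request>-<instance>.<metric>', e.g. 'myservice-3.cpu.user'."""
    prefix = f"{sanitize(request_id)}-{instance_no}"
    leaf = ".".join(sanitize(part) for part in metric.split("."))
    return f"{prefix}.{leaf}"


def plaintext_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


if __name__ == "__main__":
    # The same path is reused across deploys, so the namespace stays bounded
    # by the instance count rather than growing with every new container id.
    print(plaintext_line(metric_path("myservice", 3, "cpu.user"), 0.25), end="")
```

Because the path depends only on the service name and instance slot, a new release keeps writing to the same series, which is also where the discontinuity Steven mentions comes from.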

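For the approach David mentions in the quoted thread (scraping stats from the Docker socket and indexing them for Kibana), here is a minimal sketch of just the transformation step. It assumes the input is one JSON payload from the Docker Engine API container stats endpoint; the output field names and the `service` label are hypothetical, and fetching from the socket and indexing into Elasticsearch are left out. Keying the document by service name rather than container id follows Pradeep's suggestion.

```python
def flatten_stats(service: str, stats: dict) -> dict:
    """Turn one Docker stats payload into a flat document for indexing."""
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]

    # CPU usage is reported as cumulative counters, so work on deltas
    # between the current and the previous sample.
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    sys_delta = cpu["system_cpu_usage"] - pre.get("system_cpu_usage", 0)

    # Older daemons omit online_cpus; fall back to the per-CPU list length.
    ncpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    cpu_pct = (cpu_delta / sys_delta) * ncpus * 100.0 if sys_delta > 0 else 0.0

    mem = stats["memory_stats"]
    return {
        "service": service,              # hypothetical label, not a container id
        "timestamp": stats["read"],
        "cpu_percent": round(cpu_pct, 2),
        "memory_usage_bytes": mem["usage"],
        "memory_limit_bytes": mem["limit"],
    }


# Hand-written sample payload containing only the fields used above.
SAMPLE = {
    "read": "2016-07-07T20:21:07Z",
    "cpu_stats": {
        "cpu_usage": {"total_usage": 700},
        "system_cpu_usage": 2000,
        "online_cpus": 2,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 200},
        "system_cpu_usage": 1000,
    },
    "memory_stats": {"usage": 1048576, "limit": 4194304},
}

if __name__ == "__main__":
    print(flatten_stats("myservice", SAMPLE))
```
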