Re: Kafka Monitoring, 0.7 vs. 0.8 JMX

David DeMaagd Wed, 08 May 2013 15:07:29 -0700

I think there are really two angles to look at this from...

1) What is 'important' to monitor?  Meaning, what subset of these are
important/critical for being able to tell system health (things you want
to set alerts on), what subset are nice to have for overall health and 
capacity planning (things you want to create pretty graphs on) and the
rest (not immediately useful in general, but can really help in a
debugging/triage situation).

2) How do you get the data?  Kind of independent of the above, though 
kinda related as well.  

As for the second one, you need to look at the collection mechanics.  As
you mentioned below, large scale polling (especially with a non-trivial
number of beans) is expensive and problematic no matter how you do it
(JMX or HTTP) given enough scale.  I don't have much experience with the
codahale metrics route directly, but I have messed with Jolokia, which is
likely in the same boat - they expose the metrics for you to grab. In
both cases, given enough data points (and Kafka, depending on the number
of topics involved, has a /lot/ of them), either can be slow if not
implemented carefully.  Meaning you may overrun your desired polling
interval.
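
To make the overrun point concrete, here is a minimal sketch (not taken
from any particular tool) that times one polling pass and checks it
against the desired interval. The names read_metric, poll_once and
overran are hypothetical; read_metric stands in for whatever JMX or HTTP
read you actually use.

```python
import time


def poll_once(read_metric, names):
    """Read every metric in `names`; return (values, elapsed_seconds).

    `read_metric` is a hypothetical callable standing in for a JMX or
    HTTP read of a single data point.
    """
    start = time.monotonic()
    values = {name: read_metric(name) for name in names}
    return values, time.monotonic() - start


def overran(elapsed, interval):
    """True when one polling pass took longer than the polling interval."""
    return elapsed > interval


if __name__ == "__main__":
    # Simulate 200 data points that each take about 1 ms to read.
    def slow_read(name):
        time.sleep(0.001)
        return 0

    names = [f"metric-{i}" for i in range(200)]
    _, elapsed = poll_once(slow_read, names)
    print(f"pass took {elapsed:.3f}s, overran 0.1s interval: {overran(elapsed, 0.1)}")
```

Because the reads here are sequential, the cost of a pass grows with the
number of data points, which is why a broker with many topics can
outgrow a fixed interval.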

In very large environments, I've found it very scalable to have either a
local poller on the box (which could be reading via JMX or HTTP) which
then emits the data to something or have some kind of wrapper around the 
application that does the collection/emission (launching the broker as a
thread, and the parent process does some JMX magic to connect to the
data points).  Both of these routes depend a lot on your monitoring
infrastructure, but they will help you get around the general wide
polling problem...
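
A rough sketch of the local-poller route, assuming a Jolokia agent
listening on the same box and a collector that accepts Graphite-style
"name value timestamp" lines. The URL, the port, the function names and
the choice of output format are all assumptions for illustration; this
is not how any specific production setup does it.

```python
import json
import time
import urllib.request

# Assumed address of a Jolokia agent running on the same host.
JOLOKIA_URL = "http://localhost:8778/jolokia/read/"


def fetch_jolokia(mbean, base_url=JOLOKIA_URL):
    """Read one MBean over HTTP from the local agent and return its value."""
    with urllib.request.urlopen(base_url + mbean, timeout=5) as resp:
        return json.load(resp)["value"]


def flatten(prefix, value):
    """Yield (dotted.name, number) pairs from a possibly nested attribute map."""
    if isinstance(value, dict):
        for key, sub in value.items():
            yield from flatten(f"{prefix}.{key}", sub)
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        yield prefix, value
    # Non-numeric attributes (strings, nulls, booleans) are skipped.


def collect(mbeans, fetch=fetch_jolokia, now=time.time):
    """Poll each MBean locally and return lines ready to emit to a collector.

    `mbeans` maps an output prefix to an MBean name.
    """
    ts = int(now())
    lines = []
    for label, mbean in mbeans.items():
        for name, number in flatten(label, fetch(mbean)):
            lines.append(f"{name} {number} {ts}")
    return lines


if __name__ == "__main__":
    # java.lang:type=Memory is a standard JVM MBean; swap in broker beans as needed.
    for line in collect({"jvm.memory": "java.lang:type=Memory"}):
        print(line)
```

The poll stays on the box, so the central system only receives what each
host pushes instead of fanning out reads to every broker.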

Semi-shameless plug for how it is done at LinkedIn - 
http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection

-- 
Dave DeMaagd
ddema...@linkedin.com | 818 262 7958

(dragos.manole...@servicenow.com - Wed, May 08, 2013 at 09:27:21PM +0000)
> From the JmxReporter section of the metrics manual:
> 
> Warning
> We don't recommend that you try to gather metrics from your production
> environment. JMX's RPC API is fragile and bonkers. For development
> purposes and browsing, though, it can be very useful.
> 
> 
> 
> -Dragos
> 
> On 5/8/13 2:10 PM, "Otis Gospodnetic" <otis_gospodne...@yahoo.com> wrote:
> 
> >Also, do you recommend getting metrics via JMX or via HTTP?
>
