Hello Hadoop experts!

I'm looking for a way to gather the metrics and counters of individual
jobs, as well as of the whole cluster, in an event-driven way and store
all this data in Elasticsearch for later troubleshooting and analysis.

Using metrics exporters seems to be the right approach, but the metrics
system does not let me gather individual job counters. Although there are
container metrics
(org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics)
and MR app metrics sources
(org.apache.hadoop.mapreduce.v2.app.metrics.MRAppMetrics), neither of them
exposes individual job counters.
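
To make the exporter route concrete: the metrics2 framework does let me
plug in a custom org.apache.hadoop.metrics2.MetricsSink that receives every
record the registered sources emit. A rough sketch (ElasticsearchSink is
just a placeholder name, the printed JSON stands in for the actual indexing
call, and the init() signature assumes Hadoop 2.x, where it takes
org.apache.commons.configuration.SubsetConfiguration):

import org.apache.commons.configuration.SubsetConfiguration;
import org.apache.hadoop.metrics2.AbstractMetric;
import org.apache.hadoop.metrics2.MetricsRecord;
import org.apache.hadoop.metrics2.MetricsSink;
import org.apache.hadoop.metrics2.MetricsTag;

// Placeholder sink that turns every metrics record into a JSON-ish document;
// a real implementation would index the document into Elasticsearch instead.
public class ElasticsearchSink implements MetricsSink {

  @Override
  public void init(SubsetConfiguration conf) {
    // read sink-specific options from hadoop-metrics2.properties here,
    // e.g. the Elasticsearch endpoint to push to
  }

  @Override
  public void putMetrics(MetricsRecord record) {
    StringBuilder doc = new StringBuilder();
    doc.append("{\"context\":\"").append(record.context()).append('"')
       .append(",\"record\":\"").append(record.name()).append('"')
       .append(",\"timestamp\":").append(record.timestamp());
    for (MetricsTag tag : record.tags()) {
      doc.append(",\"").append(tag.name()).append("\":\"")
         .append(tag.value()).append('"');
    }
    for (AbstractMetric metric : record.metrics()) {
      doc.append(",\"").append(metric.name()).append("\":")
         .append(metric.value());
    }
    doc.append('}');
    System.out.println(doc);  // stand-in for the Elasticsearch indexing call
  }

  @Override
  public void flush() {
    // flush any buffered documents
  }
}

Such a sink would be wired up in hadoop-metrics2.properties with something
like "*.sink.elastic.class=ElasticsearchSink", but the records it receives
are still the source-level gauges, not per-job counters.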

I also looked at the handlers registered on
org.apache.hadoop.mapreduce.v2.app.MRAppMaster#dispatcher, hoping to
register a custom org.apache.hadoop.yarn.event.EventHandler for the events
of interest, but that currently seems hardly possible and is not pluggable.
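
Just to show what I mean by "pluggable": the kind of handler I'd like to
hook into the AM dispatcher looks roughly like the sketch below.
JobEventForwarder is a made-up name, and registerWith() is exactly the hook
that seems to be missing, since MRAppMaster builds its AsyncDispatcher
internally:

import org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent;
import org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType;
import org.apache.hadoop.yarn.event.Dispatcher;
import org.apache.hadoop.yarn.event.EventHandler;

// Handler that would mirror job events to an external store as they flow
// through the MR app master's dispatcher.
public class JobEventForwarder implements EventHandler<JobEvent> {

  @Override
  public void handle(JobEvent event) {
    // push job id, event type and timestamp to Elasticsearch here
    System.out.println(event.getJobId() + " " + event.getType()
        + " @" + event.getTimestamp());
  }

  // The missing piece: there is no supported way to get the AM to call
  // something like this for a user-supplied handler.
  static void registerWith(Dispatcher dispatcher) {
    dispatcher.register(JobEventType.class, new JobEventForwarder());
  }
}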

The simplest solution I've found is to implement an application that polls
the Job History Server through its REST API. This approach is
straightforward, but it does not capture changes in an event-driven way.
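
For reference, the polling variant boils down to hitting the Job History
Server web app (port 19888 by default) along these lines; the host and job
id below are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Fetches the counters of a single finished job from the JHS REST API.
public class JhsCounterPoller {

  static String fetch(String url) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestProperty("Accept", "application/json");
    StringBuilder body = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
    }
    return body.toString();
  }

  public static void main(String[] args) throws Exception {
    String jhs = "http://jhs-host:19888";      // placeholder host
    String jobId = "job_1400000000000_0001";   // placeholder job id
    // all finished jobs:   /ws/v1/history/mapreduce/jobs
    // one job's counters:  /ws/v1/history/mapreduce/jobs/{jobid}/counters
    String counters = fetch(jhs + "/ws/v1/history/mapreduce/jobs/"
        + jobId + "/counters");
    System.out.println(counters);  // JSON to be indexed into Elasticsearch
  }
}

Running this on a timer gives me the data, just not as events.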

So I'd like to ask whether there is any way to collect the metrics and
counters of individual Hadoop jobs in an event-driven fashion without
resorting to the black magic of Java agents (the java.lang.instrument API),
bytecode modification, or AOP-style interception of every call to
org.apache.hadoop.yarn.event.EventHandler#handle(Event).
