Hello Hadoop experts! I'm looking for a way to gather the metrics and counters of individual jobs, as well as of the whole cluster, in an event-driven way, so that I can store all this data in Elasticsearch for later troubleshooting and analysis.
Using metrics exporters seems to be the right way to go, but the metrics system does not expose individual job counters. Although there are container metrics (org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics) and MR app metrics sources (org.apache.hadoop.mapreduce.v2.app.metrics.MRAppMetrics), neither provides the counters of an individual job.

I also looked at registering a custom org.apache.hadoop.yarn.event.EventHandler with org.apache.hadoop.mapreduce.v2.app.MRAppMaster#dispatcher for all the events of interest, but that seems hardly possible and is not pluggable right now.

The simplest solution I've found is to implement an application that polls the JobHistory Server via its REST API. That approach is straightforward, but it does not deliver changes in an event-driven way.

So I'd like to ask: is there any way to collect the metrics and counters of individual Hadoop jobs in an event-driven fashion, without resorting to the black magic of Java agents (the Java instrumentation API), bytecode modification, or AOP-style interception of every call to org.apache.hadoop.yarn.event.EventHandler#handle(Event)?
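For context, the metrics-exporter route I mean is the standard Metrics2 sink mechanism, which pushes snapshots on a period rather than on events. A minimal hadoop-metrics2.properties sketch (the sink class is the stock FileSink that ships with Hadoop; the period, prefix, and file name here are just assumptions for illustration):

```properties
# Push all metrics sources every 10 seconds (period is an assumption)
*.period=10

# Stock file sink shipped with Hadoop; swap for a custom sink that
# writes to Elasticsearch if you go this route
nodemanager.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
nodemanager.sink.file.filename=nodemanager-metrics.out
```

As noted above, this only gets you the sources Metrics2 already knows about (ContainerMetrics, MRAppMetrics, etc.), not per-job counters.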
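For completeness, the polling workaround can be sketched roughly as below. The /ws/v1/history/mapreduce/jobs and .../jobs/{jobid}/counters paths and the finishedTimeBegin filter are from the JobHistory Server REST API; the host, port, and class name are assumptions, and JSON parsing plus the Elasticsearch sink are left out:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of polling the JobHistory Server REST API.
// Host/port ("historyserver:19888") and cadence are assumptions.
public class JhsPoller {
    private final String base; // e.g. "http://historyserver:19888"
    private final HttpClient client = HttpClient.newHttpClient();

    public JhsPoller(String base) {
        this.base = base;
    }

    // URL listing jobs that finished after the given timestamp,
    // so each poll cycle only picks up new jobs
    public String jobsUrl(long finishedTimeBegin) {
        return base + "/ws/v1/history/mapreduce/jobs"
             + "?finishedTimeBegin=" + finishedTimeBegin;
    }

    // URL for the counters of one job
    public String countersUrl(String jobId) {
        return base + "/ws/v1/history/mapreduce/jobs/" + jobId + "/counters";
    }

    // One poll cycle: fetch the JSON job list as a raw string
    // (parsing and indexing into Elasticsearch are not shown)
    public String fetchJobs(long since) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(jobsUrl(since)))
                .header("Accept", "application/json")
                .build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

This is exactly the straightforward-but-not-event-driven approach described above: you trade push semantics for a poll interval.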