More info....
I was doing some "stress-testing" and, interestingly, the Metrics Collector
crashed twice and I had to restart it. (I don't like a file-based HBase for the
Metrics Collector, but I am not confident about configuring the system to point
to an existing HBase cluster.)
Also, after this email thread, I looked at the Metrics Collector logs and saw
errors like this:
METRIC_RECORD' at
region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103,
seqNum=243930
13:09:37,619 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129
- Call exception, tries=11, retries=35, started=835564 ms ago, cancelled=false,
msg=row
'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker'
on table 'METRIC_RECORD' at
region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103,
seqNum=243931
13:10:58,082 INFO [phoenix-1-thread-349920] RpcRetryingCaller:129
- Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false,
msg=row '' on table 'METRIC_RECORD' at
region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103,
seqNum=243930
13:10:58,082 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129
- Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false,
msg=row
'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker'
on table 'METRIC_RECORD' at
region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103,
seqNum=243931
13:10:58,112 ERROR [Thread-25] TimelineMetricAggregator:221 -
Exception during aggregating metrics.
org.apache.phoenix.exception.PhoenixIOException:
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36,
exceptions:
Sat Apr 25 13:10:58 UTC 2015, null, java.net.SocketTimeoutException:
callTimeout=900000, callDuration=938097: row '' on table 'METRIC_RECORD' at
region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:107)
    at org.apache.phoenix.iterate.ParallelIterators.getIterators(ParallelIterators.java:527)
    at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
    at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:63)
    at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:90)
    at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
    at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregateMetricsFromResultSet(TimelineMetricAggregator.java:104)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregate(TimelineMetricAggregator.java:72)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.doWork(AbstractTimelineAggregator.java:217)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.runOnce(AbstractTimelineAggregator.java:94)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.run(AbstractTimelineAggregator.java:70)
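For anyone reading a log like the one above, a small script can pull the retry counters out of the RpcRetryingCaller lines so the progression is easier to see. This is only a sketch: the function name `summarize_retries` and the regular expression are made up for illustration, and it assumes the message format matches the lines pasted here.

```python
import re

# Matches the counters in lines such as:
#   "... - Call exception, tries=12, retries=35, started=916027 ms ago, ..."
RETRY_RE = re.compile(
    r"Call exception, tries=(?P<tries>\d+), retries=(?P<retries>\d+), "
    r"started=(?P<started_ms>\d+) ms ago"
)


def summarize_retries(log_text):
    """Return one dict of integer counters per 'Call exception' entry."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in RETRY_RE.finditer(flat)
    ]


sample = """13:10:58,082 INFO [phoenix-1-thread-349920] RpcRetryingCaller:129
- Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false,
msg=row '' on table 'METRIC_RECORD'"""

for entry in summarize_retries(sample):
    print(entry)
```

In the sample, the call had been running for 916027 ms, which is already past the callTimeout=900000 reported in the SocketTimeoutException.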
From: Jayesh Thakrar <[email protected]>
To: Siddharth Wagle <[email protected]>; "[email protected]"
<[email protected]>
Sent: Wednesday, May 6, 2015 10:07 PM
Subject: Re: Kafka broker metrics not appearing in REST API
Hi Siddharth,
Yes, I am using Ambari 2.0 with the Ambari Metrics service. The interesting
thing is that I got the metrics for some time and do not get them anymore. I
also know that the metrics are being collected, since I can see them on the
dashboard. Any pointers for troubleshooting?
And btw, it would be nice to have a raw count of messages received rather than
a computed count-per-minute metric. TSDB does a good job of giving me
cumulative and rate-per-sec graphs and numbers.
Thanks in advance,
Jayesh
From: Siddharth Wagle <[email protected]>
To: "[email protected]" <[email protected]>; Jayesh Thakrar
<[email protected]>
Sent: Wednesday, May 6, 2015 10:03 PM
Subject: Re: Kafka broker metrics not appearing in REST API
Hi Jayesh,
Are you using Ambari 2.0 with Ambari Metrics service?
BR,
Sid
From: Jayesh Thakrar <[email protected]>
Sent: Wednesday, May 06, 2015 7:53 PM
To: [email protected]
Subject: Kafka broker metrics not appearing in REST API
Hi,
I have installed 2 clusters with Ambari, running Storm and Kafka. After the
install, I was able to get metrics for both Storm and Kafka via the REST API.
This worked fine for a week, but for the past 2 days I have not been getting
Kafka metrics.
I need the metrics to push to an OpenTSDB cluster. I do get host metrics and
Nimbus metrics, but not KAFKA_BROKER metrics.
I did have maintenance mode turned on for some time, but it is turned off now.
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "NIMBUS",
    "service_name" : "STORM"
  },
  "metrics" : {
    "storm" : {
      "nimbus" : {
        "freeslots" : 54.0,
        "supervisors" : 27.0,
        "topologies" : 0.0,
        "totalexecutors" : 0.0,
        "totalslots" : 54.0,
        "totaltasks" : 0.0,
        "usedslots" : 0.0
      }
    }
  }
}
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "KAFKA_BROKER",
    "service_name" : "KAFKA"
  }
}
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "SUPERVISOR",
    "service_name" : "STORM"
  }
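The same check can be scripted so that a missing "metrics" block is flagged before anything is pushed to OpenTSDB. The following is a rough sketch, not a tested integration: the helper names `has_metrics` and `fetch_component` are hypothetical, the URL layout is copied from the curl calls above, and basic authentication is assumed because that is what `curl --user` sends.

```python
import base64
import json
import urllib.request


def has_metrics(payload):
    """True when a component response carries a non-empty 'metrics' object."""
    return bool(payload.get("metrics"))


def fetch_component(base_url, cluster, component, user, password):
    """GET one component with ?fields=metrics and return the parsed JSON."""
    url = (
        f"{base_url}/api/v1/clusters/{cluster}"
        f"/components/{component}?fields=metrics"
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Payloads shaped like the NIMBUS and KAFKA_BROKER responses shown above.
nimbus = {
    "ServiceComponentInfo": {"component_name": "NIMBUS"},
    "metrics": {"storm": {"nimbus": {"freeslots": 54.0}}},
}
kafka_broker = {"ServiceComponentInfo": {"component_name": "KAFKA_BROKER"}}

print(has_metrics(nimbus), has_metrics(kafka_broker))  # True False
```

Against a live server, something like `has_metrics(fetch_component("http://dtord01flm01p:8080", "ord_flume_kafka_prod", "KAFKA_BROKER", "admin", "admin"))` would reproduce the manual check for the Kafka brokers.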