Responses to your questions:

  1.  Did this work with the same setup before 1.3?

I have not tested it with another version. I started working on the metrics 
with a snapshot of 1.3 and then moved to the release.


  2.  Are all task/operator metrics available in the metrics tab of the 
dashboard?

Yes, the metrics are seen from the dashboard.


  3.  Are there any warnings in the TaskManager logs from the MetricRegistry or 
StatsDReporter?

No, I am not seeing any errors in the logs related to metrics.


> My guess would be that the operator/task metrics contain characters that 
> either StatsD or telegraf don't allow, which causes them to be dropped.

This was my original thought too. I did find two separate issues with the 
metrics Flink outputs and I was planning on filing JIRA tickets on these. They 
are:


-  Flink does not escape spaces. I had a space in the job name, which messed 
up the metrics. I have a workaround for this, but it is probably something 
Flink should escape.

-  Flink is outputting a gauge value of “n/a” for lastCheckpointExternalPath. 
A gauge needs to be a float, so Telegraf does not like this. It logs an error, 
though, and continues, ignoring the metric.
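
For illustration, here is a minimal Python sketch of a check that would flag 
both problems. It assumes each metric arrives as a plain StatsD line of the 
form name:value|g. The function name and the sample lines are hypothetical and 
not taken from the actual capture.

```python
def check_statsd_line(line):
    """Return a list of problems found in one StatsD line (empty if none)."""
    name, sep, rest = line.partition(":")
    if not sep:
        return ["missing ':' between name and value"]
    value, sep, metric_type = rest.partition("|")
    if not sep:
        return ["missing '|' between value and type"]

    problems = []
    # An unescaped space (e.g. from a job name) ends up inside the metric name.
    if any(ch.isspace() for ch in name):
        problems.append("whitespace in metric name")
    # A gauge ("g") is expected to carry a numeric value.
    if metric_type == "g":
        try:
            float(value)
        except ValueError:
            problems.append("gauge value is not a number: %r" % value)
    return problems


# Hypothetical sample lines, modelled on the two issues described above.
samples = [
    "flink.jm.my job.someMetric:1|g",
    "flink.jm.myjob.lastCheckpointExternalPath:n/a|g",
    "flink.jm.myjob.someMetric:1|g",
]
for sample in samples:
    print(sample, "->", check_statsd_line(sample))
```

Running captured lines through a check like this makes it easier to separate 
malformed metrics from metrics that are never sent at all.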

Note that even with these accounted for, I am still not seeing the 
task/operator metrics. I ran a tcpdump to be sure of exactly what is coming 
through. Searching through that dump, I don’t see any of the metrics I was 
looking for.
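
A search like this can be sketched in Python, assuming the UDP payloads have 
already been extracted from the capture as byte strings. The function name, 
the sample payloads and the choice of suffixes (numRecordsIn and numRecordsOut 
from the IO section of the metrics documentation) are assumptions for 
illustration only.

```python
def find_metrics(payloads, suffixes):
    """Return metric names from StatsD payloads that end with a given suffix."""
    wanted = tuple(suffixes)
    matches = []
    for payload in payloads:
        # One UDP datagram may carry several newline-separated StatsD lines.
        for line in payload.decode("utf-8", errors="replace").splitlines():
            name = line.partition(":")[0]
            if name.endswith(wanted):
                matches.append(name)
    return matches


# Hypothetical payloads: one TaskManager-level metric, one task-level metric.
payloads = [
    b"flink.tm.tm1.Status.JVM.CPU.Load:0.1|g",
    b"flink.tm.tm1.myjob.Map.0.numRecordsIn:42|g",
]
print(find_metrics(payloads, [".numRecordsIn", ".numRecordsOut"]))
```

An empty result for the task and operator suffixes would suggest the metrics 
never leave Flink, rather than being dropped by telegraf.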

A few other things to note. This is the application I am running:
https://github.com/chrisdail/pravega-samples/blob/master/flink-examples/src/main/scala/io/pravega/examples/flink/iot/TurbineHeatProcessor.scala

Also, I am running this in DC/OS 1.9 trying to integrate with DC/OS metrics.

Thanks

Chris


From: Chesnay Schepler <ches...@apache.org>
Date: Tuesday, June 13, 2017 at 5:26 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Task and Operator Metrics in Flink 1.3

The scopes look OK to me.

Let's try to narrow down the problem areas a bit:

  1.  Did this work with the same setup before 1.3?
  2.  Are all task/operator metrics available in the metrics tab of the 
dashboard?
  3.  Are there any warnings in the TaskManager logs from the MetricRegistry or 
StatsDReporter?
My guess would be that the operator/task metrics contain characters that either 
StatsD or telegraf don't allow,
which causes them to be dropped.

On 12.06.2017 20:32, Dail, Christopher wrote:
I’m using the Flink 1.3.0 release and am not seeing all of the metrics that I 
would expect to see. I have flink configured to write out metrics via statsd 
and I am consuming this with telegraf. Initially I thought this was an issue 
with telegraf parsing the data generated. I dumped all of the metrics going 
into telegraf using tcpdump and found that a bunch of the data I expected was 
missing.

I’m using this as a reference for what metrics I expect:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html

I see all of the JobManager and TaskManager level metrics. Things like 
Status.JVM.* are coming through. TaskManager Status.Network are there (but not 
Task level buffers). The ‘Cluster’ metrics are there.

The IO section of that page contains task and operator level metrics (like 
what is available on the dashboard). I’m not seeing any of these metrics 
coming through when using statsd.

I’m configuring flink with this configuration:

metrics.reporters: statsd
metrics.reporter.statsd.class: org.apache.flink.metrics.statsd.StatsDReporter
metrics.reporter.statsd.host: hostname
metrics.reporter.statsd.port: 8125

# Customized Scopes
metrics.scope.jm: flink.jm
metrics.scope.jm.job: flink.jm.<job_name>
metrics.scope.tm: flink.tm.<tm_id>
metrics.scope.tm.job: flink.tm.<tm_id>.<job_name>
metrics.scope.task: flink.tm.<tm_id>.<job_name>.<task_name>.<subtask_index>
metrics.scope.operator: flink.tm.<tm_id>.<job_name>.<operator_name>.<subtask_index>

I have tried with and without specifically setting the metrics.scope values.
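
To make the expected names concrete, here is a rough Python sketch of how a 
scope format like the ones above would expand into a full metric name. This is 
not Flink's implementation. The variable values and the metric name 
numRecordsIn are hypothetical examples.

```python
import re


def expand_scope(scope_format, variables, metric_name):
    """Fill <variable> placeholders in a scope format and append the metric name."""
    def substitute(match):
        # Look up the placeholder name (without the angle brackets).
        return str(variables[match.group(1)])

    scope = re.sub(r"<(\w+)>", substitute, scope_format)
    return scope + "." + metric_name


# Hypothetical values for the variables used in metrics.scope.task.
task_scope = "flink.tm.<tm_id>.<job_name>.<task_name>.<subtask_index>"
values = {"tm_id": "tm1", "job_name": "myjob", "task_name": "Map", "subtask_index": 0}
print(expand_scope(task_scope, values, "numRecordsIn"))
```

Names of this shape are what one would look for in the statsd traffic when 
checking for task and operator metrics.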

Is anyone else having similar issues with metrics in 1.3?

Thanks

Chris Dail
Director, Software Engineering
Dell EMC | Infrastructure Solutions Group
mobile +1 506 863 4675
christopher.d...@dell.com




