Is there any warning in the logs containing "Error while handling metric"?
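A minimal sketch of that check, assuming the TaskManager log is available as a local text file. Only the search string comes from this thread; the helper names `find_metric_warnings` and `scan_log_file` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# The message text asked about in this thread.
NEEDLE = "Error while handling metric"


def find_metric_warnings(lines: Iterable[str]) -> List[str]:
    """Return every log line that contains the metric-handling error text."""
    return [line.rstrip("\n") for line in lines if NEEDLE in line]


def scan_log_file(path: str) -> List[str]:
    """Stream a log file line by line (path is caller-supplied) and collect matches."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return find_metric_warnings(fh)
```

For example, `scan_log_file("taskmanager.log")` (hypothetical path) returns the matching lines, or an empty list if the warning never appears.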

On 03/05/2022 10:18, ChangZhuo Chen (陳昌倬) wrote:
On Tue, May 03, 2022 at 01:00:42AM -0700, Mason Chen wrote:
Hi ChangZhuo,

The warning log indicates that the metric was previously defined and so the
runtime is handling the "duplicate" metric by ignoring it. This is
typically a benign message unless you rely on this metric. Is it possible
that you are using the same task name for different tasks? The task name is
set by the `.name(...)` API when you build your job graph.
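If you want to rule this out before submitting the job, a small sketch like the following flags any name used more than once. It assumes you have collected the strings you pass to `.name(...)` into a list yourself; the helper `duplicate_task_names` and the example names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_task_names(names: Iterable[str]) -> Dict[str, int]:
    """Map each name that appears more than once to its occurrence count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}
```

For instance, `duplicate_task_names(["source", "parse", "parse", "sink"])` returns `{"parse": 2}`, and an empty dict means every name is unique.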

Can you clarify what it means that your endpoint isn't working (some
metrics missing, the endpoint timing out, etc.)? Also, can you confirm from
the logs that the PrometheusReporter was created properly?

By "endpoint isn't working" we mean that we get an empty reply from the
Prometheus endpoint. The following is our test against the TaskManager
Prometheus endpoint:

     curl localhost:9249
     curl: (52) Empty reply from server
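`curl` exit code 52 means the server accepted the connection but closed it without sending any HTTP response. A rough standard-library equivalent of the test above separates that case from a normal reply; port 9249 comes from this thread, while `probe_endpoint` is a hypothetical helper name.

```python
import http.client
from typing import Optional, Tuple


def probe_endpoint(host: str = "localhost", port: int = 9249,
                   timeout: float = 5.0) -> Tuple[Optional[int], str]:
    """GET / and return (status, body); status is None on an empty reply.

    An empty reply (the server closes the connection before sending a
    status line) is what curl reports as "(52) Empty reply from server".
    Other failures, such as connection refused or timeouts, propagate.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8", errors="replace")
    except http.client.RemoteDisconnected:
        # Connection closed before any response bytes arrived.
        return None, ""
    finally:
        conn.close()
```

Running `probe_endpoint()` on the TaskManager host would return `(None, "")` in the situation described here, whereas a healthy reporter should return status 200 with the metrics text.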

We have the following lines in the TaskManager log, so the
PrometheusReporter was created properly:

     2022-05-03 01:48:16,678 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
     ...
     2022-05-03 01:48:23,665 INFO  org.apache.flink.metrics.prometheus.PrometheusReporter       [] - Started PrometheusReporter HTTP server on port 9249.
     2022-05-03 01:48:23,669 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl          [] - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
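To repeat this confirmation across many TaskManagers, a sketch like the one below looks for the two INFO messages quoted above and extracts the port and reporter name. The regular expressions match only the message text shown in this thread (other Flink versions may word them differently), and the helper name `reporter_status` is hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Message texts taken from the TaskManager log lines quoted in this thread.
STARTED_RE = re.compile(r"Started PrometheusReporter HTTP server on port (\d+)\.")
REGISTERED_RE = re.compile(
    r"Reporting metrics for reporter (\S+) of type "
    r"org\.apache\.flink\.metrics\.prometheus\.PrometheusReporter\."
)


def reporter_status(lines: Iterable[str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (HTTP port, reporter name) from the log; None marks a missing message."""
    port: Optional[int] = None
    name: Optional[str] = None
    for line in lines:
        if port is None:
            m = STARTED_RE.search(line)
            if m:
                port = int(m.group(1))
        if name is None:
            m = REGISTERED_RE.search(line)
            if m:
                name = m.group(1)
    return port, name
```

On the two log lines above this returns `(9249, "prom")`; a `None` in either position would suggest the reporter did not start or was not registered.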


