Hi! I also ran into problems with the PrometheusReporter on Flink 1.15.0 after upgrading from 1.14.4. I already consulted the mailing list (https://lists.apache.org/thread/m8ohrfkrq1tqgq7lowr9p226z3yc0fgc), but I have not found the underlying problem or a solution.
Actually, after re-checking, I see the same log warnings as ChangZhuo described.
As I described, it seems to be an issue with my job. If no job, or only an example job, runs on the TaskManager, the basic metrics work just fine. Maybe ChangZhuo can confirm this?

@ChangZhuo, what's your job setup? I am running a streaming SQL job, but I also use the DataStream API to create the streaming environment, create the table environment from that, and finally use a StatementSet to execute multiple SQL statements in one job.

@Mason, naming the operators with `.name(...)` is not possible using the Table API.

@Chesnay, in my case there are no error logs.

Best & thanks,
Peter

On Tue, May 3, 2022 at 10:28 AM Chesnay Schepler <ches...@apache.org> wrote:

> Is there any warning in the logs containing "Error while handling metric"?
>
> On 03/05/2022 10:18, ChangZhuo Chen (陳昌倬) wrote:
> > On Tue, May 03, 2022 at 01:00:42AM -0700, Mason Chen wrote:
> >> Hi ChangZhou,
> >>
> >> The warning log indicates that the metric was previously defined and so the
> >> runtime is handling the "duplicate" metric by ignoring it. This is
> >> typically a benign message unless you rely on this metric. Is it possible
> >> that you are using the same task name for different tasks? It would be
> >> defined by the `.name(...)` API in your job graph instantiation.
> >>
> >> Can you clarify what it means that your endpoint isn't working--some
> >> metrics missing, endpoint is timing out, etc.? Also, can you confirm from
> >> logs that the PrometheusReporter was created properly?
> > Endpoint isn't working means we got empty reply from Prometheus
> > endpoint. The following is our testing for taskmanager Prometheus
> > endpoint.
> >
> > curl localhost:9249
> > curl: (52) Empty reply from server
> >
> > We have the following log in taskmanager, so PrometheusReporter was
> > created properly.
> >
> > 2022-05-03 01:48:16,678 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
> > ...
> > 2022-05-03 01:48:23,665 INFO org.apache.flink.metrics.prometheus.PrometheusReporter [] - Started PrometheusReporter HTTP server on port 9249.
> > 2022-05-03 01:48:23,669 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl [] - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
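
To narrow down which of the cases Mason asks about applies (no reply at all, a reply without metrics, or a normal reply), a small probe can help. The following is only a minimal sketch using the Python standard library. The function name `probe_metrics` and the strings it returns are made up for illustration; the URL in the usage note comes from the curl test quoted above and assumes the reporter listens on port 9249 on the same host.

```python
import http.client
import urllib.error
import urllib.request


def probe_metrics(url: str, timeout: float = 5.0) -> str:
    """Fetch a Prometheus text endpoint and summarise what came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an HTTP error status.
        return f"http error: {exc.code}"
    except (OSError, http.client.HTTPException) as exc:
        # Refused connection, timeout, or a server that closes the socket
        # without sending a response (what curl reports as error 52).
        return f"no response: {type(exc).__name__}"
    # Lines starting with '#' are HELP/TYPE comments, not metric samples.
    samples = [ln for ln in body.splitlines() if ln and not ln.startswith("#")]
    if not samples:
        return "responded, but no metric samples"
    return f"ok: {len(samples)} samples"
```

Calling `probe_metrics("http://localhost:9249/")` on the TaskManager host, once with no job running and once with the affected job, should show whether the endpoint stops answering only when that job is deployed.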