Hi!

I also ran into problems with the PrometheusReporter on Flink 1.15.0
after upgrading from 1.14.4. I already asked on the mailing list:
https://lists.apache.org/thread/m8ohrfkrq1tqgq7lowr9p226z3yc0fgc
So far I have found neither the underlying problem nor a solution.

Actually, after re-checking, I see the same log warnings that
ChangZhuo described.

As I described, it seems to be an issue with my job. If no job or only an
example job runs on the taskmanager, the basic metrics work just fine. Maybe
ChangZhuo can confirm this?

@ChangZhuo, what's your job setup? I am running a streaming SQL job, but I
also use the DataStream API to create the streaming environment, derive the
table environment from it, and finally use a StatementSet to execute
multiple SQL statements in one job.

@Mason, naming the operators with `.name(...)` is not possible using the
Table API.

@Chesnay, in my case there are no error logs.
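
For anyone who wants to repeat that check, here is a minimal sketch of how the logs could be searched for the phrase Chesnay asked about. The function name and the way log files are passed on the command line are assumptions for illustration only; adjust them to wherever your taskmanager logs actually live.

```python
import sys
from pathlib import Path

# Phrase quoted in this thread; everything else here is illustrative.
NEEDLE = "Error while handling metric"


def find_metric_warnings(log_text, needle=NEEDLE):
    """Return every line of log_text that contains the needle."""
    return [line for line in log_text.splitlines() if needle in line]


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_logs.py taskmanager.log
    for path in sys.argv[1:]:
        text = Path(path).read_text(errors="replace")
        for hit in find_metric_warnings(text):
            print(f"{path}: {hit}")
```

If this prints nothing, none of the given files contain the phrase, which matches what I see on my side.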

Best & thanks,
Peter

On Tue, May 3, 2022 at 10:28 AM Chesnay Schepler <ches...@apache.org> wrote:

> Is there any warning in the logs containing "Error while handling metric"?
>
> On 03/05/2022 10:18, ChangZhuo Chen (陳昌倬) wrote:
> > On Tue, May 03, 2022 at 01:00:42AM -0700, Mason Chen wrote:
> >> Hi ChangZhuo,
> >>
> >> The warning log indicates that the metric was previously defined and so the
> >> runtime is handling the "duplicate" metric by ignoring it. This is
> >> typically a benign message unless you rely on this metric. Is it possible
> >> that you are using the same task name for different tasks? It would be
> >> defined by the `.name(...)` API in your job graph instantiation.
> >>
> >> Can you clarify what it means that your endpoint isn't working--some
> >> metrics missing, endpoint is timing out, etc.? Also, can you confirm from
> >> logs that the PrometheusReporter was created properly?
> > "Endpoint isn't working" means that we get an empty reply from the
> > Prometheus endpoint. The following is our test of the taskmanager
> > Prometheus endpoint.
> >
> >      curl localhost:9249
> >      curl: (52) Empty reply from server
> >
> > We have the following log in taskmanager, so PrometheusReporter was
> > created properly.
> >
> >      2022-05-03 01:48:16,678 INFO org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
> >      ...
> >      2022-05-03 01:48:23,665 INFO org.apache.flink.metrics.prometheus.PrometheusReporter       [] - Started PrometheusReporter HTTP server on port 9249.
> >      2022-05-03 01:48:23,669 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl          [] - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
> >
> >
>
>
