Some more details on my problem:

1. The "Multiple implementations" warning appeared because I had the
metrics-prometheus jar in both the plugins and lib directories. I
tried keeping it in only one of them, and in both cases (plugins or
lib) the result was the same: I got only Flink metrics on my prom port.
2. My application extends
https://github.com/twitter/twitter-server/blob/develop/server/src/main/scala/com/twitter/server/TwitterServer.scala
and I was sending my custom stats via the statsReceiver provided there:
https://github.com/twitter/twitter-server/blob/33b3fb00635c4ab1102eb0c062cde6bb132d80c0/server/src/main/scala/com/twitter/server/Stats.scala#L14
3. I realized that my reporter configuration was:

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9999

So my guess is that in 1.16.2 the Prometheus reporter was being
instantiated by class name, and perhaps that somehow allowed my
metrics to be merged with the Flink ones, whereas in 1.17.1 the
reporter is instantiated by the factory and somehow that makes my
metrics invisible. Do you have any suggestions for getting my metrics
to work as they did in 1.16.2?
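
For reference, a factory-only version of the configuration above would
look like the sketch below. It uses only the keys I already have and
drops the class-based entry, which is what I understand the 1.17
release notes (FLINK-24235) to require. I am assuming this is the
right cleanup; I have not confirmed that it brings my custom metrics
back.

```yaml
# flink-conf.yaml (sketch): same keys as above, minus the class-based entry.
# Assumption: in 1.17 the reporter is only instantiated through its factory.
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9999
```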

Thanks again, Javier Vegas


On Tue, 26 Sept 2023 at 19:42, Javier Vegas (<jve...@strava.com>) wrote:
>
> I implemented some custom Prometheus metrics that were working on
> 1.16.2 with this configuration:
>
> metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
> metrics.reporter.prom.port: 9999
>
> I could see both Flink metrics and my custom metrics on port 9999 of
> my task managers.
>
> After upgrading to 1.17.1, using the same configuration, I can see
> only the Flink metrics on port 9999 of the task managers; the custom
> metrics are getting lost somewhere.
>
> The release notes for 1.17 mention
> https://issues.apache.org/jira/browse/FLINK-24235
> which removes instantiating reporters by name and forces the use of
> a factory, which I was already doing in 1.16.2. Do I need to do
> anything extra after those changes so that my metrics are aggregated
> with the Flink ones?
>
> I am also seeing this error message on application startup (which I
> was already seeing in 1.16.2): "Multiple implementations of the same
> reporter were found in 'lib' and/or 'plugins' directories for
> org.apache.flink.metrics.prometheus.PrometheusReporterFactory. It is
> recommended to remove redundant reporter JARs to resolve used
> versions' ambiguity." Could that also explain the missing metrics?
>
> Thanks,
>
> Javier Vegas
