Hi Chesnay,

Thanks for the code snippet. Which trace logs are interesting? Those of "org.apache.flink.metrics.prometheus.PrometheusReporter"? I could also add these logger settings in the environment where the problem is present.
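If it helps, I would expect the logger settings to look roughly like this in Flink's conf/log4j.properties (a sketch in log4j2 properties syntax, as shipped with recent Flink distributions; the logger label `prom` is just illustrative):

```properties
# Hypothetical snippet for conf/log4j.properties: enables TRACE output
# for the whole Prometheus reporter package.
logger.prom.name = org.apache.flink.metrics.prometheus
logger.prom.level = TRACE
```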
Other than that, I am not sure how to reproduce this issue in a local setup. In the cluster where the metrics are missing I navigate to the affected taskmanager and try to access the metrics via the configured Prometheus port. When running Flink locally (start-cluster.sh), I do not have a fixed URL/port to access the taskmanager, right?

I noticed that my config of the PrometheusReporter is different here. I have: `metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter`. I will investigate whether this is a problem.

Unfortunately I cannot provide my job at the moment. It contains business logic and is tightly coupled with our Kafka systems. I will check the option of creating a sample job to reproduce the problem.

Best,
Peter

On Tue, May 3, 2022 at 12:48 PM Chesnay Schepler <ches...@apache.org> wrote:

> You'd help me out greatly if you could provide me with a sample job that
> runs into the issue.
>
> So far I wasn't able to reproduce the issue, but it should be clear that
> there is some issue, given 3 separate reports, although it is strange that
> so far it was only reported for Prometheus.
>
> If one of you is able to reproduce the issue within a test and is feeling
> adventurous, then you might be able to get more information by forwarding
> java.util.logging to SLF4J. Below is some code to get you started.
>
> DebuggingTest.java:
>
> class DebuggingTest {
>
>     static {
>         LogManager.getLogManager().getLogger("").setLevel(Level.FINEST);
>         SLF4JBridgeHandler.removeHandlersForRootLogger();
>         SLF4JBridgeHandler.install();
>         miniClusterExtension =
>                 new MiniClusterExtension(
>                         new MiniClusterResourceConfiguration.Builder()
>                                 .setConfiguration(getConfiguration())
>                                 .setNumberSlotsPerTaskManager(1)
>                                 .build());
>     }
>
>     @RegisterExtension
>     private static final MiniClusterExtension miniClusterExtension;
>
>     private static Configuration getConfiguration() {
>         final Configuration configuration = new Configuration();
>
>         configuration.setString(
>                 "metrics.reporter.prom.factory.class",
>                 PrometheusReporterFactory.class.getName());
>         configuration.setString("metrics.reporter.prom.port", "9200-9300");
>
>         return configuration;
>     }
>
>     @Test
>     void runJob() throws Exception {
>         <run job>
>     }
> }
>
> pom.xml:
>
> <dependency>
>     <groupId>org.slf4j</groupId>
>     <artifactId>jul-to-slf4j</artifactId>
>     <version>1.7.32</version>
> </dependency>
>
> log4j2-test.properties:
>
> rootLogger.level = off
> rootLogger.appenderRef.test.ref = TestLogger
> logger.http.name = com.sun.net.httpserver
> logger.http.level = trace
> appender.testlogger.name = TestLogger
> appender.testlogger.type = CONSOLE
> appender.testlogger.target = SYSTEM_ERR
> appender.testlogger.layout.type = PatternLayout
> appender.testlogger.layout.pattern = %-4r [%t] %-5p %c %x - %m%n
>
> On 03/05/2022 10:41, ChangZhuo Chen (陳昌倬) wrote:
> > On Tue, May 03, 2022 at 10:32:03AM +0200, Peter Schrott wrote:
> > > Hi!
> > >
> > > I also discovered problems with the PrometheusReporter on Flink 1.15.0,
> > > coming from 1.14.4. I already consulted the mailing list:
> > > https://lists.apache.org/thread/m8ohrfkrq1tqgq7lowr9p226z3yc0fgc
> > > I have not found the underlying problem or a solution to it.
> > >
> > > Actually, after re-checking, I see the same log WARNINGS as ChangZhuo
> > > described.
> > >
> > > As I described, it seems to be an issue with my job.
> > > If no job, or an example job, runs on the taskmanager, the basic
> > > metrics work just fine. Maybe ChangZhuo can confirm this?
> > >
> > > @ChangZhuo what's your job setup? I am running a streaming SQL job, but
> > > also using the DataStream API to create the streaming environment, and
> > > from that the table environment, and finally using a StatementSet to
> > > execute multiple SQL statements in one job.
> >
> > We are running a streaming application with low level API with
> > Kubernetes operator FlinkDeployment.
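P.S. For completeness, the factory-based configuration I will try instead of my `metrics.reporter.prom.class` setting would look roughly like this in flink-conf.yaml (a sketch using the same factory class as Chesnay's test snippet; the port range is just an example):

```yaml
# Sketch of a factory-based Prometheus reporter configuration;
# the port range below is illustrative.
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249-9259
```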