Hi Chesnay,

Thanks for the code snippet. Which trace logs are interesting? Those of "org.apache.flink.metrics.prometheus.PrometheusReporter"? I could also add these logger settings in the environment where the problem is present.
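If it helps, I would expect the logger settings to look roughly like this in Flink's conf/log4j.properties (a sketch in log4j2 properties syntax, as shipped with recent Flink distributions; the logger label `prom` is just illustrative):

```properties
# Hypothetical snippet for conf/log4j.properties: enables TRACE output
# for the whole Prometheus reporter package.
logger.prom.name = org.apache.flink.metrics.prometheus
logger.prom.level = TRACE
```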
Other than that, I am not sure how to reproduce this issue in a local setup. In the cluster where the metrics are missing I navigate to the affected taskmanager and try to access the metrics via the configured Prometheus port. When running Flink locally (start-cluster.sh), I do not have a fixed URL/port to access the taskmanager, right?

I noticed that my config of the PrometheusReporter is different here. I have: `metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter`. I will investigate whether this is a problem.

Unfortunately I cannot provide my job at the moment. It contains business logic and is tightly coupled with our Kafka systems. I will check the option of creating a sample job to reproduce the problem.

Best,
Peter

On Tue, May 3, 2022 at 12:48 PM Chesnay Schepler <ches...@apache.org> wrote:

> You'd help me out greatly if you could provide me with a sample job that
> runs into the issue.
>
> So far I wasn't able to reproduce the issue, but it should be clear that
> there is some issue, given 3 separate reports, although it is strange that
> so far it was only reported for Prometheus.
>
> If one of you is able to reproduce the issue within a test and is feeling
> adventurous, then you might be able to get more information by forwarding
> java.util.logging to SLF4J. Below is some code to get you started.
>
> DebuggingTest.java:
>
> class DebuggingTest {
>
>     static {
>         LogManager.getLogManager().getLogger("").setLevel(Level.FINEST);
>         SLF4JBridgeHandler.removeHandlersForRootLogger();
>         SLF4JBridgeHandler.install();
>         miniClusterExtension =
>                 new MiniClusterExtension(
>                         new MiniClusterResourceConfiguration.Builder()
>                                 .setConfiguration(getConfiguration())
>                                 .setNumberSlotsPerTaskManager(1)
>                                 .build());
>     }
>
>     @RegisterExtension
>     private static final MiniClusterExtension miniClusterExtension;
>
>     private static Configuration getConfiguration() {
>         final Configuration configuration = new Configuration();
>
>         configuration.setString(
>                 "metrics.reporter.prom.factory.class",
>                 PrometheusReporterFactory.class.getName());
>         configuration.setString("metrics.reporter.prom.port", "9200-9300");
>
>         return configuration;
>     }
>
>     @Test
>     void runJob() throws Exception {
>         <run job>
>     }
> }
>
> pom.xml:
>
> <dependency>
>     <groupId>org.slf4j</groupId>
>     <artifactId>jul-to-slf4j</artifactId>
>     <version>1.7.32</version>
> </dependency>
>
> log4j2-test.properties:
>
> rootLogger.level = off
> rootLogger.appenderRef.test.ref = TestLogger
> logger.http.name = com.sun.net.httpserver
> logger.http.level = trace
> appender.testlogger.name = TestLogger
> appender.testlogger.type = CONSOLE
> appender.testlogger.target = SYSTEM_ERR
> appender.testlogger.layout.type = PatternLayout
> appender.testlogger.layout.pattern = %-4r [%t] %-5p %c %x - %m%n
>
> On 03/05/2022 10:41, ChangZhuo Chen (陳昌倬) wrote:
> > On Tue, May 03, 2022 at 10:32:03AM +0200, Peter Schrott wrote:
> > > Hi!
> > >
> > > I also discovered problems with the PrometheusReporter on Flink 1.15.0,
> > > coming from 1.14.4. I already consulted the mailing list:
> > > https://lists.apache.org/thread/m8ohrfkrq1tqgq7lowr9p226z3yc0fgc
> > > I have not found the underlying problem or a solution to it.
> > >
> > > Actually, after re-checking, I see the same log WARNINGS as ChangZhuo
> > > described.
> > >
> > > As I described, it seems to be an issue with my job.
> > > If no job, or an example job, runs on the taskmanager, the basic
> > > metrics work just fine. Maybe ChangZhuo can confirm this?
> > >
> > > @ChangZhuo what's your job setup? I am running a streaming SQL job, but
> > > also using the DataStream API to create the streaming environment, and
> > > from that the table environment, and finally using a StatementSet to
> > > execute multiple SQL statements in one job.
> >
> > We are running a streaming application with low level API with
> > Kubernetes operator FlinkDeployment.
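P.S. For completeness, the factory-based configuration I will try instead of my `metrics.reporter.prom.class` setting would look roughly like this in flink-conf.yaml (a sketch using the same factory class as Chesnay's test snippet; the port range is just an example):

```yaml
# Sketch of a factory-based Prometheus reporter configuration;
# the port range below is illustrative.
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249-9259
```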