Hi,
I got stuck in using Prometheus,Pushgateway to collect metrics. Here is my
configuration about reporter:
metrics.reporter.promgateway.class:
org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: localhost
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: myJob
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: true
And the version information:
Flink 1.9.1
Prometheus 2.18
PushGateway 1.2 & 0.9 (I had already try them both)
I found that when Flink cluster restart, there showed up metrics which have new
jobName with random suffix. But there still existed those metrics having
jobName before restarting cluster(value stop update). Since Prometheus still
periodically pulled the data in pushgateway, I got a bunch of time series data
with value unchanged forever.
It looks like:
# HELP flink_jobmanager_Status_JVM_CPU_Load Load (scope:
jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Load gauge
flink_jobmanager_Status_JVM_CPU_Load{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"}
0
flink_jobmanager_Status_JVM_CPU_Load{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"}
0.0006602344673593189
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope:
jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"}
4.54512e+09
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"}
8.24809e+09
# HELP flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded ClassesLoaded
(scope: jobmanager_Status_JVM_ClassLoader)
# TYPE flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded gauge
flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"}
5984
flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"}
6014
# HELP flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded ClassesUnloaded
(scope: jobmanager_Status_JVM_ClassLoader)
# TYPE flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded gauge
flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"}
0
flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"}
0
Ps: This cluster has one JobManager.
In my understanding, when I set metrics.reporter.promgateway.deleteOnShutdown
to true, the old metrics information should be deleted from pushgateway. But it
didn’t work somehow.
Is my understanding on these configuration right? Any solution about deleting
metrics from pushgateway?
Thanks!