Yes.

On Mon, Jul 6, 2020 at 10:43 PM Chesnay Schepler <ches...@apache.org> wrote:
> Are you running Flink in WSL by chance?
>
> On 06/07/2020 19:06, Manish G wrote:
>
> In flink-conf.yaml:
>
> metrics.reporter.prom.port: 9250-9260
>
> This is based on information provided here
> <https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>:
>
> *port - (optional) the port the Prometheus exporter listens on, defaults
> to 9249
> <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>.
> In order to be able to run several instances of the reporter on one host
> (e.g. when one TaskManager is colocated with the JobManager) it is
> advisable to use a port range like 9250-9260.*
>
> As I am running Flink locally, both the JobManager and the TaskManager are
> colocated.
>
> In prometheus.yml:
>
> - job_name: 'flinkprometheus'
>   scrape_interval: 5s
>   static_configs:
>     - targets: ['localhost:9250', 'localhost:9251']
>   metrics_path: /
>
> This is the whole configuration I have done, based on several tutorials and
> blogs available online.
>
> On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> These are all JobManager metrics; have you configured prometheus to also
>> scrape the task manager processes?
>>
>> On 06/07/2020 18:35, Manish G wrote:
>>
>> The metrics I see in Prometheus look like this:
>>
>> # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
>> # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
>> flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
>> # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
>> # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
>> flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>> # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
>> # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
>> flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
>> # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
>> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
>> flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
>> # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
>> # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
>> flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
>> # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
>> # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
>> flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
>> # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
>> # TYPE flink_jobmanager_job_fullRestarts gauge
>> flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>>
>> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> You've said elsewhere that you do see some metrics in prometheus; which
>>> are those?
>>>
>>> Why are you configuring the host for the prometheus reporter? This
>>> option is only for the PrometheusPushGatewayReporter.
>>>
>>> On 06/07/2020 18:01, Manish G wrote:
>>>
>>> Hi,
>>>
>>> So I have the following in flink-conf.yml:
>>> //////////////////////////////////////////////////////
>>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>>> metrics.reporter.prom.host: 127.0.0.1
>>> metrics.reporter.prom.port: 9999
>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>> //////////////////////////////////////////////////////
>>>
>>> And while I can see custom metrics in the TaskManager logs, the Prometheus
>>> dashboard doesn't show custom metrics.
>>>
>>> With regards
>>>
>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>> reporter being ignored:
>>>>
>>>> 2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
>>>> 2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).
>>>>
>>>> Note that nowadays metrics.reporters is no longer required; the set of
>>>> reporters is automatically determined based on configured properties; the
>>>> only use-case is disabling a reporter without having to remove the entire
>>>> configuration.
>>>> I'd suggest just removing the option, trying again, and reporting back.
>>>>
>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>
>>>> Please enable debug logging and search for warnings from the metric
>>>> groups/registry/reporter.
>>>>
>>>> If you cannot find anything suspicious, you can also send the full log
>>>> to me directly.
>>>>
>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>
>>>> The job is an infinite streaming one, so it keeps going. The Flink
>>>> configuration is:
>>>>
>>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>
>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>>
>>>>> How long did the job run for, and what is the configured interval?
>>>>>
>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for this.
>>>>>
>>>>> I did the configuration as mentioned at the link (changes in
>>>>> flink-conf.yml, copying the jar into the lib directory), registered the
>>>>> Meter with the metrics group, and invoked the markEvent() method in the
>>>>> target code. But I don't see any related logs.
>>>>> I am doing this all on my local computer.
>>>>>
>>>>> Anything else I need to do?
>>>>>
>>>>> With regards
>>>>> Manish
>>>>>
>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Have you looked at the SLF4J reporter?
>>>>>>
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>
>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>>> > publishing it to Prometheus?
>>>>>> >
>>>>>> > With regards
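
One way to check the point raised in the thread (whether a scraped endpoint exposes only JobManager metrics) is to count metric names by their scope prefix. The following is a minimal Python sketch and not part of the original exchange. The function name `count_scopes` and the sample text are illustrative, and the sketch assumes the metric names follow the `flink_<scope>_...` pattern seen in the output quoted above, so that the second underscore-separated token is the scope (for example `jobmanager` or `taskmanager`).

```python
from collections import Counter


def count_scopes(exposition_text: str) -> Counter:
    """Count Flink metric samples per top-level scope.

    Assumes names look like flink_<scope>_..., as in the scraped
    output quoted in this thread. Comment lines (# HELP / # TYPE)
    and blank lines are skipped.
    """
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the label block or at the first space.
        name = line.split("{", 1)[0].split(None, 1)[0]
        parts = name.split("_")
        if len(parts) >= 2 and parts[0] == "flink":
            counts[parts[1]] += 1
    return counts


# Two sample lines copied from the output quoted in the thread.
sample = """\
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
"""

scopes = count_scopes(sample)
print(dict(scopes))
if "taskmanager" not in scopes:
    print("No TaskManager metrics in this scrape; check the other port(s).")
```

With the port range 9250-9260 from the thread, the same function could be run on the text returned by each configured target (for example `localhost:9250` and `localhost:9251`, path `/`) to see which process, if any, answers on each port.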