Re: Logging Flink metrics

Manish G Mon, 06 Jul 2020 10:07:06 -0700

In flink-conf.yaml:
*metrics.reporter.prom.port: 9250-9260*

This is based on information provided here
<https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>
*port - (optional) the port the Prometheus exporter listens on, defaults
to 9249
<https://github.com/prometheus/prometheus/wiki/Default-port-allocations>.
In order to be able to run several instances of the reporter on one host
(e.g. when one TaskManager is colocated with the JobManager) it is
advisable to use a port range like 9250-9260.*


Since I am running Flink locally, the JobManager and TaskManager are
colocated.
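
To illustrate why a port range helps in this colocated setup: the sketch
below is not Flink's reporter code, only a minimal Java illustration of the
behaviour the documentation describes, under the assumption that each process
walks the range in order and keeps the first port it can bind. The names
PortRangeSketch and bindFirstFree are made up for this example.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortRangeSketch {

    /**
     * Tries each port in [first, last] in ascending order and returns a
     * socket bound to the first one that is free.
     * (Hypothetical helper, not a Flink API.)
     */
    static ServerSocket bindFirstFree(int first, int last) throws IOException {
        for (int port = first; port <= last; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException portInUse) {
                // Port is taken (or not bindable); fall through to the next one.
            }
        }
        throw new IOException("No free port in range " + first + "-" + last);
    }

    public static void main(String[] args) throws IOException {
        // Two "processes" on the same host asking for the same range.
        try (ServerSocket firstProcess = bindFirstFree(9250, 9260);
             ServerSocket secondProcess = bindFirstFree(9250, 9260)) {
            System.out.println("first process:  " + firstProcess.getLocalPort());
            System.out.println("second process: " + secondProcess.getLocalPort());
        }
    }
}
```

Under that assumption, and with nothing else listening in the range, the two
processes end up on consecutive ports, and which of them gets which port
depends on startup order.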

In prometheus.yml:

- job_name: 'flinkprometheus'
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9250', 'localhost:9251']
  metrics_path: /

This is the entire configuration I have set up, based on several tutorials
and blogs available online.
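
A quick way to check which process answers on each port: the sketch below is
a minimal Java illustration, not part of Flink or Prometheus. It assumes the
reporter serves the text exposition format at the root path (as
metrics_path: / above implies) and that TaskManager metrics carry a
flink_taskmanager_ prefix, analogous to the flink_jobmanager_ names in the
quoted output below. ScrapeCheckSketch, scopesIn and fetch are hypothetical
names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

final class ScrapeCheckSketch {

    /** Returns which Flink scopes ("jobmanager", "taskmanager") appear in the metrics text. */
    static Set<String> scopesIn(String exposition) {
        Set<String> scopes = new TreeSet<>();
        for (String raw : exposition.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            if (line.startsWith("flink_jobmanager_")) {
                scopes.add("jobmanager");
            }
            if (line.startsWith("flink_taskmanager_")) { // assumed prefix
                scopes.add("taskmanager");
            }
        }
        return scopes;
    }

    /** Fetches the body served at the given URL as UTF-8 text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Probe the whole configured range, not only the two scrape targets.
        for (int port = 9250; port <= 9260; port++) {
            String url = "http://localhost:" + port + "/";
            try {
                System.out.println(url + " -> " + scopesIn(fetch(url)));
            } catch (IOException e) {
                System.out.println(url + " -> no reporter reachable");
            }
        }
    }
}
```

If no port in the range reports a taskmanager scope, the TaskManager's
reporter is either not running or is listening somewhere Prometheus is not
scraping.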




On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <ches...@apache.org> wrote:

> These are all JobManager metrics; have you configured prometheus to also
> scrape the task manager processes?
>
> On 06/07/2020 18:35, Manish G wrote:
>
> The metrics I see in Prometheus look like this:
>
> # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
> # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
> flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
> # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
> # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
> flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
> # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
> # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
> flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
> # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
> flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
> # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
> # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
> flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
> # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
> # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
> flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
> # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
> # TYPE flink_jobmanager_job_fullRestarts gauge
> flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>
>
>
>
> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> You've said elsewhere that you do see some metrics in prometheus, which
>> are those?
>>
>> Why are you configuring the host for the prometheus reporter? This
>> option is only for the PrometheusPushGatewayReporter.
>>
>> On 06/07/2020 18:01, Manish G wrote:
>>
>> Hi,
>>
>> So I have the following in flink-conf.yaml:
>> //////////////////////////////////////////////////////
>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>> metrics.reporter.prom.host: 127.0.0.1
>> metrics.reporter.prom.port: 9999
>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>> metrics.reporter.slf4j.interval: 30 SECONDS
>> //////////////////////////////////////////////////////
>>
>> While I can see the custom metrics in the TaskManager logs, the
>> Prometheus dashboard doesn't show them.
>>
>> With regards
>>
>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> You have explicitly configured a reporter list, resulting in the slf4j
>>> reporter being ignored:
>>>
>>> 2020-07-06 13:48:22,191 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: metrics.reporters, prom
>>> 2020-07-06 13:48:23,203 INFO
>>> org.apache.flink.runtime.metrics.ReporterSetup                - Excluding
>>> reporter slf4j, not configured in reporter list (prom).
>>>
>>> Note that nowadays metrics.reporters is no longer required; the set of
>>> reporters is automatically determined based on configured properties; the
>>> only use-case is disabling a reporter without having to remove the entire
>>> configuration.
>>> I'd suggest just removing the option, trying again, and reporting back.
>>>
>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>
>>> Please enable debug logging and search for warnings from the metric
>>> groups/registry/reporter.
>>>
>>> If you cannot find anything suspicious, you can also send the full log
>>> to me directly.
>>>
>>> On 06/07/2020 16:29, Manish G wrote:
>>>
>>> The job is an infinite streaming job, so it keeps running. The Flink
>>> configuration is:
>>>
>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>
>>>
>>>
>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> How long did the job run for, and what is the configured interval?
>>>>
>>>>
>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for this.
>>>>
>>>> I did the configuration as described at the link (changes in
>>>> flink-conf.yaml, copying the jar into the lib directory), registered the
>>>> Meter with the metric group, and invoked the markEvent() method in the
>>>> target code. But I don't see any related logs.
>>>> I am doing all of this on my local computer.
>>>>
>>>> Anything else I need to do?
>>>>
>>>> With regards
>>>> Manish
>>>>
>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>>
>>>>> Have you looked at the SLF4J reporter?
>>>>>
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>
>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Is it possible to log Flink metrics in the application logs in
>>>>> > addition to publishing them to Prometheus?
>>>>> >
>>>>> > With regards
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
