Re: Logging Flink metrics

Manish G Mon, 06 Jul 2020 10:18:07 -0700

Yes.

On Mon, Jul 6, 2020 at 10:43 PM Chesnay Schepler <ches...@apache.org> wrote:


> Are you running Flink in WSL by chance?
>
> On 06/07/2020 19:06, Manish G wrote:
>
> In flink-conf.yaml:
> metrics.reporter.prom.port: 9250-9260
>
> This is based on information provided here
> <https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>:
>
>   port - (optional) the port the Prometheus exporter listens on, defaults
>   to 9249 <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>.
>   In order to be able to run several instances of the reporter on one host
>   (e.g. when one TaskManager is colocated with the JobManager) it is
>   advisable to use a port range like 9250-9260.
>
> As I am running Flink locally, the JobManager and TaskManager are
> colocated.
>
> In prometheus.yml:
>
> - job_name: 'flinkprometheus'
>   scrape_interval: 5s
>   static_configs:
>     - targets: ['localhost:9250', 'localhost:9251']
>   metrics_path: /
>
> This is all the configuration I have done, based on several tutorials and
> blogs available online.
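A minimal sketch of the port-range behaviour described in the quoted documentation. It assumes, as I understand the 1.10 PrometheusReporter, that each reporter instance tries the ports of the range in order and keeps the first one it can bind. With the JobManager and TaskManager on one host, whichever process starts first usually gets 9250 and the other gets 9251, which is why both ports appear as scrape targets. The class and method names are hypothetical, and a plain java.net.ServerSocket stands in for the reporter's HTTP server.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a simplified model of "bind the first free port in a range".
// This is not Flink's PrometheusReporter code.
public class PortRangeSketch {

    /** Parses "9250-9260" (or a single port such as "9249") into the list of ports to try. */
    public static List<Integer> parseRange(String spec) {
        String[] parts = spec.trim().split("-");
        int start = Integer.parseInt(parts[0].trim());
        int end = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : start;
        List<Integer> ports = new ArrayList<>();
        for (int p = start; p <= end; p++) {
            ports.add(p);
        }
        return ports;
    }

    /** Tries each port in order and returns the first socket that binds, or null if none is free. */
    public static ServerSocket bindFirstFree(List<Integer> ports) {
        for (int port : ports) {
            try {
                return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
            } catch (IOException e) {
                // Port already taken (e.g. by the colocated JobManager's reporter); try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> range = parseRange("9250-9260");
        // Simulate two reporters (JobManager and TaskManager) starting on the same host.
        try (ServerSocket first = bindFirstFree(range);
             ServerSocket second = bindFirstFree(range)) {
            if (first == null || second == null) {
                System.out.println("No free port left in the range");
                return;
            }
            System.out.println("first reporter:  " + first.getLocalPort());
            System.out.println("second reporter: " + second.getLocalPort());
        }
    }
}
```

Because the range is scanned in order, the Prometheus targets only need to list as many ports from the start of the range as there are processes on the host.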
>
> On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> These are all JobManager metrics; have you configured prometheus to also
>> scrape the task manager processes?
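One way to act on this suggestion: JobManager metrics are exposed with the flink_jobmanager_ prefix (as in the output quoted below). TaskManager metrics, including operator-level custom metrics such as a registered Meter, typically appear under flink_taskmanager_ (for example flink_taskmanager_job_task_operator_...). Below is a minimal sketch, with hypothetical names, that groups the sample lines of a scrape by that prefix. If no taskmanager entry shows up, the TaskManager's reporter port is probably not being scraped.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups the sample lines of a Prometheus text-format scrape
// by the Flink component prefix of the metric name.
public class ScopeCheck {

    public static Map<String, Integer> countByComponent(String exposition) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : exposition.split("\n")) {
            String trimmed = line.trim();
            // Skip blank lines and the "# HELP" / "# TYPE" comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String component;
            if (trimmed.startsWith("flink_jobmanager_")) {
                component = "jobmanager";
            } else if (trimmed.startsWith("flink_taskmanager_")) {
                component = "taskmanager";
            } else {
                component = "other";
            }
            counts.merge(component, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two sample lines taken from the scrape quoted in this thread.
        String scrape = String.join("\n",
            "# TYPE flink_jobmanager_job_fullRestarts gauge",
            "flink_jobmanager_job_fullRestarts{job_id=\"58483036154d7f72ad1bbf10eb86bc2e\",host=\"localhost\",job_name=\"frauddetection\",} 0.0",
            "flink_jobmanager_Status_JVM_CPU_Time{host=\"localhost\",} 8.42E9");
        Map<String, Integer> counts = countByComponent(scrape);
        System.out.println(counts); // prints {jobmanager=2}
        if (!counts.containsKey("taskmanager")) {
            System.out.println("No TaskManager metrics: is the TaskManager's reporter port scraped?");
        }
    }
}
```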
>>
>> On 06/07/2020 18:35, Manish G wrote:
>>
>> The metrics I see in Prometheus look like this:
>>
>> # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
>> # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
>> flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
>> # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
>> # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
>> flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>> # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
>> # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
>> flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
>> # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
>> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
>> flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
>> # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
>> # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
>> flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
>> # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
>> # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
>> flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
>> # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
>> # TYPE flink_jobmanager_job_fullRestarts gauge
>> flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>>
>> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> You've said elsewhere that you do see some metrics in prometheus, which
>>> are those?
>>>
>>> Why are you configuring the host for the prometheus reporter? This
>>> option is only for the PrometheusPushGatewayReporter.
>>>
>>> On 06/07/2020 18:01, Manish G wrote:
>>>
>>> Hi,
>>>
>>> So I have the following in flink-conf.yaml:
>>> //////////////////////////////////////////////////////
>>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>>> metrics.reporter.prom.host: 127.0.0.1
>>> metrics.reporter.prom.port: 9999
>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>> //////////////////////////////////////////////////////
>>>
>>> While I can see the custom metrics in the TaskManager logs, the Prometheus
>>> dashboard doesn't show them.
>>>
>>> With regards
>>>
>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>> reporter being ignored:
>>>>
>>>> 2020-07-06 13:48:22,191 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporters, prom
>>>> 2020-07-06 13:48:23,203 INFO  org.apache.flink.runtime.metrics.ReporterSetup                - Excluding reporter slf4j, not configured in reporter list (prom).
>>>>
>>>> Note that nowadays metrics.reporters is no longer required; the set of
>>>> reporters is determined automatically from the configured properties. Its
>>>> only remaining use-case is disabling a reporter without having to remove its
>>>> entire configuration.
>>>> I'd suggest just removing the option, trying again, and reporting back.
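A simplified model of the selection rule described above and visible in the quoted log lines. When metrics.reporters is set, only the listed reporter names are kept. When it is absent, every name that appears in a metrics.reporter.<name>.* key is used. This is an illustrative sketch with hypothetical class and method names, not Flink's ReporterSetup code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: models which reporters end up active for a given configuration.
public class ReporterSelection {

    // Matches keys such as "metrics.reporter.prom.class" and captures the reporter name ("prom").
    // "metrics.reporters" itself does not match because it lacks the "metrics.reporter.<name>." form.
    private static final Pattern REPORTER_KEY =
        Pattern.compile("^metrics\\.reporter\\.([^.\\s]+)\\..+$");

    public static Set<String> activeReporters(Map<String, String> config) {
        Set<String> configured = new TreeSet<>();
        for (String key : config.keySet()) {
            Matcher m = REPORTER_KEY.matcher(key);
            if (m.matches()) {
                configured.add(m.group(1));
            }
        }
        String list = config.get("metrics.reporters");
        if (list == null || list.trim().isEmpty()) {
            return configured; // no explicit list: every configured reporter is active
        }
        Set<String> allowed = new TreeSet<>();
        for (String name : list.split(",")) {
            allowed.add(name.trim());
        }
        Set<String> active = new TreeSet<>();
        for (String name : configured) {
            // A reporter missing from the list is dropped; this is the point where
            // Flink logs "Excluding reporter ..., not configured in reporter list".
            if (allowed.contains(name)) {
                active.add(name);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("metrics.reporters", "prom");
        config.put("metrics.reporter.prom.class", "org.apache.flink.metrics.prometheus.PrometheusReporter");
        config.put("metrics.reporter.slf4j.class", "org.apache.flink.metrics.slf4j.Slf4jReporter");
        config.put("metrics.reporter.slf4j.interval", "30 SECONDS");
        System.out.println(activeReporters(config)); // prints [prom]

        config.remove("metrics.reporters");
        System.out.println(activeReporters(config)); // prints [prom, slf4j]
    }
}
```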
>>>>
>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>
>>>> Please enable debug logging and search for warnings from the metric
>>>> groups/registry/reporter.
>>>>
>>>> If you cannot find anything suspicious, you can also send the full log
>>>> to me directly.
>>>>
>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>
>>>> The job is an infinite streaming one, so it keeps running. The Flink
>>>> configuration is:
>>>>
>>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>
>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>>
>>>>> How long did the job run for, and what is the configured interval?
>>>>>
>>>>>
>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for this.
>>>>>
>>>>> I did the configuration as described at the link (changes in
>>>>> flink-conf.yaml, copying the jar into the lib directory), registered the
>>>>> Meter with the metric group, and invoked its markEvent() method in the
>>>>> target code. But I don't see any related logs.
>>>>> I am doing all of this on my local computer.
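A related detail about markEvent(): a Meter's rate is not recomputed on every call. As far as I understand Flink's MeterView, it samples the event count on a fixed 5-second update cycle and reports the average over a time span (60 seconds by default). A few events therefore show up as a small rate, and a meter read before its first update shows 0.0. The class below is a simplified, stdlib-only model of that idea with hypothetical names, not Flink's source. In a real job you would register Flink's own Meter implementation with the operator's metric group.

```java
// Hypothetical, simplified model of a time-windowed meter (loosely after the idea behind
// Flink's MeterView). Not Flink code: names and structure are illustrative only.
public class WindowedMeter {
    public static final int UPDATE_INTERVAL_SECONDS = 5;

    private final int timeSpanSeconds;
    private final long[] samples; // ring buffer of cumulative counts, one per update tick
    private int index = 0;
    private long count = 0;
    private double rate = 0.0;

    public WindowedMeter(int timeSpanSeconds) {
        // Round the span down to a multiple of the update interval (at least one interval).
        this.timeSpanSeconds = Math.max(
            timeSpanSeconds - timeSpanSeconds % UPDATE_INTERVAL_SECONDS, UPDATE_INTERVAL_SECONDS);
        this.samples = new long[this.timeSpanSeconds / UPDATE_INTERVAL_SECONDS + 1];
    }

    /** Records one event; the rate does not change until the next update(). */
    public void markEvent() {
        count++;
    }

    /** Meant to be called once per update interval; recomputes the average rate over the span. */
    public void update() {
        index = (index + 1) % samples.length;
        samples[index] = count;
        long oldest = samples[(index + 1) % samples.length]; // count from one full span ago
        rate = (double) (count - oldest) / timeSpanSeconds;
    }

    public double getRate() {
        return rate;
    }

    public long getCount() {
        return count;
    }

    public static void main(String[] args) {
        WindowedMeter meter = new WindowedMeter(60);
        for (int i = 0; i < 30; i++) {
            meter.markEvent();
        }
        System.out.println(meter.getRate()); // prints 0.0 (no update tick has run yet)
        meter.update();
        System.out.println(meter.getRate()); // prints 0.5 (30 events averaged over 60 s)
    }
}
```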
>>>>>
>>>>> Anything else I need to do?
>>>>>
>>>>> With regards
>>>>> Manish
>>>>>
>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Have you looked at the SLF4J reporter?
>>>>>>
>>>>>>
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>
>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > Is it possible to log Flink metrics in the application logs, in addition
>>>>>> > to publishing them to Prometheus?
>>>>>> >
>>>>>> > With regards
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
