Re: Logging Flink metrics

Chesnay Schepler Mon, 06 Jul 2020 10:29:21 -0700

WSL is a bit buggy when it comes to allocating ports; it happily lets 2 processes create sockets on the same port, except that the latter one doesn't do anything.

Super annoying, and I haven't found a solution to that myself yet.

You'll have to configure the ports explicitly for the JM/TM, which will likely entail manually starting the processes and updating the configuration in-between, e.g.:


./bin/jobmanager.sh start
<update port in config>
./bin/taskmanager.sh start
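
As a rough sketch of the "update port in config" step, assuming GNU sed (as shipped in typical WSL distributions) and the reporter name prom used elsewhere in this thread. set_prom_port is a hypothetical helper, not part of Flink, and FLINK_HOME and the ports 9250/9251 are only illustrative (they match the Prometheus targets quoted further down):

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Flink): pin metrics.reporter.prom.port
# to a single explicit port in the given Flink configuration file.
set_prom_port() {
  local conf_file="$1" port="$2"
  if grep -q '^metrics\.reporter\.prom\.port:' "$conf_file"; then
    # Key already present: replace its value in place (GNU sed syntax).
    sed -i "s/^metrics\.reporter\.prom\.port:.*/metrics.reporter.prom.port: ${port}/" "$conf_file"
  else
    # Key missing: append it.
    echo "metrics.reporter.prom.port: ${port}" >> "$conf_file"
  fi
}

# Intended sequence, assuming FLINK_HOME points at the Flink distribution:
#   set_prom_port "$FLINK_HOME/conf/flink-conf.yaml" 9250
#   "$FLINK_HOME/bin/jobmanager.sh" start
#   set_prom_port "$FLINK_HOME/conf/flink-conf.yaml" 9251
#   "$FLINK_HOME/bin/taskmanager.sh" start
```

Each process reads the configuration when it starts, so changing the port between the two start commands should give the JM and TM distinct reporter ports instead of relying on the port range.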

On 06/07/2020 19:16, Manish G wrote:

Yes.

On Mon, Jul 6, 2020 at 10:43 PM Chesnay Schepler <ches...@apache.org<mailto:ches...@apache.org>> wrote:


    Are you running Flink in WSL by chance?

    On 06/07/2020 19:06, Manish G wrote:

    In flink-conf.yaml:
    metrics.reporter.prom.port: 9250-9260

    This is based on information provided here
    <https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>:
    port - (optional) the port the Prometheus exporter listens on,
    defaults to 9249
    <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>.
    In order to be able to run several instances of the reporter on
    one host (e.g. when one TaskManager is colocated with the
    JobManager) it is advisable to use a port range like 9250-9260.

    As I am running Flink locally, both the JobManager and the
    TaskManager are colocated.

    In prometheus.yml:
    - job_name: 'flinkprometheus'
      scrape_interval: 5s
      static_configs:
        - targets: ['localhost:9250', 'localhost:9251']
      metrics_path: /

    This is the whole configuration I have done based on several
    tutorials and blogs available online.


    On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler
    <ches...@apache.org <mailto:ches...@apache.org>> wrote:

        These are all JobManager metrics; have you configured
        prometheus to also scrape the task manager processes?

        On 06/07/2020 18:35, Manish G wrote:

        The metrics I see in Prometheus look like:
        # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
        # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
        flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
        # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
        # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
        flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
        # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
        # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
        flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
        # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
        # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
        flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
        # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
        # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
        flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
        # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
        # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
        flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
        # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
        # TYPE flink_jobmanager_job_fullRestarts gauge
        flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0



        On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler
        <ches...@apache.org <mailto:ches...@apache.org>> wrote:

            You've said elsewhere that you do see some metrics in
            prometheus, which are those?

            Why are you configuring the host for the prometheus
            reporter? This option is only for the
            PrometheusPushGatewayReporter.

            On 06/07/2020 18:01, Manish G wrote:

            Hi,

            So I have the following in flink-conf.yaml:
            //////////////////////////////////////////////////////
            metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
            metrics.reporter.prom.host: 127.0.0.1
            metrics.reporter.prom.port: 9999
            metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
            metrics.reporter.slf4j.interval: 30 SECONDS
            //////////////////////////////////////////////////////

            And while I can see the custom metrics in the TaskManager
            logs, the Prometheus dashboard doesn't show them.

            With regards

            On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler
            <ches...@apache.org <mailto:ches...@apache.org>> wrote:

                You have explicitly configured a reporter list,
                resulting in the slf4j reporter being ignored:

                2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
                2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).

                Note that nowadays metrics.reporters is no longer
                required; the set of reporters is automatically
                determined based on configured properties. The only
                remaining use-case is disabling a reporter without
                having to remove its entire configuration.
                I'd suggest just removing the option, trying again,
                and reporting back.

                On 06/07/2020 16:35, Chesnay Schepler wrote:

                Please enable debug logging and search for
                warnings from the metric groups/registry/reporter.

                If you cannot find anything suspicious, you can
                also send the full log to me directly.

                On 06/07/2020 16:29, Manish G wrote:

                Job is an infinite streaming one, so it keeps
                going. Flink configuration is as:

                metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
                metrics.reporter.slf4j.interval: 30 SECONDS



                On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
                <ches...@apache.org <mailto:ches...@apache.org>>
                wrote:

                    How long did the job run for, and what is the
                    configured interval?


                    On 06/07/2020 15:51, Manish G wrote:

                    Hi,

                    Thanks for this.

                    I did the configuration as mentioned at the
                    link (changes in flink-conf.yaml, copying the
                    jar into the lib directory), registered the
                    Meter with the metrics group, and invoked the
                    markEvent() method in the target code. But I
                    don't see any related logs.
                    I am doing this all on my local computer.

                    Anything else I need to do?

                    With regards
                    Manish

                    On Mon, Jul 6, 2020 at 5:24 PM Chesnay
                    Schepler <ches...@apache.org
                    <mailto:ches...@apache.org>> wrote:

                        Have you looked at the SLF4J reporter?

                        
                        https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

                        On 06/07/2020 13:49, Manish G wrote:
                        > Hi,
                        >
                        > Is it possible to log Flink metrics in application logs apart from
                        > publishing it to Prometheus?
                        >
                        > With regards
