Hello,

What is the best way to measure the CPU utilization of a TaskManager in
Flink, as opposed to using Linux's "top" command? Is querying the REST
endpoint
http://<IP>:<port>/taskmanagers/<TM_ID>/metrics?get=Status.JVM.CPU.Load
the best option? Roman's reply (copied below) from the archives suggests
that it returns the CPU usage for the whole system including
other processes currently in the system, and would not give the CPU
utilization only of that Task Manager.

Based on Roman's reply that JVM.CPU.Time is a clearer indicator of CPU
usage, can you suggest how I would use it to calculate CPU utilization? Is
there any way I can get the CPU utilization for a Job that is distributed
over several nodes in the cluster?

Also, what is the difference between the two REST API endpoints below:

1. http://<IP>:<port>/taskmanagers/<TM_ID>/metrics?get=Status.JVM.CPU.Load
2. http://<IP>:<port>/taskmanagers/<TM_ID>/metrics?get=System.CPU.Usage

Thanks,

Piper

Hi,

JVM.CPU.Load is just a wrapper (MetricUtils.instantiateCPUMetrics) on
top of OperatingSystemMXBean.getProcessCpuLoad() (see
https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad())

It usually looks odd if you have multiple CPU cores. For example, if
you have a job with a single slot fully utilizing one CPU core on
an 8-core machine, JVM.CPU.Load will be 1.0/8.0 = 0.125. It's also
a point-in-time snapshot of current CPU usage, so if you're collecting
your metrics every minute and the job has a spiky workload within that
minute (say it's idle almost all the time and once a minute it consumes
100% CPU for one second), there is a good chance you will miss the
spike in the metrics completely.
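
To make the arithmetic above concrete, here is a minimal Python sketch
that rescales a load value into "number of fully busy cores". The
function name and the core count are illustrative assumptions, not part
of Flink; the load is assumed to be a 0..1 fraction over all cores, as
in the example above.

```python
def load_to_busy_cores(cpu_load: float, num_cores: int) -> float:
    """Rescale a 0..1 load fraction (over all cores) to busy-core equivalents."""
    if not 0.0 <= cpu_load <= 1.0:
        raise ValueError("cpu_load is expected to be a fraction between 0 and 1")
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return cpu_load * num_cores


# The example from the text: a load of 0.125 on an 8-core machine
# corresponds to one fully busy core.
print(load_to_busy_cores(0.125, 8))  # 1.0
```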

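Personally, I find JVM.CPU.Time a clearer indicator of CPU usage. It
is an always-increasing counter of the CPU time spent executing your
code (the underlying OperatingSystemMXBean.getProcessCpuTime() reports
it in nanoseconds), so it also catches CPU usage spikes that fall
between two samples.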
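
One way to turn such a cumulative counter into a utilization figure is
to sample it twice and divide the increase by the elapsed wall-clock
time. The following is only a sketch under stated assumptions: the
function and parameter names are hypothetical, and the counter is
assumed to be in nanoseconds (as returned by getProcessCpuTime()); pass
a different units_per_second if your samples use another unit.

```python
def cpu_utilization(cpu_time_start: int,
                    cpu_time_end: int,
                    wall_seconds: float,
                    num_cores: int = 1,
                    units_per_second: int = 1_000_000_000) -> float:
    """Average CPU utilization between two samples of a cumulative CPU-time counter.

    With num_cores=1 the result is in "cores busy" (it can exceed 1.0);
    with the machine's core count it is a 0..1 fraction of total capacity.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    if cpu_time_end < cpu_time_start:
        # The counter went backwards, e.g. the process restarted between samples.
        raise ValueError("counter decreased between samples")
    cpu_seconds = (cpu_time_end - cpu_time_start) / units_per_second
    return cpu_seconds / (wall_seconds * num_cores)


# Two samples taken 60 s apart; the counter advanced by 30 s of CPU time.
print(cpu_utilization(0, 30_000_000_000, 60))     # 0.5 cores busy on average
print(cpu_utilization(0, 30_000_000_000, 60, 8))  # 0.0625 of an 8-core machine
```

Because the counter is cumulative, a one-second burst inside the
sampling interval still shows up in the difference, just averaged over
the interval. For a job spread over several TaskManagers, the same
difference could be computed per TaskManager and the CPU seconds summed.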

Roman Grebennikov | g...@dfdx.me<ma...@dfdx.me>
