Hi Abhinav, according to [1], you need 8u261 for the OperatingSystemMXBean to work as expected.
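
A quick way to check what a given JVM build reports inside the container is a small probe like the one below. This is only a minimal sketch: the class name CpuProbe is made up for illustration, and it assumes a HotSpot/OpenJDK runtime where the platform bean can be cast to com.sun.management.OperatingSystemMXBean. Per the Javadoc, getProcessCpuLoad() returns a negative value when no reading is available, which is common on the first call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Flink: prints what the running JVM sees.
class CpuProbe {

    /** Process CPU load in [0.0, 1.0], or a negative value if unavailable. */
    static double processCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Assumption: HotSpot/OpenJDK, where the bean implements the com.sun extension.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
        }
        return -1.0;
    }

    public static void main(String[] args) {
        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
        System.out.println("processCpuLoad      = " + processCpuLoad());
    }
}
```

Running it once on the current image and once on an 8u261+ image, with the same K8s CPU limit, should show whether the reported load changes scale.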
[1] https://bugs.openjdk.java.net/browse/JDK-8242287

On Thu, Aug 13, 2020 at 1:10 AM Bajaj, Abhinav <abhinav.ba...@here.com> wrote:

> Thanks Xintong for your input.
>
> From the information I could find, I understand the JDK version 1.8.0_212
> we use includes the docker/container support.
>
> I also had a quick test inside the docker image using the below -
>
> Runtime.getRuntime().availableProcessors()
>
> It showed the right number of CPU cores associated with the container.
>
> But I am not familiar with the OperatingSystemMXBean used by Flink.
> So I don't know if it will pick up docker CPU limits set by K8s or not. I
> will continue to investigate that.
>
> In the meantime, the K8s metric container_cpu_usage_seconds_total does seem
> to provide the expected CPU usage for the containers.
>
> I was hoping that someone in the community may have already run into this
> behavior on K8s and can share their specific experience 🙂.
>
> Thanks much.
>
> ~ Abhinav Bajaj
>
> *From: *Xintong Song <tonysong...@gmail.com>
> *Date: *Wednesday, August 12, 2020 at 3:56 AM
> *To: *"Bajaj, Abhinav" <abhinav.ba...@here.com>
> *Cc: *"user@flink.apache.org" <user@flink.apache.org>, Roman Grebennikov <g...@dfdx.me>
> *Subject: *Re: Flink CPU load metrics in K8s
>
> Hi Abhinav,
>
> Do you know how many CPUs in total the physical machine has where the
> kubernetes container is running?
>
> I'm asking because I doubt that the JVM is aware that only 1 cpu is
> configured for the container. It does not work like the JVM understands how
> many cpus are configured and controls itself to not use more than that. On
> the contrary, the JVM tries to use as much cpu time as possible, and the
> limit comes from outside (OS, docker, cgroup, ...).
>
> Please understand that docker containers are not virtual machines. They do
> not "pretend" to only have certain hardware.
> I did a simple test on my laptop, launching a docker container with a cpu
> limit configured. Inside the container, I can still see all my machine's cpus.
>
> Thank you~
>
> Xintong Song
>
> On Wed, Aug 12, 2020 at 1:19 AM Bajaj, Abhinav <abhinav.ba...@here.com> wrote:
>
> Hi,
>
> Reaching out to folks running Flink on K8s.
>
> ~ Abhinav Bajaj
>
> *From: *"Bajaj, Abhinav" <abhinav.ba...@here.com>
> *Date: *Wednesday, August 5, 2020 at 1:46 PM
> *To: *Roman Grebennikov <g...@dfdx.me>, "user@flink.apache.org" <user@flink.apache.org>
> *Subject: *Re: Flink CPU load metrics in K8s
>
> Thanks Roman for providing the details.
>
> I also made more observations that have increased my confusion about this
> topic 🙂
>
> To ease the calculations, I deployed a test cluster this time providing 1
> CPU in K8s (with docker) for all the taskmanager containers.
>
> When I check the taskmanager CPU load, the value is in the order of
> "0.002158428663932657".
>
> Assuming that the underlying JVM recognizes 1 CPU allocated to the docker
> container, this value means % CPU usage in the ballpark of 0.21%.
>
> However, if I look at the K8s metrics (formula below) for this container,
> it turns out to be in the ballpark of 10-16%.
>
> There is no other process running in the container apart from the flink
> taskmanager.
>
> The order of magnitude of these two values of CPU % usage is different.
>
> *Am I comparing the right metrics here?*
>
> *How are folks running Flink on K8s monitoring the CPU load?*
>
> ~ Abhi
>
> *% CPU usage from K8s metrics*
>
> sum(rate(container_cpu_usage_seconds_total{pod=~"my-taskmanagers-*",
> container="taskmanager"}[5m])) by (pod)
>
> / sum(container_spec_cpu_quota{pod=~"my-taskmanager-pod-*",
> container="taskmanager"}
>
> / container_spec_cpu_period{pod=~"my-taskmanager-pod-*",
> container="taskmanager"}) by (pod)
>
> *From: *Roman Grebennikov <g...@dfdx.me>
> *Date: *Tuesday, August 4, 2020 at 12:42 AM
> *To: *"user@flink.apache.org" <user@flink.apache.org>
> *Subject: *Re: Flink CPU load metrics in K8s
>
> Hi,
>
> JVM.CPU.Load is just a wrapper (MetricUtils.instantiateCPUMetrics) on top
> of OperatingSystemMXBean.getProcessCpuLoad (see
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad())
>
> Usually it looks weird if you have multiple CPU cores. For example, if you
> have a job with a single slot 100% utilizing a single CPU core on an 8 core
> machine, the JVM.CPU.Load will be 1.0/8.0 = 0.125.
> It's also a point-in-time snapshot of current CPU usage, so if you're
> collecting your metrics every minute, and the job has a spiky workload within
> this minute (like it's idle almost always and once a minute it consumes
> 100% CPU for one second), you have a chance to completely miss this in
> the metrics.
>
> As for me personally, JVM.CPU.Time is a clearer indicator of CPU usage,
> which is an always-increasing amount of milliseconds the CPU spent executing
> your code. And it will also catch CPU usage spikes.
>
> Roman Grebennikov | g...@dfdx.me
>
> On Mon, Aug 3, 2020, at 23:34, Bajaj, Abhinav wrote:
>
> Hi,
>
> I am trying to understand the CPU Load metrics reported by Flink 1.7.1
> running with openjdk 1.8.0_212 on K8s.
>
> After deploying the Flink Job on K8s, I tried to get CPU Load metrics
> following this documentation
> <https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#rest-api-integration>.
>
> curl localhost:8081/taskmanagers/7737ac33b311ea0a696422680711597b/metrics?get=Status.JVM.CPU.Load,Status.JVM.CPU.Time
>
> [{"id":"Status.JVM.CPU.Load","value":"0.0023815194093831865"},{"id":"Status.JVM.CPU.Time","value":"23260000000"}]
>
> The value of the CPU load looks odd to me.
>
> What is the unit and scale of this value?
>
> How does Flink determine this value?
>
> Appreciate your time and help here.
>
> ~ Abhinav Bajaj

--

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng
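
P.S. To make the arithmetic discussed in this thread concrete, here is a rough Java sketch. It is not Flink code: the class and method names are hypothetical, and it assumes, as Roman describes, that Status.JVM.CPU.Load is reported relative to all cores the JVM sees on the host, so the host core count has to be supplied by hand. The second helper derives average CPU usage from two samples of a cumulative CPU-time counter. The Javadoc for getProcessCpuTime() gives the unit as nanoseconds, so the sketch treats the samples as nanoseconds, which is an assumption about the Flink metric.

```java
// Hypothetical helpers for interpreting the two metrics; not part of Flink.
class CpuLoadMath {

    /**
     * Rescales a load reported as a fraction of all host cores into a
     * fraction of the container's CPU limit.
     * Assumption: the JVM computed reportedLoad against hostCpus.
     */
    static double rescaleLoad(double reportedLoad, int hostCpus, double containerCpus) {
        return reportedLoad * hostCpus / containerCpus;
    }

    /**
     * Average number of cores used between two samples of a cumulative
     * CPU-time counter, with both CPU time and wall time in nanoseconds.
     */
    static double coresUsed(long cpuTimeStartNanos, long cpuTimeEndNanos, long wallNanos) {
        return (double) (cpuTimeEndNanos - cpuTimeStartNanos) / wallNanos;
    }

    public static void main(String[] args) {
        // Roman's example: one fully busy core on an 8 core host, container limit 1 CPU.
        System.out.println(rescaleLoad(0.125, 8, 1.0));
        // Half a second of CPU time consumed over one second of wall time.
        System.out.println(coresUsed(0L, 500_000_000L, 1_000_000_000L));
    }
}
```

Because the counter-based figure is an average over the whole sampling interval, it also accounts for short spikes that a point-in-time load reading can miss.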