Hi Penguin,

Building on Yangze's response, you can also take a look at the more
detailed system resource usage metrics [1], which become available after
adding an optional dependency to the class path/lib directory.

Regarding the single task/task slot metrics, as Yangze noted, there is
"almost" no isolation of resources between Tasks (task slots). Almost,
because there is one thing to note. Most of Flink's Tasks are single
threaded, and you can actually monitor how busy this single thread is using
the `idleTimeMsPerSecond` metric [2] (which was added in Flink 1.11). In
Flink 1.13 this metric will change a little, as it will be split into two:
`idleTimeMsPerSecond` and `backPressuredTimeMsPerSecond`. Additionally,
those two will be complemented by the `busyTimeMsPerSecond` metric
[3][4][5], and all of these metrics will be easily accessible in the
WebUI [6].
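
To make the relation between those three metrics a bit more concrete, here
is a rough sketch in Python. It assumes that idle, back-pressured and busy
time together account for the 1000 ms of each second of the Task's main
thread, so busy time can be estimated as whatever is left. That relation,
the function names and the 0.8 threshold are my assumptions for
illustration only; they are not taken from the Flink code base.

```python
def busy_time_ms_per_second(idle_ms: float, back_pressured_ms: float) -> float:
    """Estimate busy time as the share of a second that is neither idle
    nor back-pressured.

    Assumption: the three per-second metrics partition 1000 ms of the
    task's main thread. Inputs are clamped so the result stays in [0, 1000].
    """
    accounted = min(max(idle_ms, 0.0) + max(back_pressured_ms, 0.0), 1000.0)
    return 1000.0 - accounted


def classify(idle_ms: float, back_pressured_ms: float, threshold: float = 0.8) -> str:
    """Label a task by its dominant state.

    The threshold (fraction of a second) is a hypothetical value chosen
    for this example, not a Flink default.
    """
    busy_ms = busy_time_ms_per_second(idle_ms, back_pressured_ms)
    if busy_ms / 1000.0 >= threshold:
        return "busy"
    if back_pressured_ms / 1000.0 >= threshold:
        return "back-pressured"
    if idle_ms / 1000.0 >= threshold:
        return "idle"
    return "mixed"


if __name__ == "__main__":
    # Example readings: (idleTimeMsPerSecond, backPressuredTimeMsPerSecond)
    for idle, back_pressured in [(50, 100), (100, 850), (900, 0), (400, 300)]:
        print(idle, back_pressured, classify(idle, back_pressured))
```

Keep in mind that in Flink 1.11/1.12 there is only `idleTimeMsPerSecond`,
so the split above only makes sense once the separate back pressure metric
is available.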

I wrote "Most of Flink's Tasks are single threaded" as there are a
couple of caveats:
- network communication is done in a separate pool of threads
- old style sources (using the `SourceFunction` primitive, so basically all
sources apart from a couple of new ones introduced in Flink 1.12) spawn
another dedicated thread, which is not monitored/covered by those
busy/idle time metrics.
- if an operator or user code spawns its own threads somehow, those
are also completely ignored (this includes the built-in AsyncWaitOperator
[7])

Best,
Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#system-resources
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/metrics.html#io
[3] https://issues.apache.org/jira/browse/FLINK-14712
[4] https://issues.apache.org/jira/browse/FLINK-20717
[5] https://issues.apache.org/jira/browse/FLINK-20718
[6]
https://issues.apache.org/jira/browse/FLINK-14814?focusedCommentId=17256926
[7]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html

On Mon, Jan 18, 2021 at 03:33, Yangze Guo <karma...@gmail.com> wrote:

> Hi,
>
> First of all, there’s no resource isolation atm between
> operators/tasks within a slot, except for managed memory. So,
> monitoring of individual tasks might be meaningless.
>
> Regarding TM/JM level cpu/memory metrics, you can refer to [1] and
> [2]. Regarding the traffic between tasks, you can refer to [3].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#cpu
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#memory
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#default-shuffle-service
>
> Best,
> Yangze Guo
>
> On Sun, Jan 17, 2021 at 6:43 PM penguin. <bxwhfh...@126.com> wrote:
> >
> > Hello,
> >
> >
> > In the Flink cluster,
> >
> > How to monitor each taskslot of taskmanager? For example, the CPU and
> > memory usage of each slot and the traffic between slots.
> >
> > What is the way to get the traffic between nodes?
> >
> > thank you very much!
> >
> >
> > penguin
> >
> >
> >
> >
>
