A few more questions: have you had a chance to monitor/profile the
memory usage? Which part of memory was used excessively?
Additionally, could @dhanesh arole's proposal
solve your issue?
Matthias
On Fri, Apr 23, 2021 at 8:41 AM Matthias Pohl
wrote:
Thanks for sharing these details. Looking into FLINK-14952 [1] (which
introduced this option) and the related mailing list thread [2], your issue
seems quite similar to what is described there, even though that issue
appears to be mostly tied to bounded jobs. But I'm not sure what
Hi Matthias,
We have “solved” the problem by tuning the join, but I'll still try to answer
the questions, hoping this will help.
* What is the option you're referring to for the bounded shuffle? That might
help to understand what streaming mode solution you're looking for.
Hi,
The questions that @matth...@ververica.com asked are
very valid and might provide more leads. But if you haven't already, it's
worth trying jemalloc / tcmalloc. We had similar problems with
slow growth in TM memory resulting in pods getting OOMed by k8s. After
switching to jemalloc,
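One way to switch the TaskManager container over to jemalloc is to preload it
in the image. This is only a rough sketch, assuming a Debian-based Flink image;
the base image tag, package name, and library path are illustrative and may
differ in your setup:

    # Sketch of a Dockerfile fragment that preloads jemalloc for the Flink JVM.
    # The image tag, package name, and .so path are assumptions; adjust to your image.
    FROM flink:1.12.2
    RUN apt-get update && apt-get install -y libjemalloc2 && rm -rf /var/lib/apt/lists/*
    # Preload jemalloc so native (off-heap) allocations from RocksDB, Netty, glibc, etc.
    # go through jemalloc instead of the default glibc malloc.
    ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

Newer official Flink Docker images also ship with jemalloc, so depending on
your Flink version you may not need a custom image at all.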
Hi,
I have a few questions about your case:
* What is the option you're referring to for the bounded shuffle? That
might help to understand what streaming mode solution you're looking for.
* What does the job graph look like? Are you assuming that it's due to a
shuffling operation? Could you
Hi, community,
When running a Flink streaming job with a large state size, one TaskManager
process was killed by the YARN node manager. The following log is from the YARN
node manager:
2021-04-16 11:51:23,013 WARN