Hi Robert,
I see, so the join needs to consume all of its input before it can process it.
In my case, I couldn't wait that long, because the first join quickly generated a
lot of data that couldn't fit in memory or on disk. The solution was
then to manually specify a JoinHint and broadcast the small dataset, t
Hi Yassine,
you don't necessarily need to set the parallelism of the last two operators
to 31; the sink with parallelism 1 will still fit into the slots.
A task slot can, by default, hold an entire "slice" or parallel instance of
a job.
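To make the slot arithmetic concrete, here is a small stdlib-only Java sketch of a simplified slot-sharing model. It assumes every operator stays in Flink's "default" slot sharing group. In that case a slot can hold one parallel instance of each operator, so the job needs as many slots as its most parallel operator. Separate slot sharing groups do not share slots with each other. This is not Flink code. The `Operator` record, `requiredSlots`, and the operator names and upstream parallelism of 32 are hypothetical, and the real scheduler makes the final decision.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SlotEstimate {

    // Hypothetical description of one operator: name, parallelism, and the
    // slot sharing group it belongs to ("default" unless configured otherwise).
    record Operator(String name, int parallelism, String slotSharingGroup) {}

    // Simplified model: each slot sharing group needs as many slots as its most
    // parallel operator; different groups do not share slots, so their needs add up.
    static int requiredSlots(List<Operator> operators) {
        Map<String, Integer> maxPerGroup = new HashMap<>();
        for (Operator op : operators) {
            maxPerGroup.merge(op.slotSharingGroup(), op.parallelism(), Integer::max);
        }
        int total = 0;
        for (int groupMax : maxPerGroup.values()) {
            total += groupMax;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical plan: upstream operators at 32, last two lowered to 31, sink at 1.
        List<Operator> lowered = List.of(
                new Operator("source", 32, "default"),
                new Operator("join", 32, "default"),
                new Operator("map-1", 31, "default"),
                new Operator("map-2", 31, "default"),
                new Operator("sink", 1, "default"));

        // Same plan with the last two operators left at 32.
        List<Operator> unchanged = List.of(
                new Operator("source", 32, "default"),
                new Operator("join", 32, "default"),
                new Operator("map-1", 32, "default"),
                new Operator("map-2", 32, "default"),
                new Operator("sink", 1, "default"));

        // Only if the sink were moved to its own slot sharing group would it need an extra slot.
        List<Operator> separateSink = List.of(
                new Operator("source", 32, "default"),
                new Operator("join", 32, "default"),
                new Operator("map-1", 32, "default"),
                new Operator("map-2", 32, "default"),
                new Operator("sink", 1, "sink"));

        System.out.println("lowered:       " + requiredSlots(lowered));      // 32
        System.out.println("unchanged:     " + requiredSlots(unchanged));    // 32
        System.out.println("separate sink: " + requiredSlots(separateSink)); // 33
    }
}
```

Under this model, lowering the last two operators to 31 does not reduce the slot requirement below the upstream maximum. With everything in one group, the parallelism-1 sink shares a slot with the other operators instead of needing one of its own.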
The reason why the sink stays in state CREATED in the beginni
Hi all,
My batch job has the following plan in the end (figure attached):
I have a total of 32 task slots, and I have set the parallelism of the last
two operators before the sink to 31. The sink parallelism is 1. The last
operator before the sink is a MapOperator, so it doesn't need to buffer