Hi All,

I am using apache Beam with Flink (1.8.2). In my job, I am using Beam
sideinput (which translates into Flink NonKeyedBroadcastStream) to do
filter of the data from main stream.

I have experienced OOM: GC overhead limit exceeded continuously.

After did some experiments, I observed following behaviour:
1. run job without side input(broadcast stream): no OOM issue
2. run job with side input (kafka topic with 1 partition) with data
available from this side input: no OOM issue
3. run job with side input (kafka topic with 1 partition) without any data
from the side input: *OOM issue*
4. From the heap dump, the message (of type ObjectNode) cannot be GC'd
looks like due to the references hold by Broadcast stream
[image: image.png]

My question is: what is the behaviour from Broadcast stream if there is no
data available? Does it cache the data from main stream and wait until data
becoming available from Broadcast stream to process?

Thanks a lot!
Eleanore

回复