Hi All, I am using apache Beam with Flink (1.8.2). In my job, I am using Beam sideinput (which translates into Flink NonKeyedBroadcastStream) to do filter of the data from main stream.
I have experienced OOM: GC overhead limit exceeded continuously. After did some experiments, I observed following behaviour: 1. run job without side input(broadcast stream): no OOM issue 2. run job with side input (kafka topic with 1 partition) with data available from this side input: no OOM issue 3. run job with side input (kafka topic with 1 partition) without any data from the side input: *OOM issue* 4. From the heap dump, the message (of type ObjectNode) cannot be GC'd looks like due to the references hold by Broadcast stream [image: image.png] My question is: what is the behaviour from Broadcast stream if there is no data available? Does it cache the data from main stream and wait until data becoming available from Broadcast stream to process? Thanks a lot! Eleanore