So I spent some more time on this today, and noticed something interesting
when trying to reproduce it: it only seems to happen if the ParDo using the
side input is fused with another stage that uses state. I'm not quite sure
why this is, but I'm also fairly certain the fix is to simply clear the
On Thu, Jul 22, 2021 at 4:47 AM Steve Niemitz wrote:
> I don't think I'd call it a bug? The cache doesn't differentiate between
> a state cell that existed but was cleared, and one that is missing from the
> cache (maybe it should?).
>
Filing this in my collection of problems caused by "nullabl
I don't think I'd call it a bug? The cache doesn't differentiate between a
state cell that existed but was cleared, and one that is missing from the
cache (maybe it should?). The side input fetcher clears the blocked state
when it becomes unblocked:
https://github.com/apache/beam/blob/master/run
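
To make the "cleared vs. missing" distinction concrete, here is a minimal sketch of a cache that tracks the two cases separately. This is not the Dataflow worker's actual cache; the class and method names (StateCacheSketch, lookup, clear, evict) are made up for illustration, and keys and values are plain strings to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Beam's real state cache.
class StateCacheSketch {
    // The three outcomes a lookup can have once "cleared" is tracked explicitly.
    enum Lookup { MISS, CLEARED, PRESENT }

    // No map entry        -> the cell was never cached (or was evicted).
    // Optional.empty()    -> the cell is cached and known to be cleared.
    // Optional.of(value)  -> the cell is cached with a value.
    private final Map<String, Optional<String>> cells = new HashMap<>();

    void put(String key, String value) {
        cells.put(key, Optional.of(value));
    }

    // Record that the cell was cleared, rather than dropping the entry.
    void clear(String key) {
        cells.put(key, Optional.empty());
    }

    // Drop the entry entirely; the next lookup is a genuine miss.
    void evict(String key) {
        cells.remove(key);
    }

    Lookup lookup(String key) {
        Optional<String> cell = cells.get(key);
        if (cell == null) {
            return Lookup.MISS; // caller would have to go to the backing store
        }
        return cell.isPresent() ? Lookup.PRESENT : Lookup.CLEARED;
    }
}
```

With something like this, a reader that sees CLEARED can skip the backing-store read, while MISS still forces one. Whether that is the right trade-off for the real cache is a separate question.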
I opened a Jira issue years ago [1] about this, but would like to actually
fix it for real now, given that our users have started using streaming more
and more.
There's more detail in the Jira issue, but basically side inputs in streaming
pipelines on Dataflow lead to pretty bad performance because they r