Thanks for the clarification. I'll try to find some time to write a
reproducible test case and submit a ticket. While it may not be able
to delete the non-referenced ones, I'm surprised it's exponentially
replicating them, and so it's probably worth documenting in a ticket.
On Wed, Nov 27, 2019 at
You are right Aaron.
I would say this is by design: Flink doesn't require you to initialize
state in the open method, so it has no safe way to delete the
non-referenced ones.
What you can do is restore the state and clear it on all operators and not
reference it again. I know this feel
Hi,
Yes, we're using UNION state. I would assume, though, that if you are
not reading the UNION state it would either stick around as a
constant factor in your state size, or get cleared.
Looks like I should try to recreate a small example and submit a bug
if this is true. Otherwise it's imp
Hi
Do you use UNION state in your scenario? When using UNION state, the JM
may encounter OOM because each TDD will contain all the state of all
subtasks [1].
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#using-managed-operator-state
Best,
Congxian
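To make the growth pattern discussed in this thread concrete, here is a toy
model in plain Python. It is not Flink code and uses no Flink API. The
function name and parameters are hypothetical. It assumes that on restore
every subtask receives the full union list, and that no subtask clears it, so
each subtask writes everything it received back into the next checkpoint.

```python
def simulate_union_restores(parallelism, entries_per_subtask, restarts):
    """Toy model of union-redistributed list state (hypothetical, not Flink).

    Returns the total number of list entries held across all subtasks,
    first at startup and then after each restart.
    """
    # Each subtask starts with only its own entries.
    subtasks = [
        [f"subtask{i}-entry{j}" for j in range(entries_per_subtask)]
        for i in range(parallelism)
    ]
    sizes = [sum(len(entries) for entries in subtasks)]

    for _ in range(restarts):
        # Checkpoint: the stored state is the union of every subtask's list.
        checkpoint = [entry for entries in subtasks for entry in entries]
        # Restore: every subtask receives the complete union list.
        # Assumption: nobody clears it, so all of it is snapshotted again.
        subtasks = [list(checkpoint) for _ in range(parallelism)]
        sizes.append(sum(len(entries) for entries in subtasks))

    return sizes


print(simulate_union_restores(parallelism=4, entries_per_subtask=1, restarts=3))
# prints [4, 16, 64, 256]
```

Under these assumptions the total grows by a factor equal to the parallelism
on every restart, which would match the exponential replication described
above. With a parallelism of 1 the size stays constant.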
Aaron
Hi,
Some context: after a refactoring, we were unable to start our jobs.
They started fine and checkpointed fine, but once the job restarted
owing to a transient failure, the application was unable to start. The
Job Manager was OOM'ing (even when I gave it 256GB of RAM!). The
`_metadata` file fo