Re: What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-12-04 Thread Aaron Levin
Thanks for the clarification. I'll try to find some time to write a reproducible test case and submit a ticket. While it may not be able to delete the non-referenced ones, I'm surprised it's exponentially replicating them, and so it's probably worth documenting in a ticket. On Wed, Nov 27, 2019 at

Re: What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-11-27 Thread Gyula Fóra
You are right, Aaron. I would say this is by design: Flink doesn't require you to initialize state in the open method, so it has no safe way to delete the non-referenced ones. What you can do is restore the state and clear it on all operators and not reference it again. I know this feel
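To illustrate why the suggestion above says to clear the state on all operators, here is a toy model in plain Python. It is not Flink code: `restore_union` and `restore_then_clear` are hypothetical helpers, and the model assumes that every subtask receives the full merged list on restore and that a subtask which does not clear its copy snapshots it again unchanged.

```python
def restore_union(snapshots):
    """Toy union redistribution: every subtask gets the concatenation of all lists."""
    merged = [entry for part in snapshots for entry in part]
    return [list(merged) for _ in snapshots]


def restore_then_clear(snapshots, clearing_subtasks):
    """Restore, then empty the list on the subtasks whose index is in clearing_subtasks."""
    restored = restore_union(snapshots)
    return [[] if i in clearing_subtasks else part for i, part in enumerate(restored)]


before = [["a"], ["b"], ["c"]]                       # one entry per subtask, parallelism 3
partial = restore_then_clear(before, {0, 1})         # subtask 2 forgets to clear
complete = restore_then_clear(before, {0, 1, 2})     # every subtask clears
```

In this model a single subtask that skips the clear still carries the whole merged list into the next snapshot, which is why the clear has to happen everywhere.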

Re: What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-11-27 Thread Aaron Levin
Hi, Yes, we're using UNION state. I would assume, though, that if you are not reading the UNION state it would either stick around as a constant factor in your state size or get cleared. Looks like I should try to recreate a small example and submit a bug if this is true. Otherwise it's imp

Re: What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-11-27 Thread Congxian Qiu
Hi, Do you use UNION state in your scenario? When using UNION state, the JM may encounter OOM because each TDD will contain all the state of all subtasks [1]. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#using-managed-operator-state Best, Congxian Aaron
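As a rough illustration of how union redistribution can inflate state, the following toy simulation in plain Python models each restore handing every subtask the complete list from all subtasks. It is not Flink code: `restore_union`, `total_entries`, and `simulate` are hypothetical names, and the model assumes each subtask snapshots everything it received without deduplicating. Whether this is the mechanism behind the growth reported in this thread is not confirmed here.

```python
def restore_union(snapshots):
    """Toy union redistribution: every subtask gets the concatenation of all lists."""
    merged = [entry for part in snapshots for entry in part]
    return [list(merged) for _ in snapshots]


def total_entries(snapshots):
    """Total number of list entries held across all subtasks."""
    return sum(len(part) for part in snapshots)


def simulate(parallelism, restarts):
    """Return the total entry count before any restart and after each restart."""
    snapshots = [[f"entry-{i}"] for i in range(parallelism)]  # one entry per subtask
    sizes = [total_entries(snapshots)]
    for _ in range(restarts):
        # Assumption: each subtask re-snapshots the full list it was handed.
        snapshots = restore_union(snapshots)
        sizes.append(total_entries(snapshots))
    return sizes


sizes = simulate(parallelism=4, restarts=3)
```

Under these assumptions the total grows by a factor equal to the parallelism on every restart, so with parallelism 4 the counts go 4, 16, 64, 256.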

What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-11-26 Thread Aaron Levin
Hi, Some context: after a refactoring, we were unable to start our jobs. They started fine and checkpointed fine, but once the job restarted owing to a transient failure, the application was unable to start. The Job Manager was OOM'ing (even when I gave it 256GB of RAM!). The `_metadata` file fo