Hi,
Is it possible to use managed state, such as MapState, in an
implementation of the new unified Source interface [1]? I'm especially
interested in using managed state in a SplitEnumerator implementation.

I have a use case that is a variation of FileSource, where I will have a
great number of files to process, for example a million. I know
that FileSource keeps a collection of already-processed paths
in the ContinuousFileSplitEnumerator object.

In my case I cannot afford to keep a million Strings on the heap.
I'm hoping to use managed state for this instead, building splits in batches
and periodically adding new files to the alreadyProcessedPaths collection.
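A rough sketch of the batching idea, in plain Java and independent of any Flink API. BatchingPathAssigner, ProcessedPathStore and InMemoryStore are hypothetical names made up for illustration, not Flink classes. The in-memory store only stands in for whatever managed state would back it, and paths are marked as processed when they are handed out in a batch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not a Flink API: hands out file paths in fixed-size batches. */
public class BatchingPathAssigner {

    /** Hypothetical abstraction over wherever processed paths are kept (e.g. managed state). */
    public interface ProcessedPathStore {
        boolean contains(String path);
        void add(String path);
    }

    /** Demo-only store; it still keeps every path on the heap. */
    public static class InMemoryStore implements ProcessedPathStore {
        private final Set<String> paths = new HashSet<>();
        public boolean contains(String path) { return paths.contains(path); }
        public void add(String path) { paths.add(path); }
    }

    private final ProcessedPathStore store;
    private final int batchSize;

    public BatchingPathAssigner(ProcessedPathStore store, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.store = store;
        this.batchSize = batchSize;
    }

    /** Pulls up to batchSize unseen paths from the listing and records them as processed. */
    public List<String> nextBatch(Iterator<String> listing) {
        List<String> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && listing.hasNext()) {
            String path = listing.next();
            if (!store.contains(path)) { // skip paths already handed out earlier
                store.add(path);
                batch.add(path);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        BatchingPathAssigner assigner = new BatchingPathAssigner(new InMemoryStore(), 2);
        Iterator<String> listing = List.of("/a", "/b", "/a", "/c").iterator();
        System.out.println(assigner.nextBatch(listing)); // prints [/a, /b]
        System.out.println(assigner.nextBatch(listing)); // prints [/c]
    }
}
```

The open question is what should implement ProcessedPathStore so that the paths are not all held on the enumerator's heap.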

Regards,
Krzysztof Chmielewski


[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/
