I need to bootstrap state from Postgres (approximately 200 GB of data), and I
notice that the State Processor API requires the DataSet API in order to
bootstrap state for the DataStream API.

I wish there were a way to use the SQL API with a partitioned scan, but I
don't know whether that is even possible with the DataSet API.
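For what it's worth, the closest thing I have found so far is JDBCInputFormat's
parameter-provider mechanism, which splits a range query into disjoint id
ranges so the scan is parallelized across subtasks. A rough sketch of what I
mean (the driver, URL, table, column names, and id bounds below are all
placeholders for my actual setup):

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.io.jdbc.split.NumericBetweenParametersProvider;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class PartitionedScan {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // The BETWEEN placeholders are filled in per split, so each parallel
        // subtask reads a disjoint id range instead of one task pulling 200 GB.
        JDBCInputFormat format = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")   // placeholder
                .setQuery("SELECT id, payload FROM my_table WHERE id BETWEEN ? AND ?")
                .setRowTypeInfo(new RowTypeInfo(
                        BasicTypeInfo.LONG_TYPE_INFO,
                        BasicTypeInfo.STRING_TYPE_INFO))
                // one split per 100k ids, between the table's min and max id
                .setParametersProvider(
                        new NumericBetweenParametersProvider(100_000L, 0L, 10_000_000L))
                .finish();

        DataSet<Row> rows = env.createInput(format);
        // ... rows would then feed the state bootstrap
    }
}
```

Is this the recommended way to parallelize the read, or is there a way to get
the SQL API involved here?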

I have never used the DataSet API, and I am unsure how it manages memory or
distributes load when handling large state.

Would it run out of memory if I read data from a JDBCInputFormat into a
large DataSet and then use that to bootstrap state for my streaming job?
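Concretely, the plan I am considering looks roughly like the sketch below
(the operator uid, state descriptor, paths, and the tiny stand-in DataSet are
placeholders; in reality the DataSet would come from the JDBC scan):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;
import org.apache.flink.types.Row;

public class BootstrapJob {

    // Writes each row's payload into keyed ValueState, keyed by id.
    static class Writer extends KeyedStateBootstrapFunction<Long, Row> {
        private transient ValueState<String> payload;

        @Override
        public void open(Configuration conf) {
            payload = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("payload", String.class));
        }

        @Override
        public void processElement(Row row, Context ctx) throws Exception {
            payload.update((String) row.getField(1));
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the ~200 GB JDBC-backed DataSet.
        DataSet<Row> rows = env.fromElements(Row.of(1L, "hello"));

        BootstrapTransformation<Row> transformation = OperatorTransformation
                .bootstrapWith(rows)
                .keyBy(new KeySelector<Row, Long>() {
                    @Override
                    public Long getKey(Row row) {
                        return (Long) row.getField(0);
                    }
                })
                .transform(new Writer());

        Savepoint
                .create(new FsStateBackend("file:///tmp/ckpts"), 128) // max parallelism
                .withOperator("my-operator-uid", transformation)      // must match the stream job's uid()
                .write("file:///tmp/bootstrap-savepoint");

        env.execute("bootstrap");
    }
}
```

The streaming job would then be started with `--fromSavepoint` pointing at the
written savepoint. My worry is the size of the DataSet feeding
`bootstrapWith`: does this stay bounded in memory at 200 GB of input?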

Any advice on how I should proceed with this would be greatly appreciated.

Thank you.
