Hi

1. In my experience, Flink can support state at this scale; set an
appropriate parallelism for the stateful operator. With RocksDB you may
need to pay attention to disk performance.
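
For reference, here is a minimal flink-conf.yaml sketch for such a setup (the checkpoint path and parallelism value are placeholders; please verify the keys against your Flink version):

```yaml
# RocksDB keeps state on local disk instead of the JVM heap
state.backend: rocksdb
# Incremental checkpoints upload only changed SST files -- important at TB scale
state.backend.incremental: true
# Durable remote storage for checkpoints (placeholder path)
state.checkpoints.dir: hdfs:///flink/checkpoints
# Default operator parallelism (placeholder value)
parallelism.default: 10
```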
2. Inside Flink, state is partitioned by key group, and each
task/parallel instance handles multiple key groups. Flink does not need to
restart when you add a node to the cluster, but every time it restarts from
a savepoint/checkpoint (or fails over), Flink needs to redistribute the
checkpoint data; this download can be skipped on failover if task-local
recovery[1] is enabled.
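
Task-local recovery is enabled via configuration (key names as of Flink 1.10; the local directory is a placeholder):

```yaml
# Keep a local copy of task state so failover can restore without
# re-downloading everything from remote storage
state.backend.local-recovery: true
# Where local state copies are kept (placeholder path)
taskmanager.state.local.root-dirs: /data/flink/local-recovery
```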
3. For uploading/downloading state, you can refer to the multi-threaded
upload/download improvements[2][3] to speed them up.
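
With those improvements in place, the number of transfer threads is configurable (key name as of Flink 1.10; please verify it for your release):

```yaml
# Threads used to upload/download RocksDB checkpoint files (default: 1)
state.backend.rocksdb.checkpoint.transfer.thread.num: 4
```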

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/large_state_tuning.html#task-local-recovery
[2] https://issues.apache.org/jira/browse/FLINK-10461
[3] https://issues.apache.org/jira/browse/FLINK-11008

Best,
Congxian


Gowri Sundaram <gowripsunda...@gmail.com> 于2020年5月1日周五 下午6:29写道:

> Hello all,
> We have read in multiple
> <https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html>
> sources <https://flink.apache.org/usecases.html> that Flink has been used
> for use cases with terabytes of application state.
>
> We are considering using Flink for a similar use case with* keyed state
> in the range of 20 to 30 TB*. We had a few questions regarding the same.
>
>
>    - *Is Flink a good option for this kind of scale of data* ? We are
>    considering using RocksDB as the state backend.
>    - *What happens when we want to add a node to the cluster *?
>       - As per our understanding, if we have 10 nodes in our cluster,
>       with 20TB of state, this means that adding a node would require the 
> entire
>       20TB of data to be shipped again from the external checkpoint remote
>       storage to the taskmanager nodes.
>       - Assuming 1 GB/s network speed, and assuming all nodes can read
>       their respective 2TB state in parallel, this would mean a *minimum
>       downtime of half an hour*. And this is assuming the throughput of
>       the remote storage does not become the bottleneck.
>       - Is there any way to reduce this estimated downtime ?
>
>
> Thank you!
>
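As a quick sanity check of the downtime arithmetic in the quoted question, here is a small sketch (assuming roughly 1 GB/s of usable read throughput per node, which is what the half-hour figure implies):

```python
# Back-of-envelope restore time: total state is split evenly across nodes,
# and every node downloads its shard from remote storage in parallel.
def restore_time_seconds(total_state_tb, num_nodes, throughput_gb_per_s):
    per_node_gb = total_state_tb * 1000 / num_nodes  # TB -> GB per node
    return per_node_gb / throughput_gb_per_s

# 20 TB over 10 nodes at ~1 GB/s per node -> about 2000 s (~33 minutes)
t = restore_time_seconds(20, 10, 1.0)
print(round(t), "seconds, ~", round(t / 60), "minutes")
```

Note that at 1 Gb/s (i.e. ~0.125 GB/s) the same download would take several hours, so the effective throughput of both the network and the remote storage matters a great deal here.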
