Re: FsStateBackend vs RocksDBStateBackend

2020-02-28 Thread Robert Metzger
Sorry for the late reply. There's not much you can do at the moment, as Flink needs to sync on the checkpoint barriers. There's something in the works that should address the issue soon: https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints

Did you try out using the FsSta

Re: FsStateBackend vs RocksDBStateBackend

2020-02-23 Thread Yu Li
Yes, FsStateBackend would be the best fit for state access performance in this case. Just a reminder that FsStateBackend uploads the full dataset to DFS at every checkpoint, so please watch the network bandwidth usage and make sure it doesn't become a new bottleneck.

Best Regards,
Yu

On Fri, 2
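Yu's bandwidth warning can be made concrete with a rough calculation. The sketch below is a hypothetical back-of-the-envelope helper (`CheckpointBandwidthSketch` and `requiredGbitPerSecond` are made-up names, not a Flink API): it divides one full snapshot by the checkpoint interval to get the average upload rate. The 200-300 GB range is the total state size Ran reports in this thread; the checkpoint intervals are assumptions chosen for illustration, since the thread doesn't give one.

```java
// Hypothetical back-of-the-envelope estimate, NOT a Flink API.
// Assumes every checkpoint uploads the full state (no incremental checkpoints).
public class CheckpointBandwidthSketch {

    // Average upload rate (gigabits per second) needed to ship a full
    // snapshot of stateGigabytes once per checkpoint interval.
    static double requiredGbitPerSecond(double stateGigabytes, double intervalSeconds) {
        return stateGigabytes * 8.0 / intervalSeconds;
    }

    public static void main(String[] args) {
        double[] stateSizesGb = {200.0, 300.0};       // total state range reported in the thread
        double[] intervalsSec = {60.0, 300.0, 600.0}; // ASSUMED checkpoint intervals
        for (double size : stateSizesGb) {
            for (double interval : intervalsSec) {
                System.out.printf("%.0f GB every %.0f s -> %.2f Gbit/s%n",
                    size, interval, requiredGbitPerSecond(size, interval));
            }
        }
    }
}
```

This is a cluster-wide average and a lower bound: the upload happens in bursts while each checkpoint runs rather than evenly across the interval, so peak rates will be higher. By contrast, RocksDBStateBackend can be configured for incremental checkpoints, which upload only newly created files.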

Re: FsStateBackend vs RocksDBStateBackend

2020-02-21 Thread Robert Metzger
I would try the FsStateBackend in this scenario, as you have enough memory available.

On Thu, Jan 30, 2020 at 5:26 PM Ran Zhang wrote:
> Hi Gordon,
>
> Thanks for your reply! Regarding state size - we are at 200-300gb but we
> have 120 parallelism which will make each task handle ~2 - 3 gb state

Re: FsStateBackend vs RocksDBStateBackend

2020-01-30 Thread Ran Zhang
Hi Gordon,

Thanks for your reply! Regarding state size: we are at 200-300 GB, but with a parallelism of 120 each task handles ~2-3 GB of state. (When we submit the job, we set the TaskManager memory to 15g.) In this scenario, what would be the best fit for the state backend?

Thanks,
Ran

On Wed, Ja
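Ran's per-task figure can be checked by dividing the total state by the parallelism. This assumes the state is spread evenly across subtasks; key skew would leave some subtasks holding more.

```latex
\frac{200\ \text{GB}}{120\ \text{subtasks}} \approx 1.7\ \text{GB per subtask},
\qquad
\frac{300\ \text{GB}}{120\ \text{subtasks}} = 2.5\ \text{GB per subtask}
```

That gives roughly 1.7-2.5 GB per subtask, a little below the quoted ~2-3 GB. How much of the 15g TaskManager memory this uses depends on how many slots each TaskManager runs, which the thread doesn't say. Also, if 200-300 GB is the checkpointed (serialized) size, the on-heap footprint under FsStateBackend is likely larger, because Java objects such as TreeSet entries carry per-object overhead.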

Re: FsStateBackend vs RocksDBStateBackend

2020-01-29 Thread Tzu-Li (Gordon) Tai
Hi Ran,

On Thu, Jan 30, 2020 at 9:39 AM Ran Zhang wrote:
> Hi all,
>
> We have a Flink app that uses a KeyedProcessFunction, and in the function
> it requires a ValueState(of TreeSet) and the processElement method needs to
> access and update it. We tried to use RocksDB as our stateBackend but t

FsStateBackend vs RocksDBStateBackend

2020-01-29 Thread Ran Zhang
Hi all,

We have a Flink app that uses a KeyedProcessFunction; the function requires a ValueState (holding a TreeSet), and the processElement method needs to access and update it. We tried using RocksDB as our state backend, but the performance is not good, and intuitively we think it was because o
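One documented property of RocksDBStateBackend is relevant to a ValueState holding a TreeSet: RocksDB stores state as serialized bytes, so every `value()` call deserializes the entire set and every `update()` serializes it again, whereas FsStateBackend keeps state as live objects on the JVM heap. The sketch below is a hypothetical, stdlib-only model of that difference (`ValueStateCostSketch`, `SerializedValueState`, `simulateSerialized` and `simulateHeap` are made-up names). It uses Java serialization as a stand-in for Flink's serializers and does not touch the Flink API; it counts the bytes (de)serialized when each event adds one element to a single key's set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.TreeSet;

// Hypothetical model, NOT Flink code: compares a TreeSet kept as a live heap
// object with one stored as bytes that are decoded on every read and
// re-encoded on every update.
public class ValueStateCostSketch {

    // Serialized-style state: value() deserializes the whole set and
    // update() re-serializes it, so each access costs O(size of the set).
    static final class SerializedValueState {
        private byte[] bytes;
        long bytesProcessed = 0; // running total of bytes read + written

        @SuppressWarnings("unchecked")
        TreeSet<Long> value() {
            if (bytes == null) {
                return new TreeSet<>();
            }
            bytesProcessed += bytes.length;
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return (TreeSet<Long>) in.readObject();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }

        void update(TreeSet<Long> v) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(v);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            bytes = buf.toByteArray(); // stream is closed (flushed) at this point
            bytesProcessed += bytes.length;
        }
    }

    // Simulates processElement() for one key: read the set, add one element,
    // write it back. Returns the total bytes (de)serialized.
    static long simulateSerialized(int events) {
        SerializedValueState state = new SerializedValueState();
        for (long e = 0; e < events; e++) {
            TreeSet<Long> set = state.value();
            set.add(e);
            state.update(set);
        }
        return state.bytesProcessed;
    }

    // Heap-style equivalent: the live object is mutated in place, no copying.
    static int simulateHeap(int events) {
        TreeSet<Long> set = new TreeSet<>();
        for (long e = 0; e < events; e++) {
            set.add(e);
        }
        return set.size();
    }

    public static void main(String[] args) {
        for (int n : new int[] {500, 1_000, 2_000}) {
            System.out.println(n + " events: heap set size " + simulateHeap(n)
                + ", serialized bytes processed " + simulateSerialized(n));
        }
    }
}
```

Because every event copies the whole set, the serialized variant's work grows roughly quadratically with the number of elements per key (doubling the events roughly quadruples the bytes processed), while the heap variant only pays for each insert.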