Yes FsStateBackend would be the best fit for state access performance in this case. Just a reminder that FsStateBackend will upload the full dataset to DFS during checkpointing, so please watch the network bandwidth usage and make sure it won't become a new bottleneck.
Best Regards, Yu On Fri, 21 Feb 2020 at 20:56, Robert Metzger <rmetz...@apache.org> wrote: > I would try the FsStateBackend in this scenario, as you have enough memory > available. > > On Thu, Jan 30, 2020 at 5:26 PM Ran Zhang <ranzh...@pinterest.com> wrote: > >> Hi Gordon, >> >> Thanks for your reply! Regarding state size - we are at 200-300gb but we >> have 120 parallelism which will make each task handle ~2 - 3 gb state. >> (when we submit the job we are setting tm memory to 15g.) In this scenario >> what will be the best fit for statebackend? >> >> Thanks, >> Ran >> >> On Wed, Jan 29, 2020 at 6:37 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org> >> wrote: >> >>> Hi Ran, >>> >>> On Thu, Jan 30, 2020 at 9:39 AM Ran Zhang <ranzh...@pinterest.com> >>> wrote: >>> >>>> Hi all, >>>> >>>> We have a Flink app that uses a KeyedProcessFunction, and in the >>>> function it requires a ValueState(of TreeSet) and the processElement method >>>> needs to access and update it. We tried to use RocksDB as our stateBackend >>>> but the performance is not good, and intuitively we think it was because of >>>> the serialization / deserialization on each processElement call. >>>> >>> >>> As you have already pointed out, serialization behaviour is a major >>> difference between the 2 state backends, and will directly impact >>> performance due to the extra runtime overhead in RocksDB. >>> If you plan to continue using the RocksDB state backend, make sure to >>> use MapState instead of ValueState where possible, since every access to >>> the ValueState in the RocksDB backend requires serializing / deserializing >>> the whole value. >>> For MapState, de-/serialization happens per K-V access. Whether or not >>> this makes sense would of course depend on your state access pattern. >>> >>> >>>> Then we tried to switch to use FsStateBackend (which keeps the >>>> in-flight data in the TaskManager’s memory according to doc), and it could >>>> resolve the performance issue. *So we want to understand better what >>>> are the tradeoffs in choosing between these 2 stateBackend.* Our >>>> checkpoint size is 200 - 300 GB in stable state. For now we know one >>>> benefits of RocksDB is it supports incremental checkpoint, but would love >>>> to know what else we are losing in choosing FsStateBackend. >>>> >>> >>> As of now, feature-wise both backends support asynchronous snapshotting, >>> state schema evolution, and access via the State Processor API. >>> In the end, the major factor for deciding between the two state backends >>> would be your expected state size. >>> That being said, it could be possible in the future that savepoint >>> formats for the backends are changed to be compatible, meaning that you >>> will be able to switch between different backends upon restore [1]. >>> >>> >>>> >>>> Thanks a lot! >>>> Ran Zhang >>>> >>> >>> Cheers, >>> Gordon >>> >>> [1] >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Binary+format+for+Keyed+State >>> >>