OK, Thanks Aljoscha for the info! Guys, great work on Flink, I really love it :)
Cheers, Krzysztof czw., 7.04.2016 o 10:48 użytkownik Aljoscha Krettek <aljos...@apache.org> napisał: > Hi, > you are right. Currently there is no incremental checkpointing and > therefore, at each checkpoint, we essentially copy the whole RocksDB > database to HDFS (or whatever filesystem you chose as a backup location). > As far as I know, Stephan will start working on adding support for > incremental snapshots this week or next week. > > Cheers, > Aljoscha > > On Thu, 7 Apr 2016 at 09:55 Krzysztof Zarzycki <k.zarzy...@gmail.com> > wrote: > >> Hi, >> I saw the documentation and source code of the state management with >> RocksDB and before I use it, I'm concerned of one thing: Am I right that >> currently when state is being checkpointed, the whole RocksDB state is >> snapshotted? There is no incremental, diff snapshotting, is it? If so, this >> seems to be unfeasible for keeping state counted in tens or hundreds of GBs >> (and you reach that size of a state, when you want to keep an embedded >> state of the streaming application instead of going out to Cassandra/Hbase >> or other DB). It will just cost too much to do snapshots of such large >> state. >> >> Samza as a good example to compare, writes every state change to Kafka >> topic, considering it a snapshot in the shape of changelog. Of course in >> the moment of app restart, recovering the state from the changelog would be >> too costly, that is why the changelog topic is compacted. Plus, I think >> Samza does a state snapshot from time to time anyway (but I'm not sure of >> that). >> >> Thanks for answering my doubts, >> Krzysztof >> >>