Re: RocksDB state checkpointing is expensive?

Krzysztof Zarzycki Thu, 07 Apr 2016 22:14:09 -0700

OK, Thanks Aljoscha for the info!
Guys, great work on Flink, I really love it :)


Cheers,
Krzysztof

czw., 7.04.2016 o 10:48 użytkownik Aljoscha Krettek <aljos...@apache.org>
napisał:

> Hi,
> you are right. Currently there is no incremental checkpointing and
> therefore, at each checkpoint, we essentially copy the whole RocksDB
> database to HDFS (or whatever filesystem you chose as a backup location).
> As far as I know, Stephan will start working on adding support for
> incremental snapshots this week or next week.
>
> Cheers,
> Aljoscha
>
> On Thu, 7 Apr 2016 at 09:55 Krzysztof Zarzycki <k.zarzy...@gmail.com>
> wrote:
>
>> Hi,
>> I saw the documentation and source code of the state management with
>> RocksDB and before I use it, I'm concerned of one thing: Am I right that
>> currently when state is being checkpointed, the whole RocksDB state is
>> snapshotted? There is no incremental, diff snapshotting, is it? If so, this
>> seems to be unfeasible for keeping state counted in tens or hundreds of GBs
>> (and you reach that size of a state, when you want to keep an embedded
>> state of the streaming application instead of going out to Cassandra/Hbase
>> or other DB). It will just cost too much to do snapshots of such large
>> state.
>>
>> Samza as a good example to compare, writes every state change to Kafka
>> topic, considering it a snapshot in the shape of changelog. Of course in
>> the moment of app restart, recovering the state from the changelog would be
>> too costly, that is why the changelog topic is compacted. Plus, I think
>> Samza does a state snapshot from time to time anyway (but I'm not sure of
>> that).
>>
>> Thanks for answering my doubts,
>> Krzysztof
>>
>>

Re: RocksDB state checkpointing is expensive?

Reply via email to