GitHub user dguy opened a pull request:

    https://github.com/apache/kafka/pull/2471

    KAFKA-4317: Checkpoint State Stores on commit/flush

    Currently the checkpoint file is deleted at state store initialization and 
it is only ever written again during a clean shutdown. This can result in 
significant delays during restarts as the entire store needs to be loaded from 
the changelog. 
    We can mitigate this by checkpointing the offsets frequently. The 
checkpointing happens only during the commit phase, i.e., after we have manually 
flushed the store and the producer. So we guarantee that the checkpointed 
offsets are never greater than what has been flushed. 
    In the event of hard failure we can recover by reading the checkpoints and 
consuming from the stored offsets.
    The checkpoint interval can be controlled by the config 
`statestore.checkpoint.interval.ms`. If this is set to a value <= 0, it 
effectively turns checkpoints off. The interval is only a guide, in that the 
minimum checkpoint time is always going to be the commit interval (as we need 
to do this to guarantee consistency).
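
    As a rough illustration of the interval check described above, the 
sketch below gates checkpoint writes at commit time. It is not the code in this 
pull request: the class `CheckpointGate`, its methods, and the string keys 
used for changelog partitions are hypothetical names chosen for the example, 
and the in-memory map stands in for the checkpoint file.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of interval-gated checkpointing. Not the classes
 * changed by the pull request; all names here are assumptions.
 */
class CheckpointGate {
    // Assumed to hold the value of statestore.checkpoint.interval.ms.
    private final long checkpointIntervalMs;
    private long lastCheckpointMs;
    // Stand-in for the checkpoint file: changelog partition -> offset.
    private final Map<String, Long> checkpointed = new HashMap<>();

    CheckpointGate(long checkpointIntervalMs, long nowMs) {
        this.checkpointIntervalMs = checkpointIntervalMs;
        this.lastCheckpointMs = nowMs;
    }

    /**
     * Intended to be called at the end of a commit, after the stores and
     * the producer have been flushed, so the offsets passed in are never
     * ahead of what has been flushed. Returns true if a checkpoint was
     * recorded.
     */
    boolean maybeCheckpoint(long nowMs, Map<String, Long> flushedOffsets) {
        if (checkpointIntervalMs <= 0) {
            return false; // a value <= 0 turns checkpointing off
        }
        if (nowMs - lastCheckpointMs < checkpointIntervalMs) {
            return false; // interval not yet elapsed; try again next commit
        }
        checkpointed.clear();
        checkpointed.putAll(flushedOffsets);
        lastCheckpointMs = nowMs;
        return true;
    }

    /** Offsets a restart would resume from after a hard failure. */
    Map<String, Long> checkpointedOffsets() {
        return new HashMap<>(checkpointed);
    }
}
```

    Because the check only runs when a commit happens, a checkpoint interval 
shorter than the commit interval has no extra effect in this sketch, which 
matches the "only a guide" behaviour described above.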

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dguy/kafka kafka-4317

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2471.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2471
    
----
commit 6743dc63293e2d0fca57dcb7d1a0ace5237837b0
Author: Damian Guy <damian....@gmail.com>
Date:   2017-01-31T13:37:00Z

    checkpoint statestores

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
