[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779149#action_12779149
 ] 

Patrick Hunt commented on ZOOKEEPER-582:
----------------------------------------

As Ben mentioned, we will never see this situation during normal operation of ZK.

We did see this case after a user ran the migration tool that we provide to
upgrade from version 2 to version 3 of ZooKeeper. The tool migrates the data
by writing a single snapshot file that preserves the zxid; it does not write
a log file. That produces the scenario Ben mentioned (a snapshot with no
associated log file), which can trigger this bug. If you have run the
migration tool, documented here:
http://hadoop.apache.org/zookeeper/docs/r3.0.0/releasenotes.html#migration_data
you can verify whether you are in this situation by looking at your
ZooKeeper data directory.

Here's an example:

-rw-r--r--  1 root search 67108880 Nov 17 19:31 log.300022b61
-rw-r--r--  1 root search 67108880 Nov 17 19:38 log.3000292d0
-rw-r--r--  1 root search  3646608 Nov  5 12:13 snapshot.1db5df6e2d6
-rw-r--r--  1 root search  3616579 Nov 17 19:31 snapshot.3000292c9
-rw-r--r--  1 root search  3616708 Nov 17 19:38 snapshot.300038d32

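where the file names are of the form <file>.<epoch><xid>,
with epoch and xid both being 4-byte values represented as hex. Leading
zeros are dropped, so the last 8 hex digits are the xid and whatever
precedes them is the epoch.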
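
As a rough illustration of that split, here is a minimal Python sketch. It
assumes the suffix is the 64-bit zxid printed as hex, so the epoch is the
high 4 bytes and the xid the low 4 bytes. The function name split_zxid is
hypothetical and not part of ZooKeeper.

```python
def split_zxid(filename):
    """Split a name like 'snapshot.1db5df6e2d6' into (kind, epoch, xid).

    Assumption: the part after the last '.' is the zxid in hex.
    """
    kind, _, suffix = filename.rpartition(".")
    zxid = int(suffix, 16)
    epoch = zxid >> 32           # high 4 bytes
    xid = zxid & 0xFFFFFFFF      # low 4 bytes
    return kind, epoch, xid


for name in ("snapshot.1db5df6e2d6", "log.300022b61"):
    kind, epoch, xid = split_zxid(name)
    print(f"{name}: kind={kind} epoch={epoch:#x} xid={xid:#x}")
# prints:
# snapshot.1db5df6e2d6: kind=snapshot epoch=0x1db xid=0x5df6e2d6
# log.300022b61: kind=log epoch=0x3 xid=0x22b61
```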

Notice that snapshot.1db5df6e2d6 has an epoch of 0x1db, while the other
files have an epoch of 0x3. This is the scenario described in the
description of this JIRA: there is no log file associated with epoch 0x1db.

If you see this in your datadir (a snapshot with an epoch for which there
are no log files with the same epoch), then this bug applies. If you see
snapshots of a particular epoch and log files with the same epoch, then
this bug does NOT apply.
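
The check above could be scripted along these lines. This is only a sketch
under the same naming assumption (the suffix is the zxid in hex, epoch in
the high 4 bytes); the function name orphan_snapshot_epochs is
hypothetical, and the listing is the one from the example.

```python
import re

# Matches only names of the form log.<hex> or snapshot.<hex>.
NAME = re.compile(r"(log|snapshot)\.([0-9a-fA-F]+)")


def orphan_snapshot_epochs(filenames):
    """Return the snapshot epochs that have no log file with the same epoch."""
    epochs = {"log": set(), "snapshot": set()}
    for name in filenames:
        match = NAME.fullmatch(name)
        if match:  # ignore anything that is not a log or snapshot file
            epochs[match.group(1)].add(int(match.group(2), 16) >> 32)
    return sorted(epochs["snapshot"] - epochs["log"])


listing = [
    "log.300022b61",
    "log.3000292d0",
    "snapshot.1db5df6e2d6",
    "snapshot.3000292c9",
    "snapshot.300038d32",
]
print([hex(e) for e in orphan_snapshot_epochs(listing)])  # prints ['0x1db']
```

To run it against a real directory, pass os.listdir(path) for the
directory that holds these files. A non-empty result means the bug
applies; an empty result means it does not.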


> ZooKeeper can revert to old data when a snapshot is created outside of normal 
> processing
> ----------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-582
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-582
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.1.1, 3.2.1
>            Reporter: Benjamin Reed
>            Priority: Blocker
>             Fix For: 3.2.2, 3.1.2
>
>
> when zookeeper starts up it will restore the most recent state (latest zxid) 
> it finds in the data directory. unfortunately, in the quorum version of 
> zookeeper updates are logged using an epoch based on the latest log file in a 
> directory. if there is a snapshot with a higher epoch than the log files, the 
> zookeeper server will start logging using an epoch one higher than the 
> highest log file.
> so if a data directory has a snapshot with an epoch of 27 and there are no 
> log files, zookeeper will start logging changes using epoch 1. if the cluster 
> restarts the state will be restored from the snapshot with the epoch of 27, 
> which in effect, restores old data.
> normal operation of zookeeper will never result in this situation.
> this does not affect standalone zookeeper.
> a fix should make sure to use an epoch one higher than the current state, 
> whether it comes from the snapshot or log, and should include a sanity check 
> to make sure that a follower never connects to a leader that has a lower 
> epoch than its own.
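
The fix proposed in the description can be sketched roughly as follows.
This is not the actual patch; next_epoch and may_follow are hypothetical
names, and zxids are assumed to be 64-bit values with the epoch in the
high 4 bytes.

```python
def next_epoch(snapshot_zxids, log_zxids):
    """Pick an epoch one higher than the newest state found on disk,
    whether that state comes from a snapshot or from a log."""
    latest = max(list(snapshot_zxids) + list(log_zxids), default=0)
    return (latest >> 32) + 1


def may_follow(follower_epoch, leader_epoch):
    """Sanity check: a follower must not connect to a leader whose
    epoch is lower than its own."""
    return leader_epoch >= follower_epoch


# The case from the description: a snapshot at epoch 27 and no log files.
print(next_epoch([27 << 32], []))  # prints 28, not 1
print(may_follow(follower_epoch=27, leader_epoch=1))  # prints False
```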

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
