[jira] Commented: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Vishal K (JIRA) Mon, 01 Nov 2010 13:12:48 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927127#action_12927127
 ]


Vishal K commented on ZOOKEEPER-872:
------------------------------------

Hi Pat,

The current code prints to stdout. We have an RMI service that has a ZK server 
embedded in it. We do this so that we can run/start/stop ZK across platforms 
without having to write platform-specific scripts. In this server, we start a 
thread that periodically calls PurgeTxnLog.purge(). As you pointed out, we 
should have a -q flag that directs output to the log instead of stdout, to 
satisfy both approaches. I will make that change.
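
For reference, the periodic purge thread described above could look roughly 
like the sketch below. This is only an illustration, not our actual service 
code: the class name PeriodicPurger and its methods are hypothetical, and the 
purge work is passed in as a plain Runnable so the sketch does not depend on 
ZooKeeper classes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a purge task periodically on a daemon thread.
class PeriodicPurger {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "purge-txnlog");
            t.setDaemon(true); // do not keep the embedding server alive
            return t;
        });

    // Runs purgeTask immediately, then again 'period' after each run ends.
    // Runtime failures are caught so that one bad run does not cancel
    // the remaining scheduled runs.
    void start(Runnable purgeTask, long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                purgeTask.run();
            } catch (RuntimeException e) {
                System.err.println("purge failed: " + e);
            }
        }, 0, period, unit);
    }

    // Stops scheduling further runs and interrupts a run in progress.
    void stop() {
        scheduler.shutdownNow();
    }
}
```

In an embedded server the Runnable would presumably wrap the call to 
PurgeTxnLog.purge() and convert its IOException into a logged error.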

We chose number 2 here because we think having only one backup will be enough. 
It is not clear to us under what conditions the additional backup will be 
useful.
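
To make the retention count concrete, here is a minimal sketch of choosing 
which snapshots to delete when keeping the N most recent ones. It assumes 
snapshot files are named snapshot.<zxid in hex>; the class and method names 
are hypothetical, and this is not the PurgeTxnLog implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides which snapshot files fall outside the
// retention window. It does not touch the file system.
class SnapshotRetention {
    // Parses the hex zxid suffix, e.g. "snapshot.1a" -> 26.
    static long zxidOf(String fileName) {
        return Long.parseLong(fileName.substring(fileName.indexOf('.') + 1), 16);
    }

    // Returns the snapshots to delete, keeping the 'keep' most recent ones.
    static List<String> toDelete(List<String> snapshots, int keep) {
        List<String> sorted = new ArrayList<>(snapshots);
        // Newest (highest zxid) first.
        sorted.sort((a, b) -> Long.compare(zxidOf(b), zxidOf(a)));
        if (keep >= sorted.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sorted.subList(keep, sorted.size()));
    }
}
```

With keep = 2, the current snapshot plus one backup are retained and 
everything older is returned for deletion.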

Backups are useful in the following scenarios (correct me if I am wrong):
1. The current ZooKeeper transaction log and/or snapshot is corrupted, but the 
past snapshots and transaction logs are ok. Corruption can mean either disk 
file corruption or corruption of transaction entries in the log. We store 
ZooKeeper data on mirrored disks.
2. The application itself made some errors that require reverting to an 
older version.

For the first point, having one additional backup would suffice. The second 
point is really tricky. I am not sure how the application can decide which 
snapshot to revert to. I think in most cases it will be trial and error. It is 
not clear to me how to estimate the number of backups needed. Also, it is not 
clear how one would go about going back in time. I looked at the LogFormatter 
utility, and it does not help much in undoing the erroneous transactions for 
case 2 above. In general, I think it is good to require users to keep a 
minimum of one backup.

Related question: Is there a hash on the log files (or internal tree 
structures) that can tell the ZooKeeper server if the logs are corrupted? If 
yes, the ZooKeeper server can verify the hash during startup and take some 
action based on that. For example, make sure that it never becomes a leader 
until it gets the correct snapshot from the existing leader (otherwise it may 
end up corrupting other servers' logs). "Corrupting" here refers to the case 
where the file is readable, but one or more transactions in the log are bad.
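
To illustrate the kind of check I have in mind, here is a minimal sketch of 
per-record checksum verification using java.util.zip.Adler32. The class and 
method names are hypothetical, and this does not reproduce ZooKeeper's actual 
on-disk log format; it only shows how a stored checksum could flag a readable 
but damaged transaction record.

```java
import java.util.zip.Adler32;

// Hypothetical helper: detects a damaged record by comparing a freshly
// computed checksum with the one stored when the record was written.
class RecordChecksum {
    // Computes the Adler-32 checksum of the serialized record bytes.
    static long checksum(byte[] record) {
        Adler32 adler = new Adler32();
        adler.update(record, 0, record.length);
        return adler.getValue();
    }

    // Returns true if the record still matches its stored checksum.
    static boolean isIntact(byte[] record, long storedChecksum) {
        return checksum(record) == storedChecksum;
    }
}
```

A server could run such a check over each record at startup and refuse to 
take part in leader election if any record fails it.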

I am not sure if there is a test for this. If I remember correctly, there is a 
bug that causes the purge() function to leave behind one additional log file. 
Please refer to my question above about findNRecentSnapshots(). I can add a 
test or modify the purge utility once we have concluded this discussion.

> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3). Also, 
> it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
