[
https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927127#action_12927127
]
Vishal K commented on ZOOKEEPER-872:
------------------------------------
Hi Pat,
The current code prints to stdout. We have an RMI service that has a ZK server
embedded in it. We do this so that we can run/start/stop ZK across platforms
without having to write platform-specific scripts. In this server, we start a
thread that periodically calls PurgeTxnLog.purge(). As you pointed out, we
should have a -q flag that sends output to the log instead of stdout, to
satisfy both approaches. I will make that change.
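A minimal sketch of the periodic purge thread described above, assuming a
scheduled executor is acceptable inside the embedding service. PeriodicPurger
and PurgeAction are hypothetical names, not ZooKeeper classes; the purge call
is passed in as a lambda so the sketch compiles without ZooKeeper on the
classpath.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper, not part of ZooKeeper. */
final class PeriodicPurger implements AutoCloseable {

    /** Stand-in for the real purge call (assumption: it may throw). */
    interface PurgeAction {
        void run() throws Exception;
    }

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "txnlog-purger");
                t.setDaemon(true); // do not keep the embedding service alive
                return t;
            });
    private final AtomicInteger failures = new AtomicInteger();

    PeriodicPurger(PurgeAction action, long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                action.run();
            } catch (Exception e) {
                // An exception escaping the task would suppress all later
                // runs, so count it here and let the schedule continue.
                failures.incrementAndGet();
            }
        }, 0, period, unit);
    }

    /** Number of purge attempts that threw an exception so far. */
    int failureCount() {
        return failures.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With ZooKeeper on the classpath, the action would be something like
() -> PurgeTxnLog.purge(dataDir, snapDir, count), assuming the purge(File,
File, int) signature with the retention count as the last argument.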
We chose number 2 here because we think having only one backup will be enough.
It is not clear to us under what conditions the additional backup will be
useful.
Backups are useful in the following scenarios (correct me if I am wrong):
1. The current ZooKeeper transaction log and/or snapshot is corrupted, but the
past snapshots and transaction logs are ok. Corruption can mean either disk
file corruption or corruption of transaction entries in the log. We store
ZooKeeper data on mirrored disks.
2. The application itself made some errors that require reverting to an
older version.
For the first point, having one additional backup would suffice. The second
point is really tricky. I am not sure how the application can decide which
snapshot to revert to. I think in most cases it will be trial and error. It is
not clear to me how to estimate the number of backups needed. Also, it is not
clear how one would go about going back in time. I looked at the LogFormatter
utility, and it does not help much in undoing the erroneous transactions for
case 2 above. In general, I think it is good to require users to keep a
minimum of one backup.
Related question: Is there a hash on the log files (or internal tree
structures) that can tell the ZooKeeper server if the logs are corrupted? If
yes, the ZooKeeper server can verify the hash during startup and take some
action based on that. For example, make sure that it never becomes a leader
until it gets the correct snapshot from the existing leader (otherwise it may
end up corrupting other servers' logs). "Corrupted" here refers to the case
where the file is readable, but one or more transactions in the log are bad.
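To make the startup check concrete, here is a sketch of per-record checksum
verification. The record layout (length, checksum, payload) and the class
name ChecksummedLog are invented for illustration and are not ZooKeeper's
on-disk format; Adler32 is just a convenient JDK checksum.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

/** Hypothetical record format: [length:int][checksum:long][payload]. */
final class ChecksummedLog {

    private static long checksum(byte[] payload) {
        Adler32 sum = new Adler32();
        sum.update(payload, 0, payload.length);
        return sum.getValue();
    }

    /** Appends one record, storing the payload checksum in its header. */
    static void append(ByteBuffer out, byte[] payload) {
        out.putInt(payload.length).putLong(checksum(payload)).put(payload);
    }

    /**
     * Scans records from the buffer's current position.
     * Returns the index of the first bad record, or -1 if all verify.
     */
    static int firstCorruptRecord(ByteBuffer in) {
        int index = 0;
        while (in.remaining() >= Integer.BYTES + Long.BYTES) {
            int length = in.getInt();
            long expected = in.getLong();
            if (length < 0 || length > in.remaining()) {
                return index; // truncated record or garbage length field
            }
            byte[] payload = new byte[length];
            in.get(payload);
            if (checksum(payload) != expected) {
                return index; // readable, but the contents are bad
            }
            index++;
        }
        // Leftover bytes that cannot form a header count as a bad record.
        return in.hasRemaining() ? index : -1;
    }
}
```

A server could run such a scan at startup and, if it returns anything other
than -1, refuse to stand for leader election until it has synced from the
current leader.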
I am not sure if there is a test for this. If I remember correctly, there is a
bug that causes the purge() function to leave behind one additional log file.
Please refer to my question above about findNRecentSnapshots(). I can add a
test or modify the purge utility once we have concluded this discussion.
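A test of the kind mentioned above could start from the retention rule alone.
The sketch below is not the PurgeTxnLog code; SnapshotRetention, zxidOf and
toDelete are hypothetical names, and it assumes snapshot files are named
snapshot.<zxid in hex>. With keep = 2, the newest snapshot plus one backup
survive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper that picks which snapshots a purge should remove. */
final class SnapshotRetention {

    /** Parses the hex zxid suffix of a name such as "snapshot.1a". */
    static long zxidOf(String name) {
        return Long.parseLong(name.substring(name.lastIndexOf('.') + 1), 16);
    }

    /** Returns the names to delete so that only the `keep` newest remain. */
    static List<String> toDelete(List<String> snapshots, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        List<String> oldestFirst = new ArrayList<>(snapshots);
        oldestFirst.sort(Comparator.comparingLong((String n) -> zxidOf(n)));
        int surplus = Math.max(0, oldestFirst.size() - keep);
        // Everything before the last `keep` entries is older and can go.
        return new ArrayList<>(oldestFirst.subList(0, surplus));
    }
}
```

Comparing by parsed zxid rather than by file name matters because the names
are hex without padding, so a plain string sort would put snapshot.10 before
snapshot.2.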
> Small fixes to PurgeTxnLog
> ---------------------------
>
> Key: ZOOKEEPER-872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.3.1
> Reporter: Vishal K
> Assignee: Vishal K
> Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3). Also,
> it prints to stdout instead of using Logger.