I might be wrong here, but let me chip in my two cents.
I think the problem is in LearnerHandler.java, on the leader of this
follower:
    /* see what other packets from the proposal
     * and tobeapplied queues need to be sent
     * and then decide if we can just send a DIFF
     * or we actually need to send the whole snapshot
     */
    long leaderLastZxid = leader.startForwarding(this, updates);
    // ---> the leaderLastZxid returned here is probably incorrect.
    // a special case when both the ids are the same
    if (peerLastZxid == leaderLastZxid) {
        packetToSend = Leader.DIFF;
        zxidToSend = leaderLastZxid;
    }
    QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
            leaderLastZxid, null, null);
    oa.writeRecord(newLeaderQP, "packet");
    bufferedOutput.flush();
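
For anyone reading the zxids in the quoted log: as far as I understand it, a
zxid is a 64-bit value with the epoch in the high 32 bits and the counter in
the low 32 bits, so 0x300000001 is <3,1> and 0x1 is <0,1>. Here is a minimal
sketch of that split. The class and method names are my own, for illustration
only, and are not ZooKeeper's API.

```java
// Hypothetical helper class for illustration; not part of ZooKeeper.
class ZxidSketch {
    // Epoch: the high 32 bits of the zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Counter: the low 32 bits of the zxid.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Render a zxid in the <epoch,counter> notation used in this thread.
    static String describe(long zxid) {
        return "<" + epochOf(zxid) + "," + counterOf(zxid) + ">";
    }

    public static void main(String[] args) {
        System.out.println(describe(0x300000001L)); // zxid the follower got
        System.out.println(describe(0x1L));         // zxid the follower expected
    }
}
```

If the leaderLastZxid sent in NEWLEADER carries the wrong epoch, a mismatch
like <3,1> versus <0,1> is the kind of thing I would expect the follower to
complain about, but that is a guess on my part.
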
On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) <
[email protected]> wrote:
>
> [
> https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880320#action_12880320]
>
> Flavio Paiva Junqueira commented on ZOOKEEPER-335:
> --------------------------------------------------
>
> Guys, I don't see enough information in these logs to determine what's
> going on. Let me tell you what I'm seeing so that perhaps other folks can
> help me out here.
>
> One part of the log that is suspicious is this one:
>
> {noformat}
> =6693 [QuorumPeer:/0.0.0.0:2181] WARN
> org.apache.zookeeper.server.quorum.Learner - Got zxid 0x300000001 expected
> 0x1
> =6693 [QuorumPeer:/0.0.0.0:2181] WARN
> org.apache.zookeeper.server.quorum.Learner - Got zxid 0x300000001 expected
> 0x1
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]
>
> ************* NODE RESTARTED HERE **********************
> {noformat}
>
> Before being restarted, the bad node receives a proposal with zxid <3,1>
> but expects <0,1>. In the logs after the restart, I can see it complaining
> that it has epoch 4 while the leader has epoch 3. Something strange
> apparently happened during the restart. It also seems that the node was
> able to talk to the others (see the first entries in the log, before the
> excerpt above).
>
> Do you guys see anything I'm overlooking?
>
> > zookeeper servers should commit the new leader txn to their logs.
> > -----------------------------------------------------------------
> >
> > Key: ZOOKEEPER-335
> > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
> > Project: Zookeeper
> > Issue Type: Bug
> > Components: server
> > Affects Versions: 3.1.0
> > Reporter: Mahadev konar
> > Assignee: Mahadev konar
> > Priority: Blocker
> > Fix For: 3.4.0
> >
> > Attachments: zk.log.gz, zklogs.tar.gz
> >
> >
> > Currently the zookeeper followers do not commit the new leader election.
> > This will cause problems in failure scenarios where a follower acks the
> > same leader txn id twice, possibly to two different intermittent leaders,
> > allowing them to propose two different txns with the same zxid.