[ https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745870#action_12745870 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-512:
--------------------------------------------------

Two things:

1- I'm not sure what you've been searching for, so I don't have a pointer, but 
the behavior I expect is that if a socket operation throws an IOException, then 
the socket won't be usable after that. Am I interpreting it incorrectly?
2- Visually inspecting the logs, I was able to count about 20 successful leader 
elections. In the previous set of logs, I think servers got stuck around 5, so 
I see improvement after you modified your fault injection. Also, according to 
server 5, a leader was elected successfully. Here is the tail of the log of 5: 

{noformat}
2009-08-20 13:43:56,636 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2185:quorump...@508] - LEADING
2009-08-20 13:43:56,636 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2185:zookeeperser...@160] - Created server
2009-08-20 13:43:56,643 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2185:files...@81] 
- Reading snapshot ./localhost:2185/data/version-2/snapshot.1c0000001f
2009-08-20 13:43:56,699 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2185:filetxnsnap...@208] - Snapshotting: 1c0000001f
2009-08-20 13:43:56,844 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,845 - INFO  
[FollowerHandler-/127.0.0.1:55253:followerhand...@227] - Follower sid: 4 : info 
: org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@1b1aa65
2009-08-20 13:43:56,845 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,845 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,846 - WARN  
[FollowerHandler-/127.0.0.1:55253:followerhand...@302] - Sending snapshot last 
zxid of peer is 0x1c0000001f  zxid of leader is 0x1d00000000
2009-08-20 13:43:56,847 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,848 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,848 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,849 - INFO  
[FollowerHandler-/127.0.0.1:55254:followerhand...@227] - Follower sid: 2 : info 
: org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@1ef9157
2009-08-20 13:43:56,849 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,850 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,850 - WARN  
[FollowerHandler-/127.0.0.1:55254:followerhand...@302] - Sending snapshot last 
zxid of peer is 0x1c0000001f  zxid of leader is 0x1d00000000
2009-08-20 13:43:56,851 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:56,851 - WARN  [FollowerHandler-/127.0.0.1:55254:lea...@452] - 
Commiting zxid 0x1d00000000 from /127.0.0.1:3185 not first!
2009-08-20 13:43:56,852 - WARN  [FollowerHandler-/127.0.0.1:55254:lea...@454] - 
First is 0
2009-08-20 13:43:56,852 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:59,434 - INFO  [Thread-536:requestfailsintermitten...@120] - 
SOCKET REQUEST OK
2009-08-20 13:43:59,434 - INFO  [Thread-536:requestfailsintermitten...@120] - 
SOCKET REQUEST OK
2009-08-20 13:43:59,435 - INFO  [WorkerReceiver 
Thread:fastleaderelection$messenger$workerrecei...@254] - Sending new 
notification.
2009-08-20 13:43:59,435 - INFO  [Thread-535:requestfailsintermitten...@120] - 
SOCKET REQUEST OK
2009-08-20 13:43:59,846 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:43:59,848 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:00,765 - INFO  [NIOServerCxn.Factory:2185:nioserverc...@698] - 
Processing stat command from /127.0.0.1:38350
2009-08-20 13:44:00,766 - WARN  [NIOServerCxn.Factory:2185:nioserverc...@494] - 
Exception causing close of session 0x0 due to java.io.IOException: Responded to 
info probe
2009-08-20 13:44:00,766 - INFO  [NIOServerCxn.Factory:2185:nioserverc...@833] - 
closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected 
local=/127.0.0.1:2185 remote=/127.0.0.1:38350]
2009-08-20 13:44:00,846 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:00,848 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:01,847 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:01,848 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:02,847 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:02,848 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:03,847 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:03,849 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:04,847 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:04,849 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:05,847 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:05,849 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:06,849 - INFO  
[FollowerHandler-/127.0.0.1:55253:requestfailsintermitten...@91] - RECORD 
REQUEST OK
2009-08-20 13:44:06,849 - INFO  
[FollowerHandler-/127.0.0.1:55254:requestfailsintermitten...@91] - RECORD 
REQUEST OK
{noformat}

Inspecting the logs a little further, it is true, though, that servers 1 and 3 
are stuck. It could be due to the challenge protocol, as you suggest, so we 
should keep it in mind. 
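On point 1 above, here is a minimal NIO sketch (not ZooKeeper code; class name is illustrative) of why I'd expect the socket to be unusable afterwards: once the channel is closed in response to the IOException, any further operation on it fails immediately.

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SocketChannel;

public class ClosedSocketDemo {
    public static void main(String[] args) throws Exception {
        SocketChannel ch = SocketChannel.open();
        // Simulate the cleanup a server performs after an IOException
        // on the channel: the channel gets closed.
        ch.close();
        try {
            ch.write(ByteBuffer.wrap("ping".getBytes()));
            System.out.println("write succeeded");
        } catch (ClosedChannelException e) {
            // Every subsequent read/write on the closed channel ends up here.
            System.out.println("channel no longer usable");
        }
    }
}
```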

> FLE election fails to elect leader
> ----------------------------------
>
>                 Key: ZOOKEEPER-512
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-512
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.2.0
>            Reporter: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.2.1, 3.3.0
>
>         Attachments: jst.txt, logs.tar.gz, logs2.tar.gz
>
>
> I was doing some fault injection testing of 3.2.1 with ZOOKEEPER-508 patch 
> applied and noticed that after some time the ensemble failed to re-elect a 
> leader.
> See the attached log files - 5 member ensemble. Typically server 5 is the 
> leader.
> Notice that after 16:23:50,525 no quorum is formed, even after 20 minutes 
> elapse with no quorum.
> environment:
> I was doing fault injection testing using aspectj. The faults are injected 
> into socketchannel read/write; I throw exceptions randomly at a 1/200 ratio 
> (rand.nextFloat() <= .005 => throw IOException).
> You can see when a fault is injected in the log via:
> 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@38] 
> - READPACKET FORCED FAIL
> vs a read/write that didn't force fail:
> 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@41] 
> - READPACKET OK
> otherwise standard code/config (straight FLE quorum with 5 members)
> Also see the attached jstack trace; this is for one of the servers. Notice in 
> particular that the number of SendWorkers != the number of RecvWorkers.
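For context, the 1/200 injection described in the report above could be sketched as a plain wrapper (class and method names are illustrative; the actual test uses AspectJ advice around the socketchannel read/write calls):

```java
import java.io.IOException;
import java.util.Random;

// Illustrative stand-in for the AspectJ advice described in the report:
// before each socket read/write, throw an IOException with probability 1/200.
public class FaultInjector {
    private static final float FAIL_RATIO = 0.005f; // 1/200
    private final Random rand = new Random();

    // Call before a socket read/write; throws when a fault is injected.
    void maybeFail(String op) throws IOException {
        if (rand.nextFloat() <= FAIL_RATIO) {
            throw new IOException("injected fault on " + op);
        }
    }

    public static void main(String[] args) {
        FaultInjector fi = new FaultInjector();
        int failures = 0;
        for (int i = 0; i < 10_000; i++) {
            try {
                fi.maybeFail("READPACKET");
            } catch (IOException e) {
                failures++;
            }
        }
        // Roughly 10000 * 0.005 = 50 injected failures expected.
        System.out.println("injected failures: " + failures);
    }
}
```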

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
