[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vishal K updated ZOOKEEPER-880:
-------------------------------
Attachment: ZOOKEEPER-880.patch
The root cause of the frequent disconnects still needs to be resolved. In the
meantime, I have fixed the problem that was causing every other SendWorker
thread to leak.
I tested the patch by using telnet to connect to port 3888 on one of the servers.
2010-11-19 14:51:10,081 - INFO
[/10.17.119.101:3888:quorumcnxmanager$liste...@477] - Received connection
request /10.16.251.39:2074
2010-11-19 14:51:14,364 - DEBUG
[/10.17.119.101:3888:quorumcnxmanager$sendwor...@553] - Address of remote peer:
8103510703875099187
2010-11-19 14:51:19,440 - WARN [Thread-7:quorumcnxmanager$recvwor...@726] -
Connection broken for id 8103510703875099187, my id = 1, error =
java.io.IOException: Received packet with invalid packet: 218824692
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:711)
2010-11-19 14:51:19,441 - WARN [Thread-7:quorumcnxmanager$recvwor...@730] -
Interrupting SendWorker <============= SendWorker is getting killed.
2010-11-19 14:51:19,442 - DEBUG [Thread-7:quorumcnxmanager$sendwor...@571] -
Calling finish for 8103510703875099187
2010-11-19 14:51:19,443 - DEBUG [Thread-7:quorumcnxmanager$sendwor...@591] -
Removing entry from senderWorkerMap sid=8103510703875099187
2010-11-19 14:51:19,443 - WARN [Thread-6:quorumcnxmanager$sendwor...@643] -
Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:631)
2010-11-19 14:51:19,456 - DEBUG [Thread-6:quorumcnxmanager$sendwor...@571] -
Calling finish for 8103510703875099187
2010-11-19 14:51:19,457 - WARN [Thread-6:quorumcnxmanager$sendwor...@652] -
Send worker leaving thread
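For reference, the teardown sequence in the log above can be sketched roughly
as follows. This is only an illustration of the pattern, not the attached
patch: the class SendWorkerSketch, the senders map, and the register() helper
are hypothetical names, and the socket I/O is omitted. It assumes that a sender
blocks in a timed poll() on its queue, that finish() interrupts the thread and
removes its map entry, and that a replaced sender is finished when a new one
takes over the same sid.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the code in ZOOKEEPER-880.patch.
class SendWorkerSketch {
    // Stand-in for the map of active senders, keyed by server id (sid).
    static final ConcurrentHashMap<Long, Sender> senders = new ConcurrentHashMap<>();

    static class Sender extends Thread {
        final long sid;
        final ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        volatile boolean running = true;

        Sender(long sid) {
            this.sid = sid;
        }

        // Stops the thread and drops its map entry. Safe to call more than once.
        void finish() {
            running = false;
            interrupt();
            // Remove only if the map still points at this sender, so a newer
            // sender registered for the same sid is left in place.
            senders.remove(sid, this);
        }

        @Override
        public void run() {
            try {
                while (running) {
                    // Timed poll so the loop re-checks the running flag.
                    String msg = queue.poll(1, TimeUnit.SECONDS);
                    if (msg != null) {
                        // A real sender would write msg to the peer socket here.
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted while waiting for a message: fall through and exit.
            } finally {
                finish();
            }
        }
    }

    // Registers a new sender for sid and finishes the one it replaces, if any,
    // so the old thread does not linger.
    static Sender register(long sid) {
        Sender fresh = new Sender(sid);
        Sender old = senders.put(sid, fresh);
        if (old != null) {
            old.finish();
        }
        fresh.start();
        return fresh;
    }
}
```

The conditional remove(sid, this) keeps a late finish() on the old sender from
evicting its replacement from the map.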
Can you see if this fixes the problem?
> QuorumCnxManager$SendWorker grows without bounds
> ------------------------------------------------
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz,
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack,
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
>
> We're seeing an issue where one server in the ensemble has a steadily
> growing number of QuorumCnxManager$SendWorker threads, up to the point where
> the OS runs out of native threads, and at the same time we see a lot of
> exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs
> in a moment.