[
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933602#action_12933602
]
Vishal K commented on ZOOKEEPER-880:
------------------------------------
Hi Benoit,
Could you check whether this problem reproduces with 3.3.3
(with the patch for ZOOKEEPER-822)? I went through
QuorumCnxManager.java for 3.2.2, and it clearly leaks a SendWorker thread
for every other connection.
After receiving a connection from a peer, it creates a new SendWorker
thread and stores a reference to it in senderWorkerMap:
    SendWorker sw = new SendWorker(s, sid);
    RecvWorker rw = new RecvWorker(s, sid);
    sw.setRecv(rw);
    SendWorker vsw = senderWorkerMap.get(sid);
    senderWorkerMap.put(sid, sw);
Then it kills the old thread for that peer (created by an earlier
connection):
    if (vsw != null)
        vsw.finish();
However, SendWorker.finish() removes the entry for sid from
senderWorkerMap. By this point that entry already refers to the
newly created SendWorker, so the call drops the reference to the new
thread rather than the old one:
    senderWorkerMap.remove(sid);
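Thus, both entries end up removed: the old worker is finished, while the
new worker keeps running with no reference left in the map, so the next
connection finds nothing to finish. As a result, one thread is leaked
for every other connection.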
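To make the sequence above concrete, here is a minimal, self-contained model
of it. This is only a sketch and not the QuorumCnxManager code: the Worker
class, the leakedWorkers helper and the guarded flag are hypothetical names
introduced for illustration, and no real threads are started. The guarded
variant uses ConcurrentHashMap.remove(key, value) so that finish() only
removes the entry if it still points at the worker being finished; that is
one possible way to avoid the problem, and I am not claiming it is what the
ZOOKEEPER-822 patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class SendWorkerLeakSketch {

    /** Hypothetical stand-in for SendWorker; it only tracks a "running" flag. */
    static final class Worker {
        final long sid;
        final ConcurrentHashMap<Long, Worker> map;
        final boolean guarded;
        boolean running = true;

        Worker(long sid, ConcurrentHashMap<Long, Worker> map, boolean guarded) {
            this.sid = sid;
            this.map = map;
            this.guarded = guarded;
        }

        void finish() {
            running = false;
            if (guarded) {
                // Remove the entry only if it still refers to this worker.
                map.remove(sid, this);
            } else {
                // Behaviour described above: removes whatever is mapped to sid,
                // which by now is the newer worker.
                map.remove(sid);
            }
        }
    }

    /**
     * Simulates successive connections from a single peer and returns the
     * number of workers that are still running but no longer in the map.
     */
    static int leakedWorkers(int connections, boolean guarded) {
        ConcurrentHashMap<Long, Worker> map = new ConcurrentHashMap<>();
        List<Worker> all = new ArrayList<>();
        long sid = 1L; // assumed peer id, for illustration only

        for (int i = 0; i < connections; i++) {
            Worker sw = new Worker(sid, map, guarded);
            all.add(sw);
            Worker vsw = map.get(sid);
            map.put(sid, sw);
            if (vsw != null) {
                vsw.finish();
            }
        }

        int leaked = 0;
        for (Worker w : all) {
            if (w.running && map.get(sid) != w) {
                leaked++;
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + leakedWorkers(10, false));
        System.out.println("guarded:   " + leakedWorkers(10, true));
    }
}
```

In this model, ten connections with the unguarded remove leave five workers
running without a map entry, which matches "one thread for every other
connection"; with the guarded remove none are left behind.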
If you count the error messages in
hbase-hadoop-zookeeper-sv4borg9.log, you will see that there are
roughly twice as many from RecvWorker as from SendWorker. I think this
proves the point.
$:/tmp/hadoop # grep "RecvWorker" hbase-hadoop-zookeeper-sv4borg9.log | wc -l
60
$:/tmp/hadoop # grep "SendWorker" hbase-hadoop-zookeeper-sv4borg9.log | wc -l
32
-Vishal
> QuorumCnxManager$SendWorker grows without bounds
> ------------------------------------------------
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz,
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack,
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing
> number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs
> out of native threads, and at the same time we see a lot of exceptions in the
> logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs
> in a moment.