[ https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517245#comment-17517245 ]
Mate Szalay-Beko edited comment on ZOOKEEPER-3652 at 4/5/22 6:33 AM: --------------------------------------------------------------------- Great catch, thanks for the fix! Is there a reason why not to backport this to branch-3.5? (it is an important bugfix and 3.5 is not EoL yet) was (Author: symat): Great catch, thanks for the fix! Is there a reason why not to backport this to branch-3.5? (it is an important bugfix and c.5 is not EoL yet) > Improper synchronization in ClientCnxn > -------------------------------------- > > Key: ZOOKEEPER-3652 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652 > Project: ZooKeeper > Issue Type: Bug > Components: java client > Affects Versions: 3.5.6 > Reporter: Sylvain Wallez > Priority: Major > Labels: pull-request-available > Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in > {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in > {{outgoingQueue}} and draining it when the client connection isn't alive. > There are several issues with this approach: > - the value of the {{state}} field is not stable, meaning we don't always > synchronize on the same object. > - the {{state}} field is an enum value, which are global objects. So in an > application with several ZooKeeper clients connected to different servers, > this causes some contention between clients. > An easy fix is change those {{synchronized(state)}} statements to > {{synchronized(outgoingQueue)}} since it is local to each client and is what > we want to coordinate. > I'll be happy to prepare a PR with the above change if this is deemed to be > the correct way to fix it. > > Another issue that makes contention worse is > {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above > synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep > statement needed, and can we remove it? > -- This message was sent by Atlassian Jira (v8.20.1#820001)