Hi Vishal,

Thanks for picking this up. My comments are inline:
On 9/2/10 3:31 PM, "Vishal K" <vishalm...@gmail.com> wrote:

> Hi All,
>
> I had posted this message as a comment on ZOOKEEPER-822. I thought it might be a good idea to give it wider attention so that it will be easier to collect feedback.
>
> I found a few problems in the FLE implementation while debugging https://issues.apache.org/jira/browse/ZOOKEEPER-822. Following the email below might require some background; if necessary, please browse the JIRA. I have a patch for 1. a) and 2). I will send them out soon.
>
> 1. Blocking connects and accepts:
>
> a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect. AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne() starts a timer. connectOne() continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires it interrupts the AsyncConnect thread and returns. In this way, I can put an upper bound on the amount of time we need to wait for a connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others on the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SendWorker and RecvWorker threads, since each of them only does IO with its own peer.

Vishal, I am really concerned about starting up new threads in the server. We really need a total revamp of this code (using NIO and a selector). Is the quick fix really required? ZooKeeper servers have been running in production for a while, and this problem hasn't been noticed by anyone. Shouldn't we fix it with NIO then?

> b) The blocking IO problem is not restricted to connectOne(); it also shows up in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection() does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this happens on the Listener thread. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it will be stuck in read() or connectOne(). The code also has an inherent cycle: initiateConnection() and receiveConnection() will have to be very carefully synchronized, otherwise we could run into deadlocks. This code is going to be difficult to maintain/modify.
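For 1. a) and 1. b): if we do go the NIO route, a bounded connect does not need an extra thread at all. Here is a rough, untested sketch of capping the time connectOne() spends waiting for a peer by combining a non-blocking connect with a Selector. The class name, method name, and connectTimeoutMs parameter are made up for illustration; this is not the actual QuorumCnxManager code:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    // Rough sketch, not QuorumCnxManager: bound the time spent connecting to a
    // peer without spawning a helper thread, using a non-blocking connect plus
    // a Selector.
    public class BoundedConnect {
        static SocketChannel connectWithTimeout(InetSocketAddress addr,
                                                long connectTimeoutMs)
                throws IOException {
            SocketChannel channel = SocketChannel.open();
            Selector selector = Selector.open();
            boolean connected = false;
            try {
                channel.configureBlocking(false);
                if (channel.connect(addr)) {
                    connected = true;                        // connected immediately (e.g. loopback)
                } else {
                    channel.register(selector, SelectionKey.OP_CONNECT);
                    if (selector.select(connectTimeoutMs) > 0) {
                        connected = channel.finishConnect(); // completes, or throws on failure
                    }
                }
            } finally {
                selector.close();                            // also deregisters the channel
                if (!connected) {
                    channel.close();                         // timed out or failed; caller retries later
                }
            }
            if (!connected) {
                return null;
            }
            channel.configureBlocking(true);                 // SendWorker/RecvWorker keep doing blocking IO
            return channel;
        }
    }

The same idea extends to accepts (OP_ACCEPT) and to the initial peer-info read (OP_READ), which is where the Listener currently gets stuck.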
> 2. Buggy senderWorkerMap handling:
>
> The code that manages senderWorkerMap is very buggy. It is causing multiple election rounds. While debugging I found that sometimes after FLE a node will have its senderWorkerMap empty even though it has SendWorker and RecvWorker threads for each peer. It would be great to clean it up!

I'd be happy to see this class be cleaned up! :)

> a) The receiveConnection() method calls the finish() method, which removes an entry from the map. Additionally, the thread itself calls finish(), which could remove the newly added entry from the map. In short, receiveConnection() is causing the exact condition that you mentioned above.
>
> b) Apart from the bug in finish(), receiveConnection() is making an entry in senderWorkerMap at the wrong place. Here's the buggy code:
>
>     SendWorker vsw = senderWorkerMap.get(sid);
>     senderWorkerMap.put(sid, sw);
>     if (vsw != null)
>         vsw.finish();
>
> It makes an entry for the new thread and then calls finish(), which causes the new thread to be removed from the map. The old thread will also get terminated, since finish() will interrupt the thread.

(A possible reordering of this block is sketched at the end of this mail.)

> 3. Race condition in receiveConnection and initiateConnection:
>
> *In theory*, two peers can keep disconnecting each other's connection.
>
> Example:
> T0: Peer 0 initiates a connection to Peer 1 (request 1)
> T1: Peer 1 receives the connection from Peer 0
> T2: Peer 1 calls receiveConnection()
> T2: Peer 0 closes the connection to Peer 1 because its ID is lower
> T3: Peer 0 re-initiates a connection to Peer 1 from manager.toSend() (request 2)
> T3: Peer 1 terminates the older connection to Peer 0
> T4: Peer 1 calls connectOne(), which starts new SendWorker threads for Peer 0
> T5: Peer 1 kills the connection created at T3 because it receives another connect request (request 2) from Peer 0
>
> The problem here is that while Peer 0 is accepting a connection from Peer 1, it can also be initiating a connection to Peer 1. So if they hit the right frequencies they could sit in a connect/disconnect loop and cause multiple rounds of leader election.
>
> I think the cause here is again blocking connect()/accept(). A peer starts to take action (killing existing threads and starting new threads) as soon as a connection is established at the TCP level. That is, it does not give us any control to synchronize connects and accepts. We could use non-blocking connects and accepts. This would allow us to a) tell a thread not to initiate a connection because the listener is about to accept a connection from the remote peer (using the isAcceptable() and isConnectable() methods of SelectionKey), and b) prevent a thread from initiating multiple connect requests to the same peer. It will simplify synchronization.
>
> Any thoughts?

I am all for cleaning up the code :).

Thanks
mahadev

> -Vishal
>
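P.S. On 2. b) above: one way to avoid tearing down the newly added entry is to finish the old worker before publishing the new one, and to have finish() remove the map entry only if it still points at the worker being finished. A rough sketch of that ordering (my reading of a possible fix, not an actual patch; the class and method names here are made up and simplified):

    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: keep at most one SendWorker per sid. The old worker is torn
    // down before the new one is published, and finish() removes the map entry
    // only when that entry still refers to the worker being finished.
    class SenderMapSketch {
        static class SendWorker extends Thread {
            private final long sid;
            private final ConcurrentHashMap<Long, SendWorker> senderWorkerMap;

            SendWorker(long sid, ConcurrentHashMap<Long, SendWorker> senderWorkerMap) {
                this.sid = sid;
                this.senderWorkerMap = senderWorkerMap;
            }

            void finish() {
                // Conditional remove: a no-op if a newer worker already replaced
                // us, so a dying old thread can no longer evict its replacement.
                senderWorkerMap.remove(sid, this);
                interrupt();
            }
        }

        // What receiveConnection()/initiateConnection() would do instead of the
        // quoted get/put/finish sequence.
        static void register(ConcurrentHashMap<Long, SendWorker> senderWorkerMap,
                             long sid, SendWorker sw) {
            SendWorker vsw = senderWorkerMap.get(sid);
            if (vsw != null) {
                vsw.finish();                // tear down the old worker first
            }
            senderWorkerMap.put(sid, sw);    // then publish the new one
        }
    }

With that ordering, the race in 2. a) (the thread's own finish() racing with receiveConnection()) can no longer empty the map, though the thread handling itself still needs the bigger cleanup we both want.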