[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932649#action_12932649 ]
Camille Fournier commented on ZOOKEEPER-922:
--------------------------------------------

I'm interested in hearing the problems that you believe it would lead to in more detail. To me, this feels like a reasonable compromise solution to a tough problem.

If the problem you foresee is a client and server getting disconnected from each other but both staying alive, and this causing weirdness leading to a session expiration for the client on reconnecting to another server, for my particular scenario that is fine. I have a wrapped ZK client that is highly tolerant to all sorts of failures and has no problem resetting its state. I realize that may not be acceptable for other users, and I would not propose this solution without either community agreement that this risk, if well-documented, is ok, or a fix for that problem. But I don't know what other problems you are seeing and while I might be able to solve them if you help me see what they are, I can't do anything on vague suppositions of problematic circumstances.

Don't get me wrong, I'm not married to this solution, but I am interested in some solution if possible. It seems to me that not allowing clients to reconnect to other servers causes a host of other problems and is a worse solution for people that would not want this fast expiration forced on them. In what scenarios can a client not reconnect to another server? All? Obviously that won't fly because even I would not want to have all of my sessions expire in the case of an ensemble member dying and clients failing over. If we only want to do this where my code is doing the "touchAndClose" (i.e., when the server the client was connected to sees a failure-based disconnect), then we see exactly the same potential problem outlined above where the client could still be alive but have a switch go down and disconnect it from the server. Now it tries to fail over and its session is always dead.
I'm not convinced off the bat that that is any better than letting it try to fail over and risking a potential session timeout race, which I think could possibly be fixed by associating the client session with the server currently maintaining it (already done but not passed through on ticks).

What did you mean in the earlier comment about this causing leadership election issues? Does this actually interact with that at all? This is the kind of thing I could use guidance on.

Or we can let this whole idea drop, but it does seem that more people than me are interested so might be worth hashing it out.

> enable faster timeout of sessions in case of unexpected socket disconnect
> -------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-922
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Camille Fournier
>            Assignee: Camille Fournier
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-922.patch
>
>
> In the case when a client connection is closed due to socket error instead of
> the client calling close explicitly, it would be nice to enable the session
> associated with that client to time out faster than the negotiated session
> timeout. This would enable a zookeeper ensemble that is acting as a dynamic
> discovery provider to remove ephemeral nodes for crashed clients quickly,
> while allowing for a longer heartbeat-based timeout for java clients that
> need to do long stop-the-world GC.
> I propose doing this by setting the timeout associated with the crashed
> session to "minSessionTimeout".

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
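
For readers following the thread, the proposal in the issue description (shorten a session's remaining lifetime to "minSessionTimeout" when its socket fails) and the failover race discussed in the comment can be sketched roughly as below. This is only an illustration under stated assumptions: the class FastExpirySketch and all of its method names are hypothetical, it is not the code in ZOOKEEPER-922.patch, and it does not reflect how ZooKeeper's real session tracking is implemented. Times are passed in explicitly so the behavior is easy to trace.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the idea in ZOOKEEPER-922; not ZooKeeper's real session tracker. */
class FastExpirySketch {
    private final long minSessionTimeoutMs;
    // sessionId -> absolute expiry time (ms)
    private final Map<Long, Long> expiryAt = new HashMap<>();
    // sessionId -> negotiated session timeout (ms)
    private final Map<Long, Long> negotiated = new HashMap<>();

    FastExpirySketch(long minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    void createSession(long id, long timeoutMs, long nowMs) {
        negotiated.put(id, timeoutMs);
        expiryAt.put(id, nowMs + timeoutMs);
    }

    /** Socket error seen by the server: shorten the deadline, never lengthen it. */
    void onSocketError(long id, long nowMs) {
        Long current = expiryAt.get(id);
        if (current != null) {
            expiryAt.put(id, Math.min(current, nowMs + minSessionTimeoutMs));
        }
    }

    /** Heartbeat, e.g. after failing over to another server. Returns false if already expired. */
    boolean heartbeat(long id, long nowMs) {
        if (isExpired(id, nowMs)) {
            return false;
        }
        expiryAt.put(id, nowMs + negotiated.get(id));
        return true;
    }

    boolean isExpired(long id, long nowMs) {
        Long deadline = expiryAt.get(id);
        return deadline == null || nowMs >= deadline;
    }

    public static void main(String[] args) {
        // Illustrative values only: 4 s minimum, 30 s negotiated timeout.
        FastExpirySketch tracker = new FastExpirySketch(4_000L);
        tracker.createSession(1L, 30_000L, 0L);
        tracker.onSocketError(1L, 1_000L);
        // A client that is still alive must reconnect within the shortened window.
        System.out.println("reconnect in time: " + tracker.heartbeat(1L, 3_000L));
        tracker.onSocketError(1L, 10_000L);
        System.out.println("reconnect too late: " + tracker.heartbeat(1L, 20_000L));
    }
}
```

The second call in main models the case raised in the comment: a client that is alive but cut off by a network fault loses its session if it cannot reach another server before the shortened deadline passes.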