Ivan, thanks for the references. Great write-up describing the problem and the solution. I agree that for total ordering, the storage layer should provide the guarantee.
However, in my scenario I don't have a global/central storage system; I am trying to build a globally distributed storage system, with servers on different continents. I am minimizing the chance of multiple writers by using longer timeouts, and I am eliminating downtime during planned maintenance. What I don't have is a way to deal with the ZooKeeper ensemble being dead (or in leader election), which takes down the entire application. Currently we twiddle our thumbs when ZooKeeper or the network has issues, and we would like to find a way to improve application availability during ZooKeeper service interruptions. Using multiple ensembles seems like a promising path, but I wanted to see if anyone has thought about extending the error handling between the ZooKeeper client and server to deal with this issue.
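To make the "longer timeouts" part concrete, here is a rough sketch of the kind of client-side guard I have in mind. It is not ZooKeeper API: the class `LeaderLease` and its method names are hypothetical, and I am assuming the application calls them from its own handling of the client's Disconnected, SyncConnected and Expired session states. The idea is that a writer keeps working through a short disconnect, but stops on its own before the session could have expired on the server side.

```python
import time


class LeaderLease:
    """Hypothetical guard deciding whether this process may still write.

    Assumption: the caller forwards session state changes from its
    ZooKeeper client into the on_* methods below.
    """

    def __init__(self, session_timeout_s, safety_margin_s=0.0,
                 clock=time.monotonic):
        # The server counts the session timeout from the last heartbeat it
        # saw, which is earlier than the moment we notice the disconnect,
        # so a safety margin shortens the window we allow ourselves.
        self._window = session_timeout_s - safety_margin_s
        self._clock = clock
        self._is_leader = False
        self._disconnected_at = None

    def on_elected(self):
        self._is_leader = True
        self._disconnected_at = None

    def on_disconnected(self):
        # Remember only the first disconnect of an outage.
        if self._disconnected_at is None:
            self._disconnected_at = self._clock()

    def on_reconnected(self):
        self._disconnected_at = None

    def on_session_expired(self):
        # Leadership is gone; a new election is required.
        self._is_leader = False
        self._disconnected_at = None

    def may_write(self):
        if not self._is_leader:
            return False
        if self._disconnected_at is None:
            return True
        # Disconnected: keep writing only inside the grace window.
        return self._clock() - self._disconnected_at < self._window
```

This only covers interruptions shorter than the window. Once the ensemble is unreachable for longer than that, `may_write()` returns False and the application is stuck again, which is exactly the case I am asking about.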
