On cluster startup, master/rs connect to ZK before it's fully ready causing a 
ConnectionLossException
-----------------------------------------------------------------------------------------------------

                 Key: HBASE-2971
                 URL: https://issues.apache.org/jira/browse/HBASE-2971
             Project: HBase
          Issue Type: Bug
          Components: zookeeper
    Affects Versions: 0.90.0
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
             Fix For: 0.90.0


There is a long-standing race condition that has been glossed over until now 
because of our "loose" ZK usage.

The ZK server process can be in a state where it accepts the socket 
connection from our client in the master or RS, but any operation we then run 
against the server fails with a ConnectionLossException.  The ZK client handles 
this automagically and reconnects properly, as long as we do not abort when we 
get this exception.

So this works on the last 0.89 and even with the master rewrite, but as we move 
towards strict usage of ZK, we should wait for ZK availability before 
proceeding with startup.
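
A minimal sketch of what "wait for ZK availability" could look like, and not 
the patch referred to in this issue.  The class name ZkStartupWait, the method 
waitUntilAvailable, and its parameters are hypothetical.  The probe is left 
abstract so the sketch has no ZooKeeper dependency; in a real master or RS it 
would presumably wrap a cheap call such as ZooKeeper.exists("/", false) and let 
KeeperException.ConnectionLossException propagate as "not ready yet".

```java
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only; not part of HBase.
class ZkStartupWait {

    /**
     * Polls the probe until it reports success or the attempts run out.
     * Any exception thrown by the probe (for example a connection-loss
     * error) is treated as "not ready yet" instead of aborting startup.
     *
     * @return true if the probe succeeded, false if every attempt failed
     */
    static boolean waitUntilAvailable(Callable<Boolean> probe,
                                      int maxAttempts,
                                      long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(probe.call())) {
                    return true;
                }
            } catch (InterruptedException e) {
                // Do not swallow interruption; let the caller shut down.
                throw e;
            } catch (Exception e) {
                // Assumption: every other failure is retryable at startup.
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        return false;
    }
}
```

A caller would run this once before the rest of startup, for example 
waitUntilAvailable(probe, 10, 1000L), and abort only if it returns false.  
Whether to bound the number of attempts or wait indefinitely is a design 
choice this sketch does not settle.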

I already have a patch in a local branch and it's working.  Will put up a patch 
soon against new master.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
