The ADDROLE request times out after 180 seconds. The cluster appears to be in
an unrecoverable state, which makes this a pretty serious bug.
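
For anyone who wants to try the same thing, here is a rough sketch of the two
Collections API requests involved (OVERSEERSTATUS and ADDROLE with
role=overseer). The host name and node name are hypothetical placeholders, not
our real ones, and the helper function is only for illustration. It builds the
URLs and does not send anything.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute a live node from your own cluster.
SOLR = "http://solr1.example.com:8983/solr"


def collections_api_url(base_url, action, **params):
    """Build a Solr Collections API URL for the given action.

    Illustration-only helper, not part of any Solr client library.
    """
    query = urlencode({"action": action, **params, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


# Ask the cluster who the overseer is (this is the call that timed out for us).
status_url = collections_api_url(SOLR, "OVERSEERSTATUS")

# Nominate a node as a preferred overseer. The node name uses the
# host:port_context form shown under /live_nodes in ZooKeeper (hypothetical here).
addrole_url = collections_api_url(
    SOLR, "ADDROLE", role="overseer", node="solr1.example.com:8983_solr"
)

print(status_url)
print(addrole_url)
```

Either URL can then be fetched with curl or a browser. On a cluster with no
overseer, both are processed through the overseer queue, which would explain
why both requests time out rather than fail fast.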

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 21, 2019, at 4:10 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> We have a 6.6.2 cluster in prod that appears to have no overseer. In 
> /overseer_elect on ZK, there is an election folder, but no leader document. 
> An OVERSEERSTATUS request fails with a timeout.
> 
> I’m going to try ADDROLE, but I’d be delighted to hear any other ideas. We’ve 
> diverted all the traffic to the backing cluster, so we can blow this one away 
> and rebuild.
> 
> Looking at the ZooKeeper logs, I see a few instances of network failures 
> across all three nodes.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
