ADDROLE times out after 180 seconds. The cluster appears to be in an unrecoverable state, which makes this a pretty serious bug.
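For reference, a rough sketch of how the two Collections API requests mentioned in this thread (OVERSEERSTATUS and ADDROLE) can be built. The base URL and node name are placeholders, not values from this cluster, and the helper name is made up for illustration. The sketch only builds the URLs; it does not send them.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Solr Collections API URL for the given action (hypothetical helper)."""
    query = urlencode({"action": action, **params, "wt": "json"})
    return f"{base_url}/admin/collections?{query}"


# Placeholder values: substitute a live node from the affected cluster.
BASE = "http://localhost:8983/solr"

# Ask the cluster who the overseer is (this is the call that timed out).
status_url = collections_api_url(BASE, "OVERSEERSTATUS")

# Ask the cluster to prefer a given node as overseer.
addrole_url = collections_api_url(
    BASE, "ADDROLE", role="overseer", node="host1:8983_solr"
)

print(status_url)
print(addrole_url)
```

Both requests are handled by the overseer, so with no elected leader they would be expected to sit in the queue until the timeout, which matches the behavior described.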

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On May 21, 2019, at 4:10 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> We have a 6.6.2 cluster in prod that appears to have no overseer. In
> /overseer_elect on ZK, there is an election folder, but no leader document.
> An OVERSEERSTATUS request fails with a timeout.
>
> I’m going to try ADDROLE, but I’d be delighted to hear any other ideas. We’ve
> diverted all the traffic to the backing cluster, so we can blow this one away
> and rebuild.
>
> Looking at the Zookeeper logs, I see a few instances of network failures
> across all three nodes.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)