+1

Will Martin
DEVOPS ENGINEER
540.454.9565
8609 WESTWOOD CENTER DR, SUITE 475
VIENNA, VA 22182
geturgently.com

On Tue, May 21, 2019 at 7:39 PM Walter Underwood <[email protected]> wrote:

> ADDROLE times out after 180 seconds. This seems to be an unrecoverable
> state for the cluster, so that is a pretty serious bug.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
> > On May 21, 2019, at 4:10 PM, Walter Underwood <[email protected]> wrote:
> >
> > We have a 6.6.2 cluster in prod that appears to have no overseer. In
> > /overseer_elect on ZK, there is an election folder, but no leader document.
> > An OVERSEERSTATUS request fails with a timeout.
> >
> > I’m going to try ADDROLE, but I’d be delighted to hear any other ideas.
> > We’ve diverted all the traffic to the backing cluster, so we can blow this
> > one away and rebuild.
> >
> > Looking at the Zookeeper logs, I see a few instances of network failures
> > across all three nodes.
> >
> > wunder
> > Walter Underwood
> > [email protected]
> > http://observer.wunderwood.org/ (my blog)
