Yes, please. I have the logs from each of the Zookeepers. We are running 3.4.12.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On May 21, 2019, at 6:49 PM, Will Martin <wmar...@urgent.ly> wrote:
>
> Walter. Can I cross-post to zk-dev?
>
> Will Martin
> DEVOPS ENGINEER
> 540.454.9565
>
> 8609 WESTWOOD CENTER DR, SUITE 475
> VIENNA, VA 22182
> geturgently.com <http://geturgently.com/>
>
>> On May 21, 2019, at 9:26 PM, Will Martin <wmar...@urgent.ly> wrote:
>>
>> +1
>>
>> On Tue, May 21, 2019 at 7:39 PM Walter Underwood <wun...@wunderwood.org> wrote:
>> ADDROLE times out after 180 seconds. This seems to be an unrecoverable state
>> for the cluster, so that is a pretty serious bug.
>>
>> wunder
>>
>> > On May 21, 2019, at 4:10 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> >
>> > We have a 6.6.2 cluster in prod that appears to have no overseer. In
>> > /overseer_elect on ZK, there is an election folder, but no leader
>> > document. An OVERSEERSTATUS request fails with a timeout.
>> >
>> > I’m going to try ADDROLE, but I’d be delighted to hear any other ideas.
>> > We’ve diverted all the traffic to the backing cluster, so we can blow this
>> > one away and rebuild.
>> >
>> > Looking at the Zookeeper logs, I see a few instances of network failures
>> > across all three nodes.
>> >
>> > wunder