I worked with Fusion and ZooKeeper at GSA for 18 months in an admin role.

Before blowing it away, you could try the following:

- Identify a candidate node with a snapshot you think is old enough to be
robust.
- Clean out the data on the other ZK nodes.
- Bring up the chosen node and wait for it to settle. [I wish I could
remember why I called what I saw that.]
- Bring up the other nodes one at a time. Let each one fully sync as a
follower of the new leader.
- Each should, in turn, request the snapshot from the leader.

Then you have to align your collections with the ensemble. For the life of
me, I can't remember anything particularly tricky about that with Fusion,
which means I can't remember what I did... or I have it doc'd at home. ;-)
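
One way to watch each node "settle" while bringing the ensemble back up is to ask every server for its role with ZooKeeper's `srvr` four-letter command. What follows is only a rough sketch, not what I actually ran: it assumes four-letter commands are enabled on your servers and that the client port is 2181, and the host names in the usage note are made up. The helper names (`parse_mode`, `query_mode`, `ensemble_settled`) are mine, not part of any ZooKeeper or Solr tooling.

```python
import socket
from typing import List, Optional


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line from `srvr` output, or None if absent."""
    for line in srvr_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Mode":
            return value.strip()
    return None


def query_mode(host: str, port: int = 2181, timeout: float = 5.0) -> Optional[str]:
    """Send `srvr` to one ZooKeeper server and return its reported mode.

    The port default is an assumption; use your own clientPort.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def ensemble_settled(modes: List[Optional[str]]) -> bool:
    """True when exactly one server reports leader and all others report follower."""
    return modes.count("leader") == 1 and all(
        m in ("leader", "follower") for m in modes
    )
```

Usage would look something like `ensemble_settled([query_mode(h) for h in ("zk1", "zk2", "zk3")])`, where `zk1` through `zk3` are placeholder host names. A server that is up but not yet serving requests gives back no `Mode:` line, so it shows up as `None` and the check stays false until it has joined.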


Will Martin
DEVOPS ENGINEER
540.454.9565

8609 WESTWOOD CENTER DR, SUITE 475
VIENNA, VA 22182
geturgently.com


On Tue, May 21, 2019 at 11:40 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> Yes, please. I have the logs from each of the Zookeepers.
>
> We are running 3.4.12.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 21, 2019, at 6:49 PM, Will Martin <wmar...@urgent.ly> wrote:
> >
> > Walter. Can I cross-post to zk-dev?
> >
> >
> >> On May 21, 2019, at 9:26 PM, Will Martin <wmar...@urgent.ly> wrote:
> >>
> >> +1
> >>
> >>
> >> On Tue, May 21, 2019 at 7:39 PM Walter Underwood <wun...@wunderwood.org> wrote:
> >> ADDROLE times out after 180 seconds. This seems to be an unrecoverable
> >> state for the cluster, so that is a pretty serious bug.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >> > On May 21, 2019, at 4:10 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >> >
> >> > We have a 6.6.2 cluster in prod that appears to have no overseer. In
> >> > /overseer_elect on ZK, there is an election folder, but no leader document.
> >> > An OVERSEERSTATUS request fails with a timeout.
> >> >
> >> > I’m going to try ADDROLE, but I’d be delighted to hear any other
> >> > ideas. We’ve diverted all the traffic to the backing cluster, so we can
> >> > blow this one away and rebuild.
> >> >
> >> > Looking at the Zookeeper logs, I see a few instances of network
> >> > failures across all three nodes.
> >> >
> >> > wunder
> >> > Walter Underwood
> >> > wun...@wunderwood.org
> >> > http://observer.wunderwood.org/  (my blog)
> >> >
> >>
> >
>
>
