Thanks for the info, I’m still looking. So, this is an Ubuntu packaged version of ZooKeeper.
Andor

> On 2019. Aug 27., at 14:13, Debraj Manna <subharaj.ma...@gmail.com> wrote:
>
> No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2.
>
> I started ZooKeeper after adding set -x to /usr/bin/zookeeper-server. I can
> see that ZooKeeper is being started with 3.4.13, as shown below. The complete
> logs are in this gist:
>
> https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9
>
> nohup java -Dzookeeper.datadir.autocreate=false
> -Dzookeeper.log.dir=/var/log/zookeeper
> -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp
> '/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*'
> -Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /etc/zookeeper/conf/zoo.cfg
> + sleep 1
> + echo STARTED
> STARTED
>
> The content of zookeeper.log after the start is in this gist:
>
> https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6
>
> Let me know if you need any more logs.
>
> On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar <an...@apache.org> wrote:
>
>> I confirmed that the fix is included in 3.4.13. That’s why I asked whether
>> you can see the ‘updatingEpoch’ file in the data folder.
>>
>> I don’t think the issue is unrelated, but I want to make sure that you’re
>> running the right version by checking the beginning of the ZK logs.
>>
>> Andor
>>
>>> On 2019. Aug 26., at 13:43, Debraj Manna <subharaj.ma...@gmail.com> wrote:
>>>
>>> Below is the content of currentEpoch.tmp, along with acceptedEpoch and
>>> currentEpoch:
>>>
>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>>> 8
>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>>> 7
>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
>>> 8
>>>
>>> The startup logs have been rolled over, since the issue has been present
>>> for some time. Will the current log, with the node in this state, help?
>>> Btw, why do you think this issue may not be related to ZooKeeper?
>>>
>>> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <an...@apache.org> wrote:
>>>
>>>> Hi Debraj,
>>>>
>>>> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
>>>> Can you see the ‘updatingEpoch’ file in /var/lib/zookeeper/version-2?
>>>> Also, what is ‘currentEpoch.tmp’? I’m not sure whether it relates to
>>>> ZooKeeper.
>>>>
>>>> Would you please share the full startup logs of the failing node?
>>>>
>>>> Regards,
>>>> Andor
>>>>
>>>>> On 2019. Aug 23., at 18:53, Debraj Manna <subharaj.ma...@gmail.com> wrote:
>>>>>
>>>>> Can someone answer my query below?
>>>>>
>>>>> I am getting confused after going through ZOOKEEPER-1653
>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say it
>>>>> is fixed in 3.4.6 but exists in 3.5.x, yet I am seeing the issue in 3.4.13
>>>>> too. Can someone let me know whether the issue is present in 3.4.13 as well?
>>>>>
>>>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <subharaj.ma...@gmail.com> wrote:
>>>>>
>>>>>> With the other two ZooKeeper servers running, I stopped ZooKeeper on the
>>>>>> broken node, then deleted all the contents of /var/lib/zookeeper/version-2
>>>>>> and started ZooKeeper on the node again.
>>>>>> It is running fine now and got all the data from the other servers.
>>>>>>
>>>>>> I am getting confused after going through ZOOKEEPER-1653
>>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say it
>>>>>> is fixed in 3.4.6 but exists in 3.5.x, yet I am seeing the issue in 3.4.13
>>>>>> too. Can someone let me know whether the issue is present in 3.4.13 as well?
>>>>>>
>>>>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <subharaj.ma...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for replying.
>>>>>>>
>>>>>>> What is the recommended way to remove a node, delete all data from it,
>>>>>>> and make it start fresh?
>>>>>>>
>>>>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eolive...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> Sorry for the late reply.
>>>>>>>> If you have 3 servers, you can nuke the broken one and make it start
>>>>>>>> from scratch; it will join the cluster and then recover data from the
>>>>>>>> other servers.
>>>>>>>>
>>>>>>>> Try it in a staging env, not in production.
>>>>>>>>
>>>>>>>> Enrico
>>>>>>>>
>>>>>>>> On Tue, Aug 20, 2019, 20:30 Debraj Manna <subharaj.ma...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The same question has been asked on Stack Overflow
>>>>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
>>>>>>>>> as well, but there has been no response there either.
>>>>>>>>>
>>>>>>>>> Does anyone have any thoughts on this one?
>>>>>>>>>
>>>>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <subharaj.ma...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I posted the wrong Jira link. I meant
>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let
>>>>>>>>>> me know the recommended way to recover the node?
>>>>>>>>>>
>>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>>>>>>>>>> 8
>>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>>>>>>>>>> 7
>>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
>>>>>>>>>> 8
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <subharaj.ma...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes,
>>>>>>>>>>> after a reboot of the machine, ZooKeeper does not start and I see the
>>>>>>>>>>> errors below in the logs.
>>>>>>>>>>>
>>>>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653.
>>>>>>>>>>> Can someone let me know whether this is fixed in 3.4.13, as I can see
>>>>>>>>>>> the issue is still open? Also, can someone suggest the recommended way
>>>>>>>>>>> to recover the setup?
>>>>>>>>>>>
>>>>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
>>>>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
>>>>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>>>>     ... 4 more
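
The numbers in this error can be checked by hand. In ZooKeeper a zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch, so 34359738370 = 8 * 2^32 + 2 decodes to epoch 8, counter 2. That epoch is newer than the 7 stored in currentEpoch, while acceptedEpoch and currentEpoch.tmp both contain 8. The bash sketch below is a simplified model of that comparison, not ZooKeeper's own code: decode_zxid and check_epoch are hypothetical helper names, and the updatingEpoch handling from the fix Andor refers to is deliberately left out.

```shell
#!/usr/bin/env bash
# Simplified model (not ZooKeeper source) of the startup check behind
# "The current epoch, X, is older than the last zxid, Y".
# Assumption: zxid = 64 bits, high 32 bits = epoch, low 32 bits = counter.
# decode_zxid and check_epoch are hypothetical helper names.

# Print the epoch and counter packed into a zxid.
decode_zxid() {
  local zxid=$1
  echo "epoch=$(( zxid >> 32 )) counter=$(( zxid & 0xFFFFFFFF ))"
}

# Compare the epoch stored in currentEpoch with the epoch of the last zxid.
# Prints FAIL and returns 1 when the stored epoch is behind, which is the
# situation the IOException reports. The updatingEpoch case is not modelled.
check_epoch() {
  local current_epoch=$1 last_zxid=$2
  local zxid_epoch=$(( last_zxid >> 32 ))
  if (( zxid_epoch > current_epoch )); then
    echo "FAIL: current epoch ${current_epoch} is older than zxid epoch ${zxid_epoch}"
    return 1
  fi
  echo "OK: current epoch ${current_epoch} covers zxid epoch ${zxid_epoch}"
}

# Values from the log and from /var/lib/zookeeper/version-2 in this thread.
decode_zxid 34359738370     # epoch=8 counter=2
check_epoch 7 34359738370   # FAIL: current epoch 7 is older than zxid epoch 8
check_epoch 8 34359738370   # OK: current epoch 8 covers zxid epoch 8
```

This only illustrates the arithmetic. The recovery that worked in this thread was different: stop ZooKeeper on the broken node, delete the contents of /var/lib/zookeeper/version-2, and start it again so that it resyncs from the other two servers, which Enrico suggests trying in a staging environment first.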