For me the issue occurred only in standalone mode. With the ensemble, I simply cleared the data directory and the node received the ZooKeeper data from the quorum.
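In an ensemble, the "clear the data directory and resync from the quorum" step could be scripted roughly as follows. This is a minimal sketch that makes several assumptions:

- The node's ZooKeeper process is already stopped.
- Snapshots and transaction logs live under dataDir/version-2, which is the default when dataLogDir is not set.
- The myid file sits directly in dataDir. A quorum member needs that file to rejoin with the same server id, so it is left in place.

Rather than deleting anything, the sketch renames version-2 so the old data can still be restored. The class and method names are hypothetical. The approach only applies to ensemble members, because a standalone server has no quorum to resync from.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for an ensemble member whose ZooKeeper process is stopped:
// renames dataDir/version-2 so that, on restart, the node syncs from the quorum.
class ResetEnsembleMember {

    /** Renames dataDir/version-2 to dataDir/version-2.bak-<suffix> and returns the new path. */
    static Path moveVersionDirAside(Path dataDir, String suffix) throws IOException {
        Path versionDir = dataDir.resolve("version-2");
        if (!Files.isDirectory(versionDir)) {
            throw new IOException("No version-2 directory under " + dataDir);
        }
        Path backup = dataDir.resolve("version-2.bak-" + suffix);
        // Without REPLACE_EXISTING, Files.move throws if the target already exists,
        // so an earlier backup is never overwritten.
        Files.move(versionDir, backup);
        return backup;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the dataDir seen in the logs of this thread, overridable via args.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/zookeeper");
        Path backup = moveVersionDirAside(dataDir, Long.toString(System.currentTimeMillis()));
        // The myid file is left untouched in dataDir, so the node keeps its server id.
        System.out.println("Moved snapshots and transaction logs to " + backup);
    }
}
```

Renaming instead of deleting costs only disk space and keeps a way back if the resync does not happen as expected.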
> On 13.08.2019 at 15:42, Koen De Groote <koen.degro...@limecraft.com> wrote:
>
> I would also like to know if this is possible.
>
> From going over the GitHub page, it seems there is a JMX method to force the creation of a snapshot. Yet the Docker image is configured such that a port will never be assigned to the JMX process.
>
> Is there any way to bypass this?
>
>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfra...@gmail.com> wrote:
>>
>> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will check; I think the snapshot count is set to 1 in the cfg.
>>
>>> On 30.07.2019 at 08:06, Enrico Olivelli <eolive...@gmail.com> wrote:
>>>
>>> On Mon, 29 Jul 2019 at 23:59, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>>> OK, then let me verify tomorrow whether a snapshot file is indeed there. If it is missing, then I wonder why it was missing. There was no crash or anything similar, and 3.4.14 works without issues, but of course it could have loaded the data from the log files. However, then I wonder why it does not create one.
>>>
>>> I remember now that some other user, I think Sijie, reported a similar problem some months ago: it is not possible to upgrade from 3.4 to 3.5 if no snapshot is present.
>>> IIRC the fix was to force the creation of at least one snapshot file and then upgrade.
>>>
>>> Enrico
>>>
>>>>
>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <h...@apache.org> wrote:
>>>>
>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>
>>>>> If there are local snapshot files and the files are valid, then it's a bug that the server fails to load them.
>>>>>
>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>
>>>>> Not that I am aware of. There are some format changes (added compression support) in the master branch, but they are not shipped with 3.5.5.
>>>>>
>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>
>>>>>> OK, then it basically affects all standalone nodes? That is fine, even though it means some extra work (for non-critical lab environments).
>>>>>> I am not sure it is ZOOKEEPER-2325, but I don't know the full history behind it. The logs are fine (it works in 3.4.14 without issues, even after downgrading back). There is no issue with disk space, and there are no 0-byte files. I just wonder why it does not find a valid snapshot. Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>
>>>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <h...@apache.org> wrote:
>>>>>>
>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>
>>>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to end up with a potentially inconsistent state across the ensemble when recovering from an empty snapshot.
>>>>>>>
>>>>>>> To continue the upgrade, just delete all txn log files and let the node sync the snapshot from the quorum.
>>>>>>>
>>>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Mon, 29 Jul 2019, 22:32 Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> It also seems that 3.5.5 does not attempt to read all of the log files (I still have to confirm), but the two it reads exist, it has access to them, and they are much larger than 0 bytes.
>>>>>>>>
>>>>>>>> We should have the stack trace of the EOFException.
>>>>>>>>
>>>>>>>> Does anyone on this list have a better idea?
>>>>>>>>
>>>>>>>> Enrico
>>>>>>>>
>>>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> (Of course I do not run them at the same time.)
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you for the quick reply. They read from the same disk paths and have the same access rights (in fact, the RHEL service runs both as the same user).
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 29 Jul 2019, 21:50 Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode (other environments have a proper ensemble). I increased jute.maxbuffer beyond the default (but not excessively); this was working perfectly fine in 3.4.14.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the migration I basically reuse the same config files, except that I whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5 (basically, I just changed the symbolic link zookeeper to point to the 3.5.5 directory instead of the 3.4.14 directory):
>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14, the issue is resolved and ZooKeeper works normally. However, I would like to use the new version 3.5.5.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are no 0-byte files, and there is plenty of disk space available.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from the same disk paths?
>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can reconstruct it, but still)? I will also try in the other environments, including one with an ensemble, but I would like to know beforehand what the issue could be.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>>>> Activated Kerberos authentication and Kerberos SSL for clients and quorum.
>>>>>>>>>>>>
>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth.
>>>>>>>>>>>>
>>>>>>>>>>>> Enrico
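Michael describes the ZOOKEEPER-2325 check, which makes 3.5.5 reject a data directory that holds transaction logs but no snapshot. A standalone node could be checked for that condition before the symbolic link is switched. This is a minimal sketch under some assumptions:

- log.<zxid> and snapshot.<zxid> files share /zookeeper/version-2. This holds when dataLogDir is not configured separately.
- The class name SnapshotPreflight is hypothetical.

The check looks only at file names, so it cannot tell whether an existing snapshot is actually valid.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-upgrade check: flags a version-2 directory that has
// transaction logs (log.*) but no snapshot (snapshot.*).
class SnapshotPreflight {

    /** Returns true if versionDir contains at least one log.* file and no snapshot.* file. */
    static boolean logsWithoutSnapshot(Path versionDir) throws IOException {
        boolean hasLog = false;
        boolean hasSnapshot = false;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(versionDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (name.startsWith("log.")) {
                    hasLog = true;
                } else if (name.startsWith("snapshot.")) {
                    hasSnapshot = true;
                }
            }
        }
        return hasLog && !hasSnapshot;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path, taken from the log output in this thread.
        Path versionDir = Paths.get(args.length > 0 ? args[0] : "/zookeeper/version-2");
        if (logsWithoutSnapshot(versionDir)) {
            System.out.println("Transaction logs but no snapshot in " + versionDir
                    + ": 3.5.5 will likely refuse to start; create a snapshot with 3.4.x first.");
        } else {
            System.out.println("Snapshot present (or no logs) in " + versionDir);
        }
    }
}
```

If the check fires, Enrico's suggestion applies: get 3.4.x to write at least one snapshot file before switching to 3.5.x.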