OK, then let me verify tomorrow whether a snapshot file is indeed there. If it is missing, I wonder why. There was no crash or anything similar, and 3.4.14 works without issue, although of course it could have loaded the data from the log files. But then I wonder why no snapshot was ever created.
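The check I have in mind could look roughly like the sketch below. It only lists what is in the data directory, grouped into `snapshot.*` and `log.*` files with their sizes. The function name `summarize_data_dir` is made up for illustration, the path is the one from the DEBUG output quoted below, and it assumes that snapshots and transaction logs share one directory (no separate `dataLogDir`).

```python
from pathlib import Path


def summarize_data_dir(version_dir):
    """Group the files in a ZooKeeper version-2 directory by kind.

    Returns a dict with the keys "snapshot" and "log", each holding a
    name-sorted list of (file_name, size_in_bytes) tuples.
    (Hypothetical helper, not part of ZooKeeper.)
    """
    summary = {"snapshot": [], "log": []}
    for entry in sorted(Path(version_dir).iterdir()):
        # Snapshot and txn log files are named snapshot.<zxid> and log.<zxid>.
        prefix, dot, _ = entry.name.partition(".")
        if dot and entry.is_file() and prefix in summary:
            summary[prefix].append((entry.name, entry.stat().st_size))
    return summary


if __name__ == "__main__":
    # Assumption: path taken from the quoted log output; adjust to your dataDir.
    data_dir = Path("/zookeeper/version-2")
    if data_dir.is_dir():
        for kind, files in summarize_data_dir(data_dir).items():
            print(f"{kind}: {len(files)} file(s)")
            for name, size in files:
                print(f"  {name}  {size} bytes")
    else:
        print(f"{data_dir} does not exist on this machine")
```

An empty "snapshot" list together with a non-empty "log" list would match the situation that the 3.5.5 error message describes.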

On Mon, Jul 29, 2019 at 11:45 PM Michael Han <h...@apache.org> wrote:

> >> I just wonder why it does not find a valid snapshot.
>
> If there are local snapshot files and the files are valid, then it's a bug
> that the server fails to load them.
>
> >> Is it because the format changed in 3.5.5 compared to 3.4.14?
>
> Not that I am aware of. There are some format changes (added compression
> support) in the master branch, but that's not shipped with 3.5.5.
>
> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfra...@gmail.com> wrote:
>
> > ok, then it affects basically all standalone nodes? This is fine, even
> > though it means some extra work (for uncritical lab environments).
> > I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> > behind it). The logs are fine (it works in 3.4.14 without issues, even
> > after downgrading back). There is no issue with disk space and there are
> > no 0 byte files. I just wonder why it does not find a valid snapshot. Is
> > it because the format changed in 3.5.5 compared to 3.4.14?
> >
> > On Mon, Jul 29, 2019 at 11:25 PM Michael Han <h...@apache.org> wrote:
> >
> > > >> java.io.IOException: No snapshot found, but there are log entries.
> > > >> Something is broken!
> > >
> > > This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
> > > end up with a potentially inconsistent state across the ensemble when
> > > recovering from an empty snapshot.
> > >
> > > To continue the upgrade, just delete all txn log files and let the node
> > > sync the snapshot from the quorum.
> > >
> > > On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >
> > > > On Mon, Jul 29, 2019, 22:32 Jörn Franke <jornfra...@gmail.com> wrote:
> > > >
> > > > > It also seems that 3.5.5 does not attempt to read all of the log
> > > > > files (I still have to confirm this), but the two it reads exist,
> > > > > it has access to them, and they are much larger than 0 bytes.
> > > >
> > > > We should have the stack trace of the EOFException.
> > > >
> > > > Does anyone on this list have a better idea?
> > > >
> > > > Enrico
> > > >
> > > > > On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfra...@gmail.com> wrote:
> > > > >
> > > > > > (of course I do not run them at the same time)
> > > > > >
> > > > > > On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfra...@gmail.com> wrote:
> > > > > >
> > > > > >> Thank you for the quick reply. They read from the same disk paths
> > > > > >> and have the same access rights (in fact the RHEL service executes
> > > > > >> them as the same specific user).
> > > > > >>
> > > > > >> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > > > > >>
> > > > > >>> On Mon, Jul 29, 2019, 21:50 Jörn Franke <jornfra...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> > Hi,
> > > > > >>> >
> > > > > >>> > I tried to migrate a lab environment from ZooKeeper 3.4.14 (used
> > > > > >>> > for Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in
> > > > > >>> > standalone mode (other environments have a proper ensemble). I
> > > > > >>> > increased jute.maxbuffer beyond the default (but not excessively);
> > > > > >>> > this was working perfectly fine in 3.4.14.
> > > > > >>> >
> > > > > >>> > Basically, I reuse the same config files for the migration, except
> > > > > >>> > that I whitelist some commands (later I am also interested in
> > > > > >>> > adding SSL).
> > > > > >>> >
> > > > > >>> > I get the following error message when starting ZooKeeper with
> > > > > >>> > 3.5.5 (basically, I just changed the symbolic link from zookeeper
> > > > > >>> > to point to the 3.5.5 directory instead of the 3.4.14 directory):
> > > > > >>> >
> > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
> > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
> > > > > >>> > 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
> > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
> > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
> > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
> > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
> > > > > >>> > java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> > > > > >>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> > > > > >>> >     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> > > > > >>> >     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> > > > > >>> >     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> > > > > >>> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> > > > > >>> >     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> > > > > >>> >     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> > > > > >>> >     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> > > > > >>> >     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> > > > > >>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> > > > > >>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> > > > > >>> >
> > > > > >>> > Strangely enough, if I switch back to 3.4.14 the issue is resolved
> > > > > >>> > and ZooKeeper works normally. However, I would like to leverage the
> > > > > >>> > new version 3.5.5.
> > > > > >>> >
> > > > > >>> > There are no 0 byte files. Plenty of disk space is available.
> > > > > >>>
> > > > > >>> Can you compare these logs with the logs of 3.4.x? Are they reading
> > > > > >>> from the same disk paths?
> > > > > >>>
> > > > > >>> > Any idea beyond erasing the data dir (I would try to avoid it; I
> > > > > >>> > can reconstruct it, but still)? I will also try in the other
> > > > > >>> > environments, including one with an ensemble, but I would like to
> > > > > >>> > know beforehand what the issue could be.
> > > > > >>> >
> > > > > >>> > Not sure if it is relevant, but:
> > > > > >>> > Activated Kerberos Authentication and Kerberos SSL for clients and
> > > > > >>> > quorum.
> > > > > >>>
> > > > > >>> Quorum? In standalone mode there is no 'quorum' auth.
> > > > > >>>
> > > > > >>> Enrico