Hi Team,

I ran a proof of concept (POC) of the rolling upgrade and found the following results.


   1. Upgraded ZooKeeper on the 1st node. Traffic kept running fine because the
   other 2 nodes were still on the old ZooKeeper.
   2. Upgraded our application on the 1st node and did not find any issue.
   3. Upgraded ZooKeeper on the 2nd node, but got the error below, and
   ZooKeeper is no longer taking any requests.

java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)

2020-03-30 14:19:55,587 - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker

2020-03-30 14:19:55,588 - ERROR [LearnerHandler-/192.168.44.73:33754:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:476)

2020-03-30 14:19:55,588 - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue

Please let me know whether this is the known issue mentioned in the Apache
ZooKeeper documentation for upgrades from 3.4.5 to 3.5.6, or a different
issue.
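
To see which nodes are actually serving after each upgrade step, a small
check along the following lines could be run against every node. It is only a
sketch: the host names and client port 2181 are placeholders, the helper
names are made up for illustration, and it assumes the `srvr` four-letter
command is allowed on the 3.5.6 nodes (see `4lw.commands.whitelist`). A node
that has not joined the quorum does not report a `Mode:` line, so its mode
comes back as None.

```python
import socket


def send_four_letter_word(host, port, word="srvr", timeout=5.0):
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(reply):
    """Pull the version and mode out of a srvr reply.

    Both values stay None when the reply has no matching line, which is
    what happens when the node is not currently serving requests.
    """
    info = {"version": None, "mode": None}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        if key == "zookeeper version":
            # e.g. "3.5.6-<git hash>, built on ..." -> "3.5.6"
            info["version"] = value.strip().split("-")[0]
        elif key == "mode":
            info["mode"] = value.strip()
    return info


def check_ensemble(hosts, port=2181):
    """Return {host: parsed srvr info, or an error string} for each host."""
    results = {}
    for host in hosts:
        try:
            results[host] = parse_srvr(send_four_letter_word(host, port))
        except OSError as exc:
            results[host] = "unreachable: %s" % exc
    return results
```

Calling `check_ensemble(["zk1", "zk2", "zk3"])` (hypothetical host names)
after each step would show whether exactly one node reports `leader`, the
others report `follower`, and which version each one is running.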

Thanks,
---------------------
Kuldeep Singh Budania
Software Architect



On Sun, Mar 29, 2020 at 9:06 AM Alexander Shraer <shra...@gmail.com> wrote:

> +1 to what Mate said (I wrote the quoted instructions).
>
>
>
> On Tue, Mar 24, 2020 at 7:03 AM Szalay-Bekő Máté <szalay.beko.m...@gmail.com> wrote:
>
> > Hi Kuldeep,
> >
> > I just want to provide you some background info about our documentation.
> > The reason to upgrade to 3.4.6 first is to avoid the following error:
> >
> > > 2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784:QuorumCnxManager@349] - Invalid server id: -65536
> >
> > This error is caused by a protocol change between ZooKeeper server nodes
> > during connection initiation for leader election. ZooKeeper 3.5 introduced
> > a protocol version (see ZOOKEEPER-107), and since then the first long value
> > sent in the initial message is not the server ID but the protocol version
> > (-65536). In ZooKeeper 3.4.6 we made the old 3.4 ZooKeepers backward
> > compatible, so they are able to parse both the old and the new protocol
> > format (see ZOOKEEPER-1633). This issue happens only when you need to use
> > old (3.4.0 - 3.4.5) and new (3.5.0+) ZooKeeper servers together in the same
> > cluster. During a rolling upgrade, old and new ZooKeepers are usually
> > present together.
> >
> > The fact that you haven't seen any issues might be caused by the order of
> > the servers. In ZooKeeper, connection initiation between the servers during
> > leader election follows a specific rule. As far as I remember, the server
> > with the larger ID always 'wins the challenge', so it is possible that the
> > old server didn't need to parse any initial message (if it had the largest
> > ID), and this is why you haven't seen the issue. Also, having 2 nodes up
> > out of a 3-node cluster still keeps the cluster working (so you should also
> > check whether all the servers are part of the quorum).
> >
> > I agree with Enrico and Norbert: the safest and most stable way is to
> > upgrade first to 3.4.latest, then go to 3.5.latest. Still, if you don't see
> > this specific issue (e.g. no "Invalid server id" in the log files), and all
> > three servers can handle traffic, then maybe you don't need to upgrade
> > first to 3.4.latest; it is your decision. You should definitely test it
> > first, as suggested by the others.
> >
> > Kind regards,
> > Mate
> >
> > On Tue, Mar 24, 2020 at 12:29 PM Norbert Kalmar
> > <nkal...@cloudera.com.invalid> wrote:
> >
> > > Hi,
> > >
> > > That guide is for upgrading to 3.5.0, which was an alpha version. A lot
> > > has changed for the first stable release, 3.5.5, and a few more things
> > > after that; even rolling upgrade issues have been fixed in 3.5.6.
> > > This is a more up-to-date guide:
> > > https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
> > >
> > > If you have done your testing (with a prod snapshot!), then you can skip
> > > the 3.4 latest upgrade, but keep in mind that we make our recommendations
> > > for a reason. There were issues reported and/or found during testing.
> > > Some are fixed in 3.5.6; some only happen under certain conditions
> > > (IOException: No snapshot found - mentioned in the guide, fixed in 3.5.6).
> > >
> > > So it is up to you; I would still recommend doing a 3.4 upgrade first, if
> > > it's feasible.
> > >
> > > Regards,
> > > Norbert
> > >
> > > On Tue, Mar 24, 2020 at 11:45 AM kuldeep singh <kuldeep.sing...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Current ZooKeeper version: 3.4.5
> > > > Target ZooKeeper version:  3.5.6
> > > >
> > > > We are not going with 3.5.7; our final decision is ZooKeeper 3.5.6.
> > > > As per your reply, we first need to move to the latest 3.4.x version,
> > > > like below:
> > > >
> > > > 3.4.5 -> 3.4.14 -> 3.5.6 (Correct me if I am wrong here)
> > > >
> > > > But, as I shared, we have not faced any problem: we have set up a 3-node
> > > > cluster where 2 nodes are on version 3.5.6 and 1 node is on 3.4.5.
> > > > Everything is running fine and we didn't hit any issue. So what other
> > > > problems could we face if we move directly to 3.5.6?
> > > >
> > > > Thanks,
> > > > ---------------------
> > > > Kuldeep Singh Budania
> > > > Software Architect
> > > >
> > > >
> > > > On Tue, Mar 24, 2020 at 3:58 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > > You have to upgrade to the latest 3.4.x ZooKeeper first; then you can
> > > > > upgrade to 3.5.7.
> > > > > All should run well without issues
> > > > >
> > > > >
> > > > > Enrico
> > > > >
> > > > > On Tue, Mar 24, 2020, 10:18 kuldeep singh <kuldeep.sing...@gmail.com> wrote:
> > > > >
> > > > > > Hi Team,
> > > > > >
> > > > > > We are upgrading ZooKeeper from 3.4.5 to 3.5.6. I have set up a
> > > > > > 3-node cluster where 2 nodes are on version 3.5.6 and 1 node is on
> > > > > > 3.4.5.
> > > > > >
> > > > > > Everything is running fine and didn't get any issue on my system.
> > > > > >
> > > > > > But I found something on the Apache site saying that we first need
> > > > > > to upgrade to 3.4.6, and only then can we upgrade to 3.5.6. So is it
> > > > > > mandatory to go to 3.4.6 first?
> > > > > >
> > > > > > *Upgrading to 3.5.0*
> > > > > >
> > > > > > Upgrading a running ZooKeeper ensemble to 3.5.0 should be done only
> > > > > > after upgrading your ensemble to the 3.4.6 release. Note that this
> > > > > > is only necessary for rolling upgrades (if you're fine with shutting
> > > > > > down the system completely, you don't have to go through 3.4.6). If
> > > > > > you attempt a rolling upgrade without going through 3.4.6 (for
> > > > > > example from 3.4.5), you may get the following error:
> > > > > >
> > > > > > 2013-01-30 11:32:10,663 [myid:2] - INFO [localhost/127.0.0.1:2784:QuorumCnxManager$Listener@498] - Received connection request /127.0.0.1:60876
> > > > > >
> > > > > > 2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784:QuorumCnxManager@349] - Invalid server id: -65536
> > > > > >
> > > > > > During a rolling upgrade, each server is taken down in turn and
> > > > > > rebooted with the new 3.5.0 binaries. Before starting the server
> > > > > > with 3.5.0 binaries, we highly recommend updating the configuration
> > > > > > file so that all server statements "server.x=..." contain client
> > > > > > ports (see the section Specifying the client port). As explained
> > > > > > earlier you may leave the configuration in a single file, as well as
> > > > > > leave the clientPort/clientPortAddress statements (although if you
> > > > > > specify client ports in the new format, these statements are now
> > > > > > redundant).
> > > > > >
> > > > > > Could you please let me know about this case? I would appreciate a
> > > > > > quick response.
> > > > > >
> > > > > > Thanks,
> > > > > > ---------------------
> > > > > > Kuldeep Singh Budania
> > > > > >
> > > > >
> > > >
> > >
> >
>
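
The handshake mismatch Mate describes in the quoted thread can be illustrated
with a short sketch. This is not ZooKeeper's code: the function names are
made up, and the layout assumed here (a big-endian long carrying the protocol
version, followed by a long carrying the server ID) covers only the part of
the initial message discussed above and leaves out anything that follows.

```python
import struct

# Protocol version value quoted in the thread ("Invalid server id: -65536").
PROTOCOL_VERSION = -65536


def read_sid_old(payload):
    """Reader that only knows the old format: the first long is the server ID."""
    (sid,) = struct.unpack(">q", payload[:8])
    return sid


def read_sid_compatible(payload):
    """Reader that accepts both formats, as described for 3.4.6 and later.

    If the first long equals the protocol version, the server ID is taken
    from the next long (assumed layout); otherwise the first long is the ID.
    """
    (first,) = struct.unpack(">q", payload[:8])
    if first == PROTOCOL_VERSION:
        (sid,) = struct.unpack(">q", payload[8:16])
        return sid
    return first


# Initial message from a new-format server whose ID is 3 (illustrative value).
new_format = struct.pack(">qq", PROTOCOL_VERSION, 3)
# Initial message from an old-format server whose ID is 2 (illustrative value).
old_format = struct.pack(">q", 2)
```

With these definitions, `read_sid_old(new_format)` yields the protocol
version instead of a server ID, which matches the "Invalid server id: -65536"
warning, while `read_sid_compatible` recovers the ID from both messages.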
