Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-03 Thread Jerry Hebert
This is a really useful discussion, and I really appreciate it! I'm not too worried about the restarts that I saw, and they are totally unrelated to the upgrade. The upgrade is only relevant insofar as I was seeking confidence that I would not see the issue once upgraded to 3.5.5, but I'm inclined to

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-03 Thread Enrico Olivelli
I think it is possible to perform a rolling upgrade from 3.4; all of my customers migrated a year ago without any issues (reported to my team). Norbert, where did you find that information? By the way, I would like to set up tests for backward compatibility, server-to-server and client-to-server

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-03 Thread Jörn Franke
I only tried from 3.4.14, and there it was possible. I recommend upgrading to the latest 3.4 version first and then to 3.5.

> On 02.10.2019 at 21:40, Jerry Hebert wrote:
>
> Hi Jörn,
>
> No, this was a very intermittent issue. We've been running this ensemble
> for about four years now and

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-03 Thread Jörn Franke
I can confirm that a rolling update from ZK 3.4 to ZK 3.5 is possible if and only if a ZK ensemble is used. Standalone updates may introduce difficulties. Of course I cannot speak for all possible setups, but for a ZK ensemble with multiple Solr instances it is possible.

> On 03.10.2019 at

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-03 Thread Shawn Heisey
On 10/3/2019 2:45 AM, Norbert Kalmar wrote:
> As for running a mixed version of 3.5 and 3.4 quorum - I'm afraid it will not work. From 3.5 we have a check on PROTOCOL_VERSION. 3.4 did not have this protocol version, so when the nodes try to communicate it will throw an exception. Plus, it is not a

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-03 Thread Norbert Kalmar
Hi, Here are the issues we have encountered so far when upgrading to 3.5.5 from 3.4: https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ As Enrico mentioned, nothing similar so far. One is that no snapshot has been taken yet; the other is that four-letter words need to be whitelisted. As for running a mixed
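The four-letter-word whitelisting mentioned above can be checked before an upgrade with a small script. This is only a minimal sketch: it assumes the whitelist is set through the `4lw.commands.whitelist` property in `zoo.cfg`, that `*` enables every command, and that only `srvr` is enabled when the property is absent. The function names are hypothetical and not part of ZooKeeper.

```python
def whitelisted_commands(zoo_cfg_text):
    """Return the set of four-letter words enabled by the given zoo.cfg text."""
    # Assumption: when the property is absent, only "srvr" is enabled.
    commands = {"srvr"}
    for raw in zoo_cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "4lw.commands.whitelist":
            commands = {c.strip() for c in value.split(",") if c.strip()}
    return commands


def is_allowed(command, zoo_cfg_text):
    """True if the command is whitelisted, either by name or through '*'."""
    allowed = whitelisted_commands(zoo_cfg_text)
    return "*" in allowed or command in allowed
```

For example, `is_allowed("mntr", open("zoo.cfg").read())` would tell you whether monitoring tools that rely on `mntr` will keep working after the upgrade.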

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-02 Thread Enrico Olivelli
On Wed, 2 Oct 2019, 22:52 Jerry Hebert wrote:

> Hi Enrico,
>
> The nodes that restarted did not have any errors in their logs, they seemed
> to simply restart successfully so I think your hunch about the external
> system is probably correct.
>
> Could you comment on my second question above

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-02 Thread Jerry Hebert
Hi Enrico, The nodes that restarted did not have any errors in their logs; they seemed to simply restart successfully, so I think your hunch about the external system is probably correct. Could you comment on my second question above regarding cross-version migration, or should I make a new

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-02 Thread Enrico Olivelli
Any particular error or stack trace in the logs? If it is ZooKeeper that is killing itself, it should log it; otherwise it is some other external system. I am sorry, I don't know Exhibitor.

Hope that helps
Enrico

On Wed, 2 Oct 2019, 21:40 Jerry Hebert wrote:

> Hi Jörn,
>
> No, this was a very

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-02 Thread Jerry Hebert
Hi Jörn, No, this was a very intermittent issue. We've been running this ensemble for about four years now and have never seen this problem, so it seems to be super heisenbuggy. Our upgrade process will be more involved than what you described (we're switching networks, instance types, underlying

Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-02 Thread Jörn Franke
Have you tried to stop the node, delete the data and log directories, upgrade to 3.5.5, start the node, and wait until it is synchronized?

> On 02.10.2019 at 20:14, Jerry Hebert wrote:
>
> Hi all,
>
> My first post here! I'm hoping you all might be able to offer some guidance
> or redirect
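For the "wait until it is synchronized" step above, one rough way to check is to look at the node's `mntr` output. The sketch below only parses text that has already been fetched; it assumes `mntr` replies with tab-separated key/value lines and reports the role under `zk_server_state`. The helper names are hypothetical, and treating `follower` or `leader` as "rejoined" is a simplification rather than a full sync check.

```python
def parse_mntr(output):
    """Parse tab-separated 'key<TAB>value' lines from mntr into a dict."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # ignore lines without a tab, e.g. a "not serving" message
            stats[key.strip()] = value.strip()
    return stats


def has_rejoined(output):
    """Rough check: the node reports itself as a follower or the leader."""
    return parse_mntr(output).get("zk_server_state") in ("follower", "leader")
```

The text itself could come from something like `echo mntr | nc <host> 2181`, provided `mntr` is whitelisted on the upgraded node. On the leader, the same output can be checked for its count of synced followers before moving on to the next node.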

One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-02 Thread Jerry Hebert
Hi all, My first post here! I'm hoping you all might be able to offer some guidance or redirect me to an existing ticket. We have a five node ensemble on 3.4.11 that we're currently in the process of upgrading to 3.5.5. We recently saw some bizarre behavior in our ensemble that I was hoping to