Well, I think it's fair to say that they're effectively untested prior to
rc2. But it's a reasonable posture to take that the features get baked
first and the field upgrade procedure gets tested late in the release
cycle. Not what I would have expected personally, though, as a former
developer of
FYI: PR just submitted, see https://github.com/apache/zookeeper/pull/1251
any comments welcomed! :)
Kind regards,
Mate
On Wed, Feb 12, 2020 at 1:16 PM Andor Molnar wrote:
> Hi Michael,
>
> "if we can get to rc2 without noticing a showstopper…"
>
> 200% disagree with this.
>
> The whole point o
Hi Michael,
"if we can get to rc2 without noticing a showstopper…"
200% disagree with this.
The whole point of the release voting system is to identify problems no matter how
big they are. The message of finding a showstopper for me is that people are paying
attention and accurately testing the relea
Michael,
your points are valid.
I would like to see the proposal from Mate.
Up to ZOOKEEPER-3188, no other patch in 3.6 (from my limited point of
view) introduced changes in the quorum peer protocol that made it
incompatible with 3.5.
Enrico
On Tue, Feb 11, 2020 at 23:35 Michael K. Edwards
I think it would be prudent to emphasize in the release notes that rolling
upgrades (and mixed ensembles generally) are effectively untested. That
this was, in practice, a non-goal of this release cycle. Because if we can
get to rc2 without noticing a showstopper, clearly it's not something that
Hi Andor,
this is almost exactly what I proposed. More precisely:
1) First we make multi-address feature disabled by default.
2) If disabled, quorum protocol automatically uses the old protocol version
which lets 3.5 and 3.6 communicate smoothly. The code in 3.6.0 will be able
to understand both
Mate,
Let me reiterate to see if I understand you correctly:
1) First we make multi-address feature disabled by default.
2) If disabled, quorum protocol automatically uses the old protocol version
which lets 3.5 and 3.6 communicate smoothly.
3) Once the user has finished the first rolling restart
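
A minimal sketch of steps 1) and 2) above, assuming a made-up wire layout.
The version constants, the "|" separator and the function names are all
hypothetical and are not ZooKeeper's real QuorumCnxManager format. It only
illustrates the idea: the sender defaults to the old version unless the
multi-address feature is enabled, while the receiver accepts both.

```python
import struct
from typing import List, Tuple

# Placeholder values for illustration only; not ZooKeeper's real constants.
OLD_PROTOCOL_VERSION = 1
NEW_PROTOCOL_VERSION = 2

# Hypothetical header: version (8 bytes), server id (8 bytes), payload length (4 bytes).
HEADER = ">qqi"


def outgoing_version(multi_address_enabled: bool) -> int:
    """Send the old version unless the new feature is explicitly switched on."""
    return NEW_PROTOCOL_VERSION if multi_address_enabled else OLD_PROTOCOL_VERSION


def encode_initial_message(sid: int, addresses: List[str],
                           multi_address_enabled: bool) -> bytes:
    version = outgoing_version(multi_address_enabled)
    if version == OLD_PROTOCOL_VERSION:
        # The old format carries a single address, so older peers can parse it.
        payload = addresses[0].encode()
    else:
        payload = "|".join(addresses).encode()
    return struct.pack(HEADER, version, sid, len(payload)) + payload


def decode_initial_message(data: bytes) -> Tuple[int, List[str]]:
    """Accept both versions, whatever this server itself sends."""
    version, sid, length = struct.unpack_from(HEADER, data)
    start = struct.calcsize(HEADER)
    payload = data[start:start + length].decode()
    if version == OLD_PROTOCOL_VERSION:
        return sid, [payload]
    if version == NEW_PROTOCOL_VERSION:
        return sid, payload.split("|")
    raise ValueError(f"unknown protocol version {version}")
```

With the feature disabled, a message built by the new code is
indistinguishable from one built by the old code, which is what would let a
mixed 3.5 / 3.6 ensemble keep talking during the first rolling restart.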
>> but it didn't solve the problem.
Yes, the constraint is that 3.6.0 has to default to the old protocol version so the
outgoing message is backward compatible. If we do this, then it's
essentially "the simplest solution" proposed.
>> disable the new MultiAddress feature and stick to the old protocol
I see the main problem here in the fact that we are missing proper
versioning in the leader election / quorum protocols. I tried to simply
implement backward compatibility in 3.6, but it didn't solve the problem.
The new code understands the old protocol, but it cannot decide when to
use the new o
I hate to say it, but I think 3.6.0 should release as is. It is impossible
to *reliably* retrofit backwards compatibility / interoperability onto a
release that was engineered from the beginning without that goal. Learn
the lesson, set goals differently in the future.
On Tue, Feb 11, 2020 at 9:4
FYI: I created these scripts for my local tests:
https://github.com/symat/zk-rolling-upgrade-test
For the long term I would also add some script that actually monitors the
state of the quorum and also runs continuous traffic, not just 1-2
smoketests after each restart. But I don't know how importa
On Tue, Feb 11, 2020 at 17:17 Andor Molnar
wrote:
>
> The most obvious one that comes to mind is one that I previously worked on:
>
> 1) run old version cluster,
> 2) connect to each node and run smoke tests,
> 3) restart one node with new code,
> 4) goto 2) until all nodes are upgr
The most obvious one that comes to mind is one that I previously worked on:
1) run old version cluster,
2) connect to each node and run smoke tests,
3) restart one node with new code,
4) goto 2) until all nodes are upgraded
I think this wouldn’t work in a “unit test”, we probably need a separate
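
A rough sketch of the loop in steps 1) to 4) above, written against
in-memory stand-ins so that it runs without a real ensemble. `Node`,
`rolling_upgrade`, `smoke_test` and `restart_with` are hypothetical names
introduced here; a real harness would start actual servers and connect
clients to them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for one ensemble member."""
    name: str
    version: str


def rolling_upgrade(nodes: List[Node], new_version: str,
                    smoke_test: Callable[[Node], bool],
                    restart_with: Callable[[Node, str], None]) -> None:
    """Upgrade one node at a time, smoke-testing every node before each restart."""
    for target in nodes:
        # Step 2: connect to each node and run smoke tests.
        for node in nodes:
            if not smoke_test(node):
                raise RuntimeError(f"smoke test failed on {node.name} ({node.version})")
        # Step 3: restart one node with the new code.
        restart_with(target, new_version)
    # Final pass once every node runs the new version.
    for node in nodes:
        if not smoke_test(node):
            raise RuntimeError(f"smoke test failed on {node.name} ({node.version})")


def fake_restart(node: Node, version: str) -> None:
    # A real harness would stop the process and start the new binary here.
    node.version = version


cluster = [Node("server1", "3.5.6"), Node("server2", "3.5.6"), Node("server3", "3.5.6")]
rolling_upgrade(cluster, "3.6.0", smoke_test=lambda n: True, restart_with=fake_restart)
```

Passing the smoke test and the restart as callables keeps the loop itself
independent of how the servers are actually launched.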
Anyone have ideas how we could add testing for upgrade? Obviously something
we're missing, especially given its importance.
Patrick
On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli
wrote:
> On Tue, Feb 11, 2020 at 09:12 Szalay-Bekő Máté
> wrote:
> >
> > Hi All,
> >
> > about the question
On Tue, Feb 11, 2020 at 09:12 Szalay-Bekő Máté
wrote:
>
> Hi All,
>
> about the question from Michael:
> > Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> > speak old message format when it's talking to old server?
>
> In this particular case, it might be
Hi All,
about the question from Michael:
> Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> speak old message format when it's talking to old server?
In this particular case, it might be enough. The protocol change happened
now in the 'initial message' sent by the QuorumC
Good.
I will cancel the vote for 3.6.0rc2.
I would appreciate it very much if Mate and his colleagues have time to work on a
fix.
Otherwise I will have cycles next week.
I would also like to spend my time setting up a few minimal integration
tests for the upgrade story.
Enrico
On Tue, Feb 11, 2020, 07
Kudos Enrico, very thorough work as the final gatekeeper of the release!
Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
pieces of software that puts so much emphasis on compatibility that it just
works when upgrade
I suggest this plan:
- release 3.6.0 now
- improve the migration story, the flow outlined by Mate is
interesting, but it will take time
3.6.0rc2 got enough binding votes so I am going to finalize the
release this evening (within 8-10 hours) if no one comes out in the
VOTE thread with a -1
Enrico
On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar wrote:
> Hi,
>
> Answers inline.
>
>
> > In my experience when you are close to a release it is better not to
> > make big changes. (I am among the approvers of that patch, so I am
> > responsible for this change)
>
>
>
> Although this statement is acce
Hi,
Answers inline.
> In my experience when you are close to a release it is better not to
> make big changes. (I am among the approvers of that patch, so I am
> responsible for this change)
Although this statement is acceptable for me, I don’t feel this patch should
not have been merged into
Thank you Mate for checking and explaining this story.
I find it very interesting that the cause is ZOOKEEPER-3188 as:
- it is the last "big patch" committed to 3.6 before starting the
release process
- it is the cause of the failure of the first RC
In my experience when you are close to a releas
Actually, we have another option: we can follow the way rolling
restart support for QuorumSSL was implemented.
- we can make 3.6.0 able to read both protocol versions
- we can add a parameter that tells 3.6.0 which protocol version to use
(using the old one breaks / disables
Hi Enrico!
This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
The protocol version was last changed in ZOOKEEPER-2186, first released
in 3.4.7 and 3.5.1, to avoid some crashes and fix some bugs. Later I
also changed the protocol version when the format of the initial mess
Hi,
even though we had enough binding +1s on 3.6.0rc2, before closing the VOTE
on 3.6.0 I wanted to finish my tests, and I have run into an apparent
blocker.
I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like
the peers are not able to talk to each other.
I have a cluster of 3, server1, server2