You are correct Raman. I updated the KIP to reflect your observations
and corrections.
Raman wrote:
> I think the line #2 here is wrong with regard to AlterPartitionRequest.
Yes. Bullet 2. Now reads: 2. The LeaderRecoveryState is changing from
RECOVERED to RECOVERING.
Raman wrote:
> Also, in
Also, in the response schema, I think the last line is a bit
misleading. As we discussed offline, the field can be either
Recovering or Recovered, depending on when the leader sent out an
AlterPartition request to the controller (i.e. during recovery or
after recovery has completed). But the
Jose,
I think the line #2 here is wrong with regard to AlterPartitionRequest.
```
Request Handling
The controller will validate the LeaderRecoveryState field and return
an INVALID_REQUEST error when:
1. The size of the ISR is greater than 1 and the LeaderRecoveryState
is RECOVERING.
2. The
David Jacot wrote:
> The behavior of the leader is clear. However, the part which is
> not clear to me is how followers which could get fetch requests
> from consumers as well will handle them. Sorry if I was not clear.
Got it. I updated the KIP to add more information regarding how the
topic
Hi José,
Thanks for your reply.
The behavior of the leader is clear. However, the part which is
not clear to me is how followers which could get fetch requests
from consumers as well will handle them. Sorry if I was not clear.
Best,
David
On Thu, Feb 3, 2022 at 5:47 AM José Armando García
Thanks for the feedback David Jacot
David Jacot wrote:
> I have one question regarding how fetch from followers will
> work when the leader is recovering. My understanding is that
> the leader will reject any produce and fetch requests with a
> NOT_LEADER_OR_FOLLOWER error while the followers
>
Hi José,
Thanks for the KIP.
I have one question regarding how fetch from followers will
work when the leader is recovering. My understanding is that
the leader will reject any produce and fetch requests with a
NOT_LEADER_OR_FOLLOWER error while the followers
will fence any fetch requests based
Hi all,
Jason and I discussed this offline. At a high-level I have made the
following changes to the KIP.
1. IBP will be used to enable this feature and to determine which
version of LeaderAndIsr and AlterPartition will be used.
2. The LeaderRecoveryState field for LeaderAndIsr and
Hi Jason,
Jason wrote:
> Thanks for the updates. I noticed that `LeaderRecoveryState` is marked as
> ignorable in the `AlterPartition` request. It would be helpful to
> understand the motivation for that.
I think it is fine for this property to be marked as ignorable because
the property is
Hey Jose,
Thanks for the updates. I noticed that `LeaderRecoveryState` is marked as
ignorable in the `AlterPartition` request. It would be helpful to
understand the motivation for that.
Thanks,
Jason
On Wed, Jan 26, 2022 at 10:25 AM Colin McCabe wrote:
> On Wed, Jan 26, 2022, at 09:14, José
On Wed, Jan 26, 2022, at 09:14, José Armando García Sancio wrote:
> Thanks for the feedback Colin.
>
> Colin wrote:
>> We already have many classes that are called "partition state." For example,
>> PartitionStates.java on the client side, PartitionStateMachine.scala and
>>
Thanks for the feedback Colin.
Colin wrote:
> We already have many classes that are called "partition state." For example,
> PartitionStates.java on the client side, PartitionStateMachine.scala and
> TopicPartitionStateZNode in the old controller,
> RemotePartitionDeleteState.java under
On Fri, Jan 21, 2022, at 11:07, José Armando García Sancio wrote:
> Hi all,
>
> The following suggestions are not strictly required to implement this
> KIP but what do we think about:
>
> 1. Changing the name of the AlterIsr RPC to AlterPartition RPC.
>
> 2. Change the name of the field
Thanks for the additional context regarding AlterIsrResponse.
Jason wrote:
> In regard to the naming of `IsLeaderRecovering`, I agree it still seems a
> bit awkward. I kind of liked the idea of turning it into a `PartitionState`
> field instead. That would also address the inconsistent type in
For a little background on why `AlterIsr` returns the state back in the
response, originally the idea was that the partition leader could use the
response to reset its own state after a failed request. The tricky thing
for ISR changes is always ensuring that the partition leader reflects a
Hi all,
The following suggestions are not strictly required to implement this
KIP but what do we think about:
1. Changing the name of the AlterIsr RPC to AlterPartition RPC.
2. Change the name of the field "CurrentIsrVersion" to
"PartitionEpoch". This is the name that we use in the KRaft
Thanks Raman and Colin for your feedback.
Raman wrote:
> - Could you please explain the following about backward compatibility.
> If a leader has been elected unclean. And we decide to roll the
> cluster back when the leader is in the middle of recovery, leader will
> simply not be able to
Hi José,
Thanks for the changes.
"isLeaderRecovering" sounds pretty awkward. If we want to call this "leader
recovery" then maybe the flag could be something like "inLeaderRecovery."
Actually, how about "inElectionRecovery" to emphasize the fact that we are
recovering from an unclean leader
Jose,
Is there a specific need for adding "isUnclean '' flag to
`AlterIsrResponse`. A potential sequence of events will be:
1. Controller sets the flag at `ZK` and informs the leader via the
`LeaderAndIsrRequest`
2. Leader undertakes the necessary recovery and sends out a
`AlterIsrRequest` to
Thanks for this KIP, Jose
- Could you please explain the following about backward compatibility.
If a leader has been elected unclean. And we decide to roll the
cluster back when the leader is in the middle of recovery, leader will
simply not be able to recover when we roll back because it will
Hi all,
I made the following changes to the KIP:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=173082256=12=11
Some of the highlights are:
1. Changed the field from IsUnclean to IsLeaderRecovering
2. Added a few more sentences explaining why this KIP is backward
Thanks for the feedback Colin and Luke.
Colin wrote:
> The KIP talks a bit about "recovery," which is a new term (as far as I
> know). If I understand correctly, this is a state that the partition enters
> into after an unclean leader election. I would suggest using a different
> term for this,
Hi José,
Thanks for the KIP.
It's indeed a problem that we should handle it well!
Just a minor comment:
In the `PartitionChangeRecord` schema, you set the `isUnclean` field as
type boolean,
but the value (-1, 0, 1) are all int values. Did you mean int32 type or the
value should be updated?
Hi José,
Thanks for the KIP.
The KIP talks a bit about "recovery," which is a new term (as far as I know).
If I understand correctly, this is a state that the partition enters into after
an unclean leader election. I would suggest using a different term for this,
since we already use the term
Jose wrote:
> I'll update the KIP with this information. The leader will return
> "NOT_LEADER_OR_FOLLOWER" for any partition that is still recovering
> for Fetch, Produce, OffsetsForLeaderEpoch and DeleteRecords requests.
> This error type is retriable by the clients.
I forgot to include
Thanks Jason and David for your feedback. See my comments below.
David wrote:
> 1) Does recovering from an unclean state bump the leader epoch?
Looking at the controller code, the leader epoch is only increased if
the leader id changes.
David wrote:
> 2) The name of "NewIsUnclean" field in
Jose, thanks for the KIP!
1) Does recovering from an unclean state bump the leader epoch?
2) The name of "NewIsUnclean" field in AlterIsrRequest is a little strange.
>From the description, it sounds like this will be used to by the broker to
indicate to the controller that it has recovered from
Hi Jose,
Thanks for the KIP. Just a minor question about this:
> This means that the leader will not allow followers to join the ISR until
it has recovered from the unclean leader election.
If I understand correctly, the main reason for this is to avoid the need to
propagate the "IsUnclean"
Hi all,
I would like to open the discussion on implementing "KIP-704: Send a
hint to broker if it is an unclean leader." See this wiki page for
details: https://cwiki.apache.org/confluence/x/kAZRCg
Thanks!
--
-Jose
29 matches
Mail list logo