Thanks Ismael,
I added an extra paragraph in the motivation. We have certainly hit this
within our internal Confluent reassignment software and from a quick skim
in the popular Cruise Control repository, I notice that similar problems
have been hit there too. Hopefully the examples in the KIP are
Thanks for the details, Colin. I understand how this can happen. But this
API has been out for a long time. Are we saying that we have seen Cruise
Control cause this kind of problem? If so, it would be good to mention it
in the KIP as evidence that the current approach is brittle.
Ismael
On Wed,
Hi Ismael,
I think this issue comes up when people write software that automatically
creates partition reassignments to balance the cluster. Cruise Control is one
example; Confluent also has some software that does this. If there is already a
reassignment that is going on for some partition
Thanks for the KIP. Can we explain a bit more why this is an important use
case to address? For example, do we have concrete examples of people
running into this? The way the KIP is written, it sounds like a potential
problem but no information is given on whether it's a real problem in
practice.
Thanks David,
I do prefer the `disallow-replication-factor-change` flag but only for the
CLI. I assume that's what you're proposing instead of
"disable-replication-factor-change". The wording is more natural in your
suggestion I feel.
If we were to modify more (e.g RPC, Admin API), I think it'd
Thanks Stan. Overall, the KIP looks good to me. I have two minor comments:
* Could we document the different versions in the
AlterPartitionReassignmentsRequest/Request schema? You can look at
other requests/responses to see how we have done this so far.
* I wonder if
Thanks for the discussion all,
I have updated the KIP to mention throwing an UnsupportedVersionException
if the server is running an old version that would not honor the configured
allowReplicationFactor option.
Please take a look:
- KIP:
Throwing an UnsupportedVersionException with an appropriate message
seems to be the best option when the new API is not supported and
AllowReplicationFactorChange is not set to the default value.
Cheers,
David
On Mon, Aug 8, 2022 at 6:25 PM Vikas Singh wrote:
>
> I personally like the UVE
I personally like the UVE option. It provides options for clients to go
either way, retry or abort. If we do it in AdminClient, then users have to
live with what we have chosen.
> Note this can happen during an RF change too. e.g [1,2,3] => [4,5,6,7] (RF
> change, intermediate set is
Thank you for the reviews.
Vikas,
> > In the case of an already-reassigning partition being reassigned again,
the validation compares the targetReplicaSet size of the reassignment to
the targetReplicaSet size of the new reassignment and throws if those
differ.
> Can you add more detail to this,
Hi Stanislav,
Thanks for the KIP. I think this is a nice solution to the problem of not
wanting to change the replication factor during reassignments.
Just from a writing point of view, it would be nice for the first paragraph to
be a bit more explicit about this goal. Maybe lead with "Many
Hi Stanislav,
Thanks for offering a solution to reduce race conditions in Kafka.
I went through your Wiki and your proposal sounds reasonable to me. I like it.
Best,
Daniel
On 2022/07/28 08:59:18 Stanislav Kozlovski wrote:
> Hey all,
>
> I'd like to start a discussion on a proposal to help
Thanks Stanislav for the KIP. Seems like a reasonable proposal,
preventing users from accidentally altering the replica set under certain
conditions. I have couple of comments:
> In the case of an already-reassigning partition being reassigned again,
the validation compares the targetReplicaSet
Hey all,
I'd like to start a discussion on a proposal to help API users from
inadvertently increasing the replication factor of a topic through
the alter partition reassignments API. The KIP describes two fairly
easy-to-hit race conditions in which this can happen.
The KIP itself is pretty
14 matches
Mail list logo