Hi George,
I think we might want to move the discussion to KIP-455 as this KIP is very
tightly coupled to it.
> One issue I see is when controller failover occurs, the new controller
> will need to read the active reassignments by scanning the
> /brokers/topics//, for some clusters with hundred+
Hi Jason,
Sorry for the late response; I've been busy. I have taken a brief look at Colin's
KIP-455. It's a good direction for reassignments. I think KIP-236, KIP-435, and
KIP-455 need some consensus/coordination on where the new reassignment data
AND the original replicas are stored.
The
Hi George,
I think focusing on the admin API for cancellation in this KIP is
reasonable. Colin wrote up KIP-455 which adds APIs to submit a reassignment
and to list reassignments in progress. Probably as long as we make these
APIs all consistent with each other, it is fine to do them separately.
Hi Jun,
Thanks for the feedback!
For 40, I agree. It makes sense to do it via an RPC request to the controller.
Maybe for KIP-236, I will just implement the RPC for reassignment cancellation
only? Otherwise, if KIP-236 includes other previous reassignment-related
operations such as submitting
Hi, George,
Thanks for the KIP. Sorry for the late reply. A couple of comments below.
40. I agree that it's better to issue an RPC request to the controller for
reassignment cancellation. If we do that, it would be useful to decide
whether that call blocks on cancellation completion or not.
41.
Hi Ismael,
Thanks, I understand your points. I will add the RPC mechanism for
reassignments to KIP-236. I can think of a few Requests/Responses
corresponding to the old Scala AdminClient using ZK:
SubmitReassignments (--execute)
StatusReassignments (--verify)
Hi George,
The goal is not to prevent people from updating ZK directly. The goal is to
offer a solution where people don't have to. If people then decide to avoid
the recommended path, they can deal with the consequences. However, if we
add another structure in ZK and no RPC mechanism, then there
Thanks Ismael. One question: even if we switch to submitting reassignments via RPC
instead of ZooKeeper, will the reassignment data still persist in the ZooKeeper
node /admin/reassign_partitions (e.g. so the controller can resume
reassignments after failover)? If yes, how does this keep someone from
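For reference, the data written under /admin/reassign_partitions follows the familiar reassignment JSON shape (topic name and broker ids below are illustrative):

```json
{
  "version": 1,
  "partitions": [
    { "topic": "mytopic", "partition": 0, "replicas": [1, 2, 3] }
  ]
}
```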
Thanks for the KIP, making reassignment more flexible is definitely
welcome. As others have mentioned, I think we need to do it via the Kafka
protocol and not via ZK. The latter introduces an implicit API that other
tools will depend on, causing migration challenges. This has already
happened with
On Thu, Mar 21, 2019, at 20:51, George Li wrote:
> Hi Colin,
>
> I agree with your proposal of having administrative APIs through RPC
> instead of ZooKeeper. But it seems like it will incur significant changes
> to both submitting reassignments and this KIP's cancelling pending
> reassignments.
Hi Colin,
I agree with your proposal of having administrative APIs through RPC instead of
ZooKeeper. But it seems like it will incur significant changes to both submitting
reassignments and this KIP's cancelling pending reassignments.
To keep this KIP simple and moving along, I will be happy to
Hi George,
One big problem here is that administrative APIs should be done through RPCs,
not through ZooKeeper. KIP-4 (Command line and centralized administrative
operations) describes the rationale for this. We want public and stable APIs
that don't depend on the internal representation of
Hi Viktor,
Thanks for the review.
If there is a reassignment in progress while the cluster is upgraded with this
KIP (upgrade the binary and then do a rolling restart of the brokers),
the reassignment JSON in ZooKeeper /admin/reassign_partitions will only have
{topic, partition,
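As context, a pre-KIP entry in that JSON carries only the target replica list, while cancellation/rollback additionally needs the original replicas tracked somewhere. A sketch of what such an augmented entry could look like (the field name and values are illustrative, not taken verbatim from the KIP):

```json
{
  "topic": "mytopic",
  "partition": 0,
  "replicas": [4, 5, 6],
  "original_replicas": [1, 2, 3]
}
```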
Hey George,
Thanks for the answers. I'll try to block out time this week to review your
PR.
I have one more point to clarify:
I've seen some customers who are managing Kafka as an internal company-wide
service, and they may or may not know how certain topics are used
within the company. That
Hi Viktor,
FYI, I have added a new ducktape test:
tests/kafkatest/tests/core/reassign_cancel_test.py to
https://github.com/apache/kafka/pull/6296
After reviewing, do you have any more questions? Thanks
Hi Jun,
Could you help review this when you have time? Thanks
Can we start a vote on
Hi Viktor,
> 2.: One follow-up question: if the reassignment cancellation gets
> interrupted and a failover happens after step #2 but before step #3, how will
> the new controller continue? At this stage Zookeeper would contain OAR + RAR,
> however the brokers will have the updated
Hey George,
Sorry for the delay.
I'll answer point-by-point:
1.2: I think it's fine. As you say we presume that the client knows the
state of the cluster before doing the reassignment, so we can presume the
same for cancellation.
2.: One follow-up question: if the reassignment cancellation gets
Hi Jun,
Could you help review KIP-236 when you have time? Thanks.
Hi Becket,
Since you filed https://issues.apache.org/jira/browse/KAFKA-6304 to request
this feature, could you also help review and comment on KIP-236? Thanks.
Hi Viktor,
I have updated
Hi Viktor,
Thanks for the response. Good questions! Answers below:
> A few questions regarding the rollback algorithm:
> 1. At step 2 how do you elect the leader?
The step 2 code is in https://github.com/apache/kafka/pull/6296
core/src/main/scala/kafka/controller/KafkaController.scala
Hey George,
Thanks for the prompt response, it makes sense. I'll try to keep your code
changes on top of my list and help reviewing that. :)
Regarding the incremental reassignment: I don't mind either to discuss it
as part of this or in a separate conversation but I think a separate one
could be
Hi Viktor,
Thanks for reading and providing feedback on KIP-236.
For reassignments, one can generate a JSON for the new assignments and another JSON
with the "original" assignments for rollback purposes. In a production cluster, from
our experience, we need to submit the reassignments in batches with
I read through the KIP and have one comment:
It seems like it is not strictly a cancellation but also
implements rolling back to the original assignment. I think it'd be much simpler to
generate a reassignment JSON on cancellation that contains the original
assignment and start a new partition
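The simpler idea above (emit a reassignment JSON containing the original assignment and submit it as a fresh reassignment) could be sketched as follows; the helper name and captured-state shape are hypothetical, not from the KIP:

```python
import json

def rollback_json(original_assignments):
    """Build a reassignment JSON that restores previously captured replica lists.

    original_assignments maps (topic, partition) -> list of broker ids,
    captured before the reassignment was submitted.
    """
    return json.dumps({
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": replicas}
            for (topic, partition), replicas in sorted(original_assignments.items())
        ],
    })

# Example: mytopic-0 originally lived on brokers 1, 2, 3.
print(rollback_json({("mytopic", 0): [1, 2, 3]}))
```

The resulting JSON can then be submitted through the existing reassignment path, which is exactly why this approach would avoid new controller-side rollback logic.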
I've published the above mentioned KIP here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment
Will start a discussion about it soon.
On Fri, Feb 22, 2019 at 12:45 PM Viktor Somogyi-Vass <
viktorsomo...@gmail.com> wrote:
> Hi Folks,
>
> I also have a
Hi Folks,
I also have pending active work on the incremental partition reassignment
stuff here: https://issues.apache.org/jira/browse/KAFKA-6794
I think it would be good to cooperate on this to make both work compatible
with each other.
I'll write up a KIP about this today so it'll be easier
Thanks George. LGTM.
Jun & Tom, Can you please take a look at the updated KIP.
Thanks,
Harsha
On Wed, Feb 20, 2019, at 12:18 PM, George Li wrote:
> Hi,
>
> After discussing with Tom, Harsha and I are picking up KIP-236
>
Hi,
After discussing with Tom, Harsha and I are picking up KIP-236. The work
focuses on safely/cleanly cancelling/rolling back pending reassignments in a timely
fashion (Pull Request #6296). Still working on more integration/system tests.
Please review and provide feedback/suggestions.
Hi, Tom,
Thanks for the reply.
10. That's a good thought. Perhaps it's better to get rid of
/admin/reassignment_requests
too. The window when a controller is not available is small. So, we can
just fail the admin client if the controller is not reachable after the
timeout.
13. With the
Hi Jun,
Thanks for responding, my replies are inline:
> 10. Your explanation makes sense. My remaining concern is the additional ZK
> writes in the proposal. With the proposal, we will need to do the following
> writes in ZK.
>
> a. write new assignment in /admin/reassignment_requests
>
> b. write new
Hi, Tom,
Thanks for the reply. A few more comments below.
10. Your explanation makes sense. My remaining concern is the additional ZK
writes in the proposal. With the proposal, we will need to do the following
writes in ZK.
a. write new assignment in /admin/reassignment_requests
b. write new
Hi Jun,
10. Another concern of mine is on consistency with the current pattern. The
> current pattern for change notification based on ZK is (1) we first write
> the actual value in the entity path and then write the change notification
> path, and (2) the change notification path only includes
Hi, Tom,
Thanks for the reply. A few more followup comments below.
10. Another concern of mine is on consistency with the current pattern. The
current pattern for change notification based on ZK is (1) we first write
the actual value in the entity path and then write the change notification
Hi Jun,
Thanks for replying, some answers below:
> 10. The proposal now stores the reassignment for all partitions in
> /admin/reassignment_requests/request_xxx. If the number of reassigned
> partitions is large, the ZK write may hit the default 1 MB limit and fail.
> An alternative approach is
Hi, Tom,
Thanks for the updated KIP. A few more comments below.
10. The proposal now stores the reassignment for all partitions in
/admin/reassignment_requests/request_xxx. If the number of reassigned
partitions is large, the ZK write may hit the default 1 MB limit and fail.
An alternative
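To put the default 1 MB znode limit in perspective, a rough size estimate of the flattened reassignment JSON (topic name and partition counts purely illustrative):

```python
import json

def reassignment_size_bytes(num_partitions, replicas_per_partition=3):
    """Approximate serialized size of a single reassignment JSON covering
    num_partitions partitions of one topic."""
    payload = {
        "version": 1,
        "partitions": [
            {"topic": "some-topic-name", "partition": p,
             "replicas": list(range(replicas_per_partition))}
            for p in range(num_partitions)
        ],
    }
    return len(json.dumps(payload).encode("utf-8"))

# Tens of thousands of partitions in one request already approach the
# default 1 MB znode limit (ZooKeeper's jute.maxbuffer).
print(reassignment_size_bytes(10_000))
```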
Just wanted to mention that I've started KIP-240, which builds on top of
this one to provide an AdminClient API for listing and describing
reassignments.
On 15 December 2017 at 14:34, Tom Bentley wrote:
> > Should we seek to improve this algorithm in this KIP, or leave
> Should we seek to improve this algorithm in this KIP, or leave that as a
later optimisation?
I've updated the KIP with a proposed algorithm.
On 14 December 2017 at 09:57, Tom Bentley wrote:
> Thanks Ted, now fixed.
>
> On 13 December 2017 at 18:38, Ted Yu
Thanks Ted, now fixed.
On 13 December 2017 at 18:38, Ted Yu wrote:
> Tom:
> bq. create a znode /admin/reassignments/$topic-$partition
>
> Looks like the tree structure above should be:
>
> /admin/reassignments/$topic/$partition
>
> bq. The controller removes
Tom:
bq. create a znode /admin/reassignments/$topic-$partition
Looks like the tree structure above should be:
/admin/reassignments/$topic/$partition
bq. The controller removes /admin/reassignment/$topic/$partition
Note the lack of 's' for reassignment. It would be good to make ZooKeeper
paths
Hi Jun and Ted,
Jun, you're right that needing one watcher per reassigned partition
presents a scalability problem, and using a separate notification path
solves that. I also agree that it makes sense to prevent users from using
both methods on the same reassignment.
Ted, naming the
Another question is on the compatibility. Since now there are 2 ways of
specifying a partition reassignment, one under /admin/reassign_partitions
and the other under /admin/reassignments, we probably want to prevent the
same topic from being reassigned under both paths at the same time?
Thanks,
Jun
For the znodes created under /admin/reassignments/, would it be better if
the partition znodes for the same topic were put under a subtree headed by the
topic name?
In Tom's example, instead of:
/admin/reassignments/mytopic-42
the znode would be:
/admin/reassignments/mytopic/42
Cheers
On Fri, Dec
Hi, Tom,
Thanks for the KIP. It definitely addresses one of the pain points in
partition reassignment. Another issue that it also addresses is the ZK node
size limit when writing the reassignment JSON.
My only concern is that the KIP needs to create one watcher per reassigned
partition. This
Hi,
This is still very new, but I wanted some quick feedback on a preliminary
KIP which could, I think, help with providing an AdminClient API for
partition reassignment.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment
I wasn't sure whether to