Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-06-28 Thread Stanislav Kozlovski
Hi George, I think we might want to move the discussion to KIP-455 as this KIP is very tightly coupled to it. One issue I see is when controller failover occurs, the new controller > will need to read the active reassignments by scanning the > /brokers/topics//, for some clusters with hundred+

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-05-01 Thread George Li
Hi Jason, Sorry for the late response.  been busy.  I have taken a brief look at Colin's KIP-455. It's a good direction for Reassignments. I think KIP-236, KIP-435,  KIP-455 need some consensus / coordination for where the new reassignment data AND the original replicas is stored.   The

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-04-24 Thread Jason Gustafson
Hi George, I think focusing on the admin API for cancellation in this KIP is reasonable. Colin wrote up KIP-455 which adds APIs to submit a reassignment and to list reassignments in progress. Probably as long as we make these APIs all consistent with each other, it is fine to do them separately.

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-04-05 Thread George Li
Hi Jun, Thanks for the feedback!  for 40,  I agree.  It makes sense to do it via RPC request to the controller.  Maybe for KIP-236,  I will just implement RPC for reassignment cancellation only?  otherwise if KIP-236 includes other previous Reassignment related operations such as Submitting

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-04-04 Thread Jun Rao
Hi, George, Thanks for the KIP. Sorry for the late reply. A couple of comments below. 40. I agree that it's better to issue an RPC request to the controller for reassignment cancellation. If we do that, it would be useful to decide whether that call blocks on cancellation completion or not. 41.

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-26 Thread George Li
Hi Ismael, Thanks,  I understand your points. I will add the RPC mechanism for reassignments to KIP-236.  I can think of a few Requests/Responses corresponding to the old Scala AdminClient using ZK:   SubmitReassignments (--execute) StatusReassignments (--verify)

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-25 Thread Ismael Juma
Hi George, The goal is not to prevent people from updating ZK directly. The goal is to offer a solution where people don't have to. If people then decide to avoid the recommended path, they can deal with the consequences. However, if we add another structure in ZK and no RPC mechanism, then there

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-25 Thread George Li
Thanks Ismael.  One question, even switch to submitting reassignments via RPC instead of Zookeeper.  The reassignment data will still persist in ZooKeeper node /admin/reassign_partitions (e.g. when Controller failover it can resume reassignments)?  If yes, how this can keep someone from

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-23 Thread Ismael Juma
Thanks for the KIP, making reassignment more flexible is definitely welcome. As others have mentioned, I think we need to do it via the Kafka protocol and not via ZK. The latter introduces an implicit API that other tools will depend on causing migration challenges. This has already happened with

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-23 Thread Colin McCabe
On Thu, Mar 21, 2019, at 20:51, George Li wrote: > Hi Colin, > > I agree with your proposal of having administrative APIs through RPC > instead of ZooKeeper. But seems like it will incur significant changes > to both submitting reassignments and this KIP's cancelling pending > reassignments. 

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-21 Thread George Li
Hi Colin, I agree with your proposal of having administrative APIs through RPC instead of ZooKeeper. But seems like it will incur significant changes to both submitting reassignments and this KIP's cancelling pending reassignments.  To make this KIP simple and moving along, I will be happy to

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-20 Thread Colin McCabe
Hi George, One big problem here is that administrative APIs should be done through RPCs, not through ZooKeeper. KIP-4 (Command line and centralized administrative operations) describes the rationale for this. We want public and stable APIs that don't depend on the internal representation of

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-19 Thread George Li
Hi Viktor, Thanks for the review.  If there is reassignment in-progress while the cluster is upgraded with this KIP (upgrade the binary and then do a cluster rolling restart of the brokers), the reassignment JSON in Zookeeper  /admin/reassign_partitions will only have  {topic, partition,

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-19 Thread Viktor Somogyi-Vass
Hey George, Thanks for the answers. I'll try to block out time this week to review your PR. I have one more point to clarify: I've seen some customers who are managing Kafka as an internal company-wide service and they may or may not know that how certain topics are used within the company. That

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-18 Thread George Li
Hi Viktor, FYI, I have added a new ducktape test:   tests/kafkatest/tests/core/reassign_cancel_test.py to  https://github.com/apache/kafka/pull/6296   After review, do you have any more questions?  Thanks Hi Jun,  Could you help review this when you have time?  Thanks Can we start a vote on

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-05 Thread George Li
Hi Viktor, >  2.: One follow-up question: if the reassignment cancellation gets >interrupted and a failover happens after step #2 but before step #3, how will >the new controller continue? At this stage Zookeeper would contain OAR + RAR, >however the brokers will have the updated

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-05 Thread Viktor Somogyi-Vass
Hey George, Sorry for the delay. I'll answer point-by-point: 1.2: I think it's fine. As you say we presume that the client knows the state of the cluster before doing the reassignment, so we can presume the same for cancellation. 2.: One follow-up question: if the reassignment cancellation gets

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-03-01 Thread George Li
Hi Jun, Could you help review KIP-236 when you have time?  Thanks. Hi Becket,  Since you filed https://issues.apache.org/jira/browse/KAFKA-6304 to request this feature.  Could you also help review and comment on KIP-236 ?  Thanks.  Hi Viktor, I have updated 

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-25 Thread George Li
Hi Viktor,  Thanks for the response.  Good questions!  answers below:  > A few questions regarding the rollback algorithm:> 1. At step 2 how do you > elect the leader?  The step 2 code is in https://github.com/apache/kafka/pull/6296   core/src/main/scala/kafka/controller/KafkaController.scala 

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-25 Thread Viktor Somogyi-Vass
Hey George, Thanks for the prompt response, it makes sense. I'll try to keep your code changes on top of my list and help reviewing that. :) Regarding the incremental reassignment: I don't mind either to discuss it as part of this or in a separate conversation but I think a separate one could be

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread George Li
Hi Viktor,  Thanks for reading and provide feedbacks on KIP-236.  For reassignments, one can generate a json for new assignments and another json with "original" assignments for rollback purpose.  In production cluster, from our experience, we need to submit the reassignments in batches with

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread Viktor Somogyi-Vass
Read through the KIP and I have one comment: It seems like it is not looking strictly for cancellation but also implements rolling back to the original. I think it'd be much simpler to generate a reassignment json on cancellation that contains the original assignment and start a new partition

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread Viktor Somogyi-Vass
I've published the above mentioned KIP here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment Will start a discussion about it soon. On Fri, Feb 22, 2019 at 12:45 PM Viktor Somogyi-Vass < viktorsomo...@gmail.com> wrote: > Hi Folks, > > I also have a

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread Viktor Somogyi-Vass
Hi Folks, I also have a pending active work on the incremental partition reassignment stuff here: https://issues.apache.org/jira/browse/KAFKA-6794 I think it would be good to cooperate on this to make both work compatible with each other. I'll write up a KIP about this today so it'll be easier

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-21 Thread Harsha
Thanks George. LGTM. Jun & Tom, Can you please take a look at the updated KIP. Thanks, Harsha On Wed, Feb 20, 2019, at 12:18 PM, George Li wrote: > Hi, > > After discussing with Tom, Harsha and I are picking up KIP-236 >

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-20 Thread George Li
Hi, After discussing with Tom, Harsha and I are picking up KIP-236.  The work focused on safely/cleanly cancel / rollback pending reassignments in a timely fashion.  Pull Request #6296  Still working on more integration/system tests.  Please review and provide feedbacks/suggestions. 

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-22 Thread Jun Rao
Hi, Tom, Thanks for the reply. 10. That's a good thought. Perhaps it's better to get rid of /admin/reassignment_requests too. The window when a controller is not available is small. So, we can just failed the admin client if the controller is not reachable after the timeout. 13. With the

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-22 Thread Tom Bentley
Hi Jun, Thanks for responding, my replies are inline: 10. You explanation makes sense. My remaining concern is the additional ZK > writes in the proposal. With the proposal, we will need to do following > writes in ZK. > > a. write new assignment in /admin/reassignment_requests > > b. write new

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-20 Thread Jun Rao
Hi, Tom, Thanks for the reply. A few more comments below. 10. You explanation makes sense. My remaining concern is the additional ZK writes in the proposal. With the proposal, we will need to do following writes in ZK. a. write new assignment in /admin/reassignment_requests b. write new

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-19 Thread Tom Bentley
Hi Jun, 10. Another concern of mine is on consistency with the current pattern. The > current pattern for change notification based on ZK is (1) we first write > the actual value in the entity path and then write the change notification > path, and (2) the change notification path only includes

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-18 Thread Jun Rao
Hi, Tom, Thanks for the reply. A few more followup comments below. 10. Another concern of mine is on consistency with the current pattern. The current pattern for change notification based on ZK is (1) we first write the actual value in the entity path and then write the change notification

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-18 Thread Tom Bentley
Hi Jun, Thanks for replying, some answers below: > 10. The proposal now stores the reassignment for all partitions in > /admin/reassignment_requests/request_xxx. If the number of reassigned > partitions is larger, the ZK write may hit the default 1MB limit and fail. > An alternative approach is

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-15 Thread Jun Rao
Hi, Tom, Thanks for the updated KIP. A few more comments below. 10. The proposal now stores the reassignment for all partitions in /admin/reassignment_requests/request_xxx. If the number of reassigned partitions is larger, the ZK write may hit the default 1MB limit and fail. An alternative

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-15 Thread Tom Bentley
Just wanted to mention that I've started KIP-240, which builds on top of this one to provide an AdminClient API for listing and describing reassignments. On 15 December 2017 at 14:34, Tom Bentley wrote: > > Should we seek to improve this algorithm in this KIP, or leave

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-15 Thread Tom Bentley
> Should we seek to improve this algorithm in this KIP, or leave that as a later optimisation? I've updated the KIP with a proposed algorithm. On 14 December 2017 at 09:57, Tom Bentley wrote: > Thanks Ted, now fixed. > > On 13 December 2017 at 18:38, Ted Yu

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-14 Thread Tom Bentley
Thanks Ted, now fixed. On 13 December 2017 at 18:38, Ted Yu wrote: > Tom: > bq. create a znode /admin/reassignments/$topic-$partition > > Looks like the tree structure above should be: > > /admin/reassignments/$topic/$partition > > bq. The controller removes

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-13 Thread Ted Yu
Tom: bq. create a znode /admin/reassignments/$topic-$partition Looks like the tree structure above should be: /admin/reassignments/$topic/$partition bq. The controller removes /admin/reassignment/$topic/$partition Note the lack of 's' for reassignment. It would be good to make zookeeper paths

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-13 Thread Tom Bentley
Hi Jun and Ted, Jun, you're right that needing one watcher per reassigned partition presents a scalability problem, and using a separate notification path solves that. I also agree that it makes sense to prevent users from using both methods on the same reassignment. Ted, naming the

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-11 Thread Jun Rao
Another question is on the compatibility. Since now there are 2 ways of specifying a partition reassignment, one under /admin/reassign_partitions and the other under /admin/reassignments, we probably want to prevent the same topic being reassigned under both paths at the same time? Thanks, Jun

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-09 Thread Ted Yu
For the znodes created under /admin/reassignments/, would it be better if the partition znodes for the same topic are put under subtree headed by topic name ? In Tom's example, instead of: /admin/reassignments/mytopic-42 the znode would be: /admin/reassignments/mytopic/42 Cheers On Fri, Dec

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-08 Thread Jun Rao
Hi, Tom, Thanks for the KIP. It definitely addresses one of the pain points in partition reassignment. Another issue that it also addresses is the ZK node size limit when writing the reassignment JSON. My only concern is that the KIP needs to create one watcher per reassigned partition. This

[DISCUSS] KIP-236 Interruptible Partition Reassignment

2017-12-06 Thread Tom Bentley
Hi, This is still very new, but I wanted some quick feedback on a preliminary KIP which could, I think, help with providing an AdminClient API for partition reassignment. https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment I wasn't sure whether to