Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-02 Thread Matthias J. Sax
I think calling it endOffset is still fine. We should keep it "simple" for users and not introduce too many concepts. -Matthias On 3/2/21 8:14 AM, Walker Carlson wrote: > Okay we can document that if the state is rebalancing that a Task could be > between instances and so no show up for one

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-02 Thread Walker Carlson
Okay we can document that if the state is rebalancing that a Task could be between instances and so no show up for one localThreadMetadata call. but this should not cause a problem for repeated calls Bruno, to your questions. The endOffset is like the consumer's highWatermark and does not require

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-02 Thread Bruno Cadonna
Hi Walker, Thank you for the KIP! I somehow agree that we should document that some tasks may be missing. I have one question/comment. As far as I understand, your KIP adds two methods that return data that is actually hosted on the brokers, namely committedOffsets() and endOffsets(). Thus,

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-01 Thread Matthias J. Sax
> but the user should > not rely on all tasks being returned at any given time to begin with since > it's possible we are in between revoking and re-assigning a partition. Exactly. That is what I meant: the "hand off" phase of partitions during a rebalance. During this phase, some tasks are

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-01 Thread Walker Carlson
I updated to use Optional, good idea Mathias. For the localThreadMetadata, it could already be called running a rebalance. Also I mention that they return the highest value they had seen so far for any tasks they have assigned to them. I thought it would be useful to see the TaskMetadata while

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-01 Thread Sophie Blee-Goldman
Can you clarify your second question Matthias? If this is queried during a cooperative rebalance, it should return the tasks as usual. If the user is using eager rebalancing then this will not return any tasks, but the user should not rely on all tasks being returned at any given time to begin

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-01 Thread Matthias J. Sax
Thanks the updating the KIP Walker. About, `timeCurrentIdlingStarted()`: should we return an `Optional` instead of `-1` if a task is not idling. As we allow to call `localThreadMetadata()` any time, could it be that we report partial information during a rebalance? If yes, this should be

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-27 Thread Walker Carlson
Sure thing Boyang, 1) it is in proposed changes. I expanded on it a bit more now. 2) done 3) and done :) thanks for the suggestions, walker On Fri, Feb 26, 2021 at 3:10 PM Boyang Chen wrote: > Thanks Walker. Some minor comments: > > 1. Could you add a reference to localThreadMetadata method

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-26 Thread Boyang Chen
Thanks Walker. Some minor comments: 1. Could you add a reference to localThreadMetadata method in the KIP? 2. Could you make the code block as a java template, such that TaskMetadata.java could be as the template title? Also it would be good to add some meta comments about the newly added

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-26 Thread Walker Carlson
I understand now. I think that is a valid concern but I think it is best solved but having an external service verify through streams. As this KIP is now just adding fields to TaskMetadata to be returned in the threadMetadata I am going to say that is out of scope. That seems to be the last

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-25 Thread Boyang Chen
For the 3rd point, yes, what I'm proposing is an edge case. For example, when we have 4 tasks [0_0, 0_1, 1_0, 1_1], and a bug in rebalancing logic causing no one gets 1_1 assigned. Then the health check service will only see 3 tasks [0_0, 0_1, 1_0] reporting progress normally while not paying

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-25 Thread Walker Carlson
Thanks for the follow up Boyang and Guozhang, I have updated the kip to include these ideas. Guozhang, that is a good idea about using the TaskMetadata. We can get it through the ThreadMetadata with a minor change to `localThreadMetadata` in kafkaStreams. This means that we will only need to

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-25 Thread Guozhang Wang
Regarding the second API and the `TaskStatus` class: I'd suggest we consolidate on the existing `TaskMetadata` since we have already accumulated a bunch of such classes, and its better to keep them small as public APIs. You can see https://issues.apache.org/jira/browse/KAFKA-12370 for a reference

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-25 Thread Boyang Chen
Thanks for the updates Walker. Some replies and follow-up questions: 1. I agree one task could have multiple partitions, but when we hit a delay in terms of offset progress, do we have a convenient way to reverse mapping TopicPartition to the problematic task? In production, I believe it would be

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-24 Thread Walker Carlson
Thank you for the comments everyone! I think there are a few things I can clear up in general then I will specifically respond to each question. First, when I say "idling" I refer to task idling. Where the stream is intentionally not making progress. (

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-22 Thread Guozhang Wang
Hello Walker, thanks for the KIP. A few thoughts: 1) Have you considered just relying on the `KafkaStreams#metrics()` that includes embedded consumer metrics that have the committed offsets instead of adding a new API? Not advocating that this is a better approach but want to make sure we

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-22 Thread Matthias J. Sax
Thanks for the KIP! I personally think, that it might be sufficient to just report offsets of assigned tasks. Similar to metrics what are also reported only locally, users can roll-up/aggregate the information across instances manually. What I also don't understand is, what "idling" means?

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-22 Thread Boyang Chen
Thanks Walker for the proposed KIP! This should definitely empower KStream users with better visibility. Meanwhile I got a couple of questions/suggestions: 1. typo "repost/report" in the motivation section. 2. What offsets do we report when the task is under restoration or rebalancing? 3.

[DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-22 Thread Walker Carlson
Hello all, I would like to start discussion on KIP-715. This kip aims to make it easier to monitor Kafka Streams progress by exposing the committed offset in a similar way as the consumer client does. Here is the KIP: https://cwiki.apache.org/confluence/x/aRRRCg Best, Walker