Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-09-18 Thread Jason Gustafson
Hey Boyang, Thanks for the comments. Responses below: > 1. Why do we need to use type string for `StatesFilter` instead of a short value, as we could translate it and save space? I went back and forth on this. In the end I used a string for consistency with `DescribeGroups`. I doubt space is

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-09-17 Thread Boyang Chen
Thanks for the updates Jason. I'm pretty satisfied with the overall motivation and proposed solution, just a couple more comments. 1. Why do we need to use type string for `StatesFilter` instead of a short value, as we could translate it and save space? 2. I'm wondering whether the

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-09-10 Thread Tom Bentley
Sounds good to me, thanks! On Wed, Sep 9, 2020 at 5:30 PM Jason Gustafson wrote: > Hey Tom, > > Yeah, that's fair. I will update the proposal. I was also thinking of > adding a separate column for duration, just to save users the trouble of > computing it. > > Thanks, > Jason > > On Wed, Sep 9,

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-09-09 Thread Jason Gustafson
Hey Tom, Yeah, that's fair. I will update the proposal. I was also thinking of adding a separate column for duration, just to save users the trouble of computing it. Thanks, Jason On Wed, Sep 9, 2020 at 1:21 AM Tom Bentley wrote: > Hi Jason, > > The KIP looks good to me, but I had one

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-09-09 Thread Tom Bentley
Hi Jason, The KIP looks good to me, but I had one question. AFAIU the LastTimestamp column in the output of --describe-producers and --find-hanging is there so the users of the tool know the txnLastUpdateTimestamp of the TransactionMetadata and from that and the (max) timeout can infer something
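The inference Tom describes (compare the LastTimestamp value against the maximum transaction timeout) can be sketched as a small filter. This is a minimal sketch, not the tool's implementation: the `ProducerRow` type and `possibly_hanging` function are hypothetical, and it assumes the elapsed time is measured from LastTimestamp. The 900000 ms figure is the default of the broker config `transaction.max.timeout.ms`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProducerRow:
    """Hypothetical row of --describe-producers output (not the tool's real type)."""
    producer_id: int
    last_timestamp_ms: int            # the LastTimestamp column
    txn_start_offset: Optional[int]   # None when the producer has no open transaction


def possibly_hanging(rows: List[ProducerRow], now_ms: int,
                     max_timeout_ms: int) -> List[Tuple[int, int]]:
    """Return (producer_id, duration_ms) for open transactions idle longer than max_timeout_ms."""
    candidates = []
    for row in rows:
        if row.txn_start_offset is None:
            continue  # nothing open, so nothing can hang
        # Assumed meaning of a "duration" column: time elapsed since LastTimestamp.
        duration_ms = now_ms - row.last_timestamp_ms
        if duration_ms > max_timeout_ms:
            candidates.append((row.producer_id, duration_ms))
    return candidates


if __name__ == "__main__":
    rows = [
        ProducerRow(1, 1_000, 50),
        ProducerRow(2, 1_000, None),
        ProducerRow(3, 990_000, 70),
    ]
    # 900000 ms is the default transaction.max.timeout.ms.
    print(possibly_hanging(rows, now_ms=1_000_000, max_timeout_ms=900_000))
```

Producer 1 is flagged because its transaction is open and older than the timeout; producer 2 has nothing open, and producer 3 is still within the timeout.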

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Guozhang Wang
Thanks Jason, I do not have more comments on the KIP then. On Mon, Aug 31, 2020 at 3:19 PM Jason Gustafson wrote: > > Hmm, but the "TxnStartOffset" is not included in the DescribeProducers > response either? > > Oh, I accidentally called it `CurrentTxnStartTimestamp` in the schema. > Fixed now!

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Jason Gustafson
> Hmm, but the "TxnStartOffset" is not included in the DescribeProducers response either? Oh, I accidentally called it `CurrentTxnStartTimestamp` in the schema. Fixed now! -Jason On Mon, Aug 31, 2020 at 3:04 PM Guozhang Wang wrote: > On Mon, Aug 31, 2020 at 12:28 PM Jason Gustafson > wrote:

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Guozhang Wang
On Mon, Aug 31, 2020 at 12:28 PM Jason Gustafson wrote: > Hey Guozhang, > > Thanks for the detailed comments. Responses inline: > > > 1. I'd like to clarify how we can make "--abort" work with old brokers, > since without the additional field "Partitions" the tool needs to set the > coordinator

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Jason Gustafson
Hi Bob, Thanks for the comment. > I'm not sure how much value the MaxActiveTransactionDuration metric adds, given that we have the --find-hanging option in the tool. As you mention, instances of these transactions are expected to be rare, and a partition-level metric, which can generate a lot of

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Jason Gustafson
Hey Guozhang, Thanks for the detailed comments. Responses inline: > 1. I'd like to clarify how we can make "--abort" work with old brokers, since without the additional field "Partitions" the tool needs to set the coordinator epoch correctly instead of "-1"? Arguably that's still doable but

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-28 Thread Robert Barrett
Hi Jason, Thanks for this KIP, I think this will be a huge operational improvement and overall it looks great to me. I'm not sure how much value the MaxActiveTransactionDuration metric adds, given that we have the --find-hanging option in the tool. As you mention, instances of these transactions
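One plausible reading of a partition-level MaxActiveTransactionDuration gauge is "age of the oldest transaction still open on the partition." The sketch below illustrates that reading only; the function name and its input (start timestamps of open transactions) are hypothetical and not taken from the KIP.

```python
from typing import Iterable


def max_active_transaction_duration_ms(open_txn_start_ms: Iterable[int], now_ms: int) -> int:
    """Hypothetical per-partition gauge: age of the oldest open transaction, or 0 if none."""
    return max((now_ms - start for start in open_txn_start_ms), default=0)


if __name__ == "__main__":
    print(max_active_transaction_duration_ms([1_000, 5_000], now_ms=10_000))
    print(max_active_transaction_duration_ms([], now_ms=10_000))
```

Reporting 0 for a partition with no open transactions keeps the gauge defined everywhere, which is part of why a per-partition metric can add many series.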

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-27 Thread Guozhang Wang
Hi Jason, Thanks for writing the KIP. I think this is going to be a very useful tool for operational improvements since with EOS at its current stage, we cannot confidently assert that we are bug-free, and even in the future when we are confident this is still going to be leveraged by older

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-27 Thread Jason Gustafson
Hi Boyang, Thanks for the comments. Responses below: > 1. For the analysis section, is there any consistency guarantee for `ListTransactions` and `DescribeTransactions`? Let's say the coordinator receives a DescribeTransactions while the transaction is almost complete at the same time, should we

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-27 Thread Boyang Chen
Thanks Jason for the tooling proposal. A couple of comments: 1. For the analysis section, is there any consistency guarantee for `ListTransactions` and `DescribeTransactions`? Let's say the coordinator receives a DescribeTransactions while the transaction is almost complete at the same time,

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-27 Thread Lucas Bradstreet
>> Would it be worth returning transactional.id.expiration.ms in the DescribeProducersResponse? > That's an interesting thought as well. Are you trying to avoid the need to specify it through the command line? The tool could also query the value with DescribeConfigs I suppose. Basically. I'm not

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-27 Thread Jason Gustafson
Hey Lucas, Thanks for the comments. Responses below: > Given that it's possible for replica producer states to diverge from each other, it would be very useful if DescribeProducers(Request,Response) and tooling is able to query all partition replicas for their producers Yes, it makes sense to

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-27 Thread Lucas Bradstreet
Hi Jason, This looks like a very useful tool, thanks for writing it up. Given that it's possible for replica producer states to diverge from each other, it would be very useful if DescribeProducers(Request,Response) and tooling is able to query all partition replicas for their producers. One way
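Lucas's point about divergent replica producer state can be illustrated with a small comparison: given each replica's view of the producers on one partition, report the producer IDs whose views disagree. This is a sketch under stated assumptions: the input shape (broker ID mapped to producer ID mapped to an `(epoch, last_sequence)` pair) and the `diverging_producers` function are hypothetical stand-ins, not the DescribeProducers response schema.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-producer state: (producer epoch, last sequence number).
State = Tuple[int, int]


def diverging_producers(
    replica_states: Dict[int, Dict[int, State]]
) -> Dict[int, Dict[int, Optional[State]]]:
    """Map producer_id -> {broker_id: state or None} for producers the replicas disagree on."""
    all_ids = set()
    for states in replica_states.values():
        all_ids.update(states)
    diverging = {}
    for pid in sorted(all_ids):
        # A replica that has no entry for the producer is recorded as None.
        views = {broker: states.get(pid) for broker, states in replica_states.items()}
        if len(set(views.values())) > 1:
            diverging[pid] = views
    return diverging


if __name__ == "__main__":
    replicas = {
        1: {10: (0, 5), 11: (2, 7)},
        2: {10: (0, 5), 11: (2, 6)},
        3: {10: (0, 5)},
    }
    print(diverging_producers(replicas))
```

Producer 10 looks the same on every replica, so only producer 11 is reported, with broker 3's missing entry shown as None.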

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-26 Thread Ron Dagostino
Yes, that definitely sounds reasonable. Thanks, Jason! Ron On Wed, Aug 26, 2020 at 3:03 PM Jason Gustafson wrote: > Hey Ron, > > We do not typically backport new APIs to older versions. I think we can > however make the --abort command compatible with older versions. It would > require a user

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-26 Thread Jason Gustafson
Hey Ron, We do not typically backport new APIs to older versions. I think we can however make the --abort command compatible with older versions. It would require a user to do some analysis on their own to identify a hanging transaction, but then they can use the tool from a new release to

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-26 Thread Ron Dagostino
Hi Jason. Thanks for the excellently written KIP. Will the implementation be backported to prior Kafka versions? The reason I ask is because if it is not backported and similar functionality is not otherwise made available for older versions, then the only recourse (aside from deleting and

[DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-26 Thread Jason Gustafson
Hi All, I've added a proposal to handle the problem of hanging transactions: https://cwiki.apache.org/confluence/display/KAFKA/KIP-664%3A+Provide+tooling+to+detect+and+abort+hanging+transactions. In theory, this should never happen. In practice, we have hit one bug where it was possible and there