Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-15 Thread Matthias J. Sax
Thanks for the update. I did a first pass over the updated KIP and think it makes sense. -Matthias On 7/11/18 5:47 PM, John Roesler wrote: > Hi all, > > I have updated KIP-328 with all the feedback I've gotten so far. Please > take another look and let me know what you think! > > Thanks, >

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-11 Thread John Roesler
Hi all, I have updated KIP-328 with all the feedback I've gotten so far. Please take another look and let me know what you think! Thanks, -John On Wed, Jul 11, 2018 at 12:28 AM Guozhang Wang wrote: > That is a good point.. > > I cannot think of a better option than documentation and warning,

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-10 Thread Guozhang Wang
That is a good point.. I cannot think of a better option than documentation and warning, and also given that we'd probably better not reusing the function name `until` for close time. Guozhang On Tue, Jul 10, 2018 at 3:31 PM, John Roesler wrote: > I had some opportunity to reflect on the

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-10 Thread John Roesler
I had some opportunity to reflect on the default for close time today... Note that the current "close time" is equal to the retention time, and therefore "close" today shares the default retention of 24h. It would definitely break any application that today specifies a retention time to set

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-10 Thread John Roesler
Hi Guozhang, That sounds good to me. I'll include that in the KIP. Thanks, -John On Mon, Jul 9, 2018 at 6:33 PM Guozhang Wang wrote: > Let me clarify a bit on what I meant about moving `retentionPeriod` to > WindowStoreBuilder: > > In another discussion we had around KIP-319 / 330, that the

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-09 Thread Guozhang Wang
Let me clarify a bit on what I meant about moving `retentionPeriod` to WindowStoreBuilder: In another discussion we had around KIP-319 / 330, that the "retention period" should not really be a window spec, but only a window store spec, as it only affects how long to retain each window to be

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-09 Thread John Roesler
Thanks for the reply, Guozhang, Good! I agree, that is also a good reason, and I actually made use of that in my tests. I'll update the KIP. By the way, I chose "allowedLateness" as I was trying to pick a better name than "close", but I think it's actually the wrong name. We don't want to bound

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-09 Thread Guozhang Wang
John, Thanks for your replies. As for the two options of the API, I think I'm slightly inclined to the first option as well. My motivation is a bit different, as I think of the first one maybe more flexible, for example: KTable> table = ... count(); table.toStream().peek(..); // want to peek

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-09 Thread John Roesler
Hey Matthias and Guozhang, Sorry for the slow reply. I was mulling about your feedback and weighing some ideas in a sketchbook PR: https://github.com/apache/kafka/pull/5337. Your thought about keeping suppression independent of business logic is a very good one. I agree that it would make more

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-06 Thread Guozhang Wang
I think I agree with Matthias for having dedicated APIs for windowed operation final output scenario, PLUS separating the window close which the "final output" would rely on, from the window retention time itself (admittedly it would make this KIP effort larger, but if we believe we need to do

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-04 Thread Matthias J. Sax
Thanks for the discussion. I am just catching up. In general, I think we have different uses cases and non-windowed and windowed is quite different. For the non-windowed case, suppress() has no (useful) close or retention time, no final semantics, and also no business logic impact. On the other

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-03 Thread John Roesler
Hi Guozhang, I see. It seems like if we want to decouple 1) and 2), we need to alter the definition of the window. Do you think it would close the gap if we added a "window close" time to the window definition? Such as: builder.stream("input") .groupByKey() .windowedBy( TimeWindows

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread Guozhang Wang
Hey John, Obviously I'm too lazy on email replying diligence compared with you :) Will try to reply them separately: - To reply your email on "Mon, Jul 2, 2018 at 8:23 AM": I'm aware of this use case, but again, the

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
In fact, to push the idea further (which IIRC is what Matthias originally proposed), if we can accept "Suppression#finalResultsOnly" in my last email, then we could also consider whether to eliminate "suppressLateEvents" entirely. We could always add it later, but you've both expressed doubt that

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
Hi again, Guozhang ;) Here's the second part of my response... It seems like your main concern is: "if I'm a user who wants final update semantics, how complicated is it for me to get it?" I think we have to assume that people don't always have time to become deeply familiar with all the nuances

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
Hi Guozhang, Thanks for the clarification. To answer your questions: 1. Yes, specifically Y < X makes sense and is by design. The scenario is to support IQ queries over windows that are closed but not evicted. For example, suppose we have a metrics application backed by Streams. Let's say we do

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-01 Thread Guozhang Wang
Hi John, Regarding the metrics: yeah I think I'm with you that the dropped records due to window retention or emit suppression policies should be recorded differently, and using this KIP's proposed metric would be fine. If you also think we can use this KIP's proposed metrics to cover the window

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-29 Thread Bill Bejeck
Thanks for the explanation, that does make sense. I have some questions on operations, but I'll just wait for the PR and tests. Thanks, Bill On Wed, Jun 27, 2018 at 8:14 PM John Roesler wrote: > Hi Bill, > > Thanks for the review! > > Your question is very much applicable to the KIP and not

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Hi Bill, Thanks for the review! Your question is very much applicable to the KIP and not at all an implementation detail. Thanks for bringing it up. I'm proposing not to change the existing caches and configurations at all (for now). Imagine you have a topology like this: commit.interval.ms =

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread Bill Bejeck
Hi John, thanks for the KIP. Early on in the KIP, you mention the current approaches for controlling the rate of downstream records from a KTable, cache size configuration and commit time. Will these configuration parameters still be in effect for tables that don't use suppression? For tables

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Thanks for the feedback, Matthias, It seems like in straightforward relational processing cases, it would not make sense to bound the lateness of KTables. In general, it seems better to have "guard rails" in place that make it easier to write sensible programs than insensible ones. But I'm still

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Hello again all, I realized today that I neglected to include metrics in the proposal. I have added them just now. Thanks, -John On Tue, Jun 26, 2018 at 3:11 PM John Roesler wrote: > Hello devs and users, > > Please take some time to consider this proposal for Kafka Streams: > > KIP-328:

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread Matthias J. Sax
Thanks for the KIP John. One initial comments about the last example "Bounded lateness": For a non-windowed KTable bounding the lateness does not really make sense, does it? Thus, I am wondering if we should allow `suppressLateEvents()` for this case? It seems to be better to only allow it for

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread Ted Yu
I noticed this (lack of primary parameter) as well. What you gave as new example is semantically the same as what I suggested. So it is good by me. Thanks On Wed, Jun 27, 2018 at 7:31 AM, John Roesler wrote: > Thanks for taking look, Ted, > > I agree this is a departure from the conventions

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Thanks for taking look, Ted, I agree this is a departure from the conventions of Streams DSL. Most of our config objects have one or two "required" parameters, which fit naturally with the static factory method approach. TimeWindow, for example, requires a size parameter, so we can naturally say

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-26 Thread Ted Yu
I started to read this KIP which contains a lot of materials. One suggestion: .suppress( new Suppression() Do you think it would be more consistent with the rest of Streams data structures by supporting `of` ? Suppression.of(Duration.ofMinutes(10)) Cheers On Tue, Jun 26, 2018

[DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-26 Thread John Roesler
Hello devs and users, Please take some time to consider this proposal for Kafka Streams: KIP-328: Ability to suppress updates for KTables link: https://cwiki.apache.org/confluence/x/sQU0BQ The basic idea is to provide: * more usable control over update rate (vs the current state store caches)