Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-19 Thread Guozhang Wang
Avi, I have granted you the permissions under apache id (avi). Guozhang On Thu, Dec 15, 2016 at 3:40 PM, Matthias J. Sax wrote: > What is your wiki ID? We can grant you permission. > > -Matthias > > On 12/15/16 3:27 PM, Avi Flax wrote: > > > >> On Dec 13, 2016, at 21:02,

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-15 Thread Matthias J. Sax
What is your wiki ID? We can grant you permission. -Matthias On 12/15/16 3:27 PM, Avi Flax wrote: > >> On Dec 13, 2016, at 21:02, Matthias J. Sax wrote: >> >> thanks for your feedback. > > My pleasure! > >> We want to enlarge the scope for Streams >> application and

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-15 Thread Avi Flax
> On Dec 13, 2016, at 21:02, Matthias J. Sax wrote: > > thanks for your feedback. My pleasure! > We want to enlarge the scope for Streams > application and started to collect use cases in the Wiki: > >

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-13 Thread Matthias J. Sax
Avi, thanks for your feedback. We want to enlarge the scope for Streams application and started to collect use cases in the Wiki: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios Feel free to add there via editing the page or writing a comment.

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-13 Thread Avi Flax
On 2016-11-28 13:47 (-0500), "Matthias J. Sax" wrote: > > I want to start a discussion about KIP-95: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams > > Looking forward to your feedback. Hi Matthias, I’d just

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-10 Thread Ewen Cheslack-Postava
A few thoughts: > introduce a Streams metadata topic (single partitioned and log compacted; one for each application-ID) I would consider whether the single partition is a strict requirement. A single partition increases the likelihood of outages. This is something I'd like to see changed in

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-09 Thread Matthias J. Sax
About using the offset-topic metadata field: Even with KIP-98, I think this would not work (or maybe in a weird way). If we have the following topology: topic1 -> subTopologyA -> topic2 -> subTopologyB If the producer of subTopologyA commits, it will commit its input offsets from topic1. Thus, the

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-09 Thread Guozhang Wang
I will read through the KIP doc once again to provide more detailed feedback, but let me throw in my two cents on the above email. There are a few motivations to have a "consistent" stop-point across tasks belonging to different sub-topologies. One of them is for interactive queries: say

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-06 Thread Matthias J. Sax
Thanks for the input, Jay. From my understanding, your question boils down to how fuzzy the stop point can/should be, and what guarantees we want to provide to the user. This has multiple dimensions: 1. The main issue with using a timestamp is that if an input topic has no data with this timestamp,

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-05 Thread Jay Kreps
I'd like to second the discouragement of adding a new topic per job. We went down this path in Samza and I think the result was quite a mess. You had to read the full topic every time a job started and so it added a lot of overhead and polluted the topic space. What if we did the following:

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-05 Thread Matthias J. Sax
1) The change to consume the metadata topic by all instances should not be big. We want to leverage the "restore consumer" that already does manual partition assignment. 2) I understand your concern about adding the metadata topic... For a single instance with one thread, it would be easy to go

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-04 Thread Eno Thereska
A couple of remaining questions: - it says in the KIP: "because the metadata topic must be consumed by all instances, we need to assign the topic’s partitions manually and do not commit offsets -- we also need to seekToBeginning() each time we consume the metadata topic)". How big of a change

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-01 Thread Michael Noll
> Instead, why would we not have batch.mode=true/false? Matthias already replied in detail, but let me also throw in my thoughts: I'd argue that "there is no batch [mode]" (just like "there is no spoon" for the Matrix fans out there :-P). What would a batch mode look like? It is primarily

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Matthias J. Sax
Right now, there is only one config value and I am open to better suggestions. We did not go for batch.mode=true/false because we might want to have auto stop at specific stop-offsets or stop-timestamp later on. So we can extend the parameter with new values like autostop.at=timestamp in
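
For illustration, the single value discussed here could be set like any other Streams property. This is only a sketch: `autostop.at` and `eol` are the names proposed in this thread, not a released Kafka Streams config, and the class `AutoStopConfigSketch` with its helper methods is hypothetical. Only `application.id` is an existing Streams setting.

```java
import java.util.Properties;

class AutoStopConfigSketch {
    // Proposed in KIP-95 as discussed in this thread; an assumption, not a released config key.
    static final String AUTOSTOP_AT = "autostop.at";

    // Builds properties for an application that should stop at end-of-log ("eol").
    static Properties batchProps(String applicationId) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId); // existing Streams config key
        props.setProperty(AUTOSTOP_AT, "eol");              // the only value mentioned so far
        return props;
    }

    // True if the properties request the proposed stop-at-end-of-log behavior.
    static boolean stopsAtEndOfLog(Properties props) {
        return "eol".equals(props.getProperty(AUTOSTOP_AT));
    }
}
```

A string-valued key, unlike a boolean `batch.mode`, leaves room for further values without adding a second parameter.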

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Sriram Subramanian
I agree that the metadata topic is required to build a batching semantic that is intuitive. One question on the config autostop.at: I see one value for it, eol. What other values can be used? Instead, why would we not have batch.mode=true/false? On Wed, Nov 30, 2016 at 1:51 PM, Matthias J.

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Matthias J. Sax
Both types of intermediate topics are handled the exact same way and both types do connect different subtopologies (even if the user might not be aware that there are multiple subtopologies in case of internal data repartitioning). So there is no distinction between user intermediate topics (via

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Eno Thereska
With the marker we have substituted a failure problem for a liveness problem in that under repeated failure the other instances will not do any useful work. As mentioned in the other email, I don't know if we need to worry about that corner case just yet. Eno > On 30 Nov 2016, at 20:06,

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Eno Thereska
In the KIP, two types of intermediate topics are described, 1) ones that connect two sub-topologies, and 2) others that are internal repartitioning topics (e.g., for joins). I wasn't envisioning stopping the consumption of (2) at the HWM. The HWM can be used for the source topics only (so I

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Matthias J. Sax
Eno, > So in general, we have the problem of failures during writes to the metadata topic itself. The KIP suggests using marker messages for this case. The marker is either written (indicating success) or not. If not, after failure/rebalance the (new) group leader will collect HW again. As long

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Damian Guy
I think the KIP looks good. I also think we need the metadata topic in order to provide sane guarantees on what data will be processed. As Matthias has outlined in the KIP, we need to know when to stop consuming from intermediate topics, i.e., topics that are part of the same application but are

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-30 Thread Eno Thereska
Hi Matthias, I like the first part of the KIP. However, the second part with the failure modes and metadata topic is quite complex and I'm worried it doesn't solve the problems you mention under failure. For example, the application can fail before writing to the metadata topic. In that case,

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-29 Thread Matthias J. Sax
Thanks for your input. To clarify: the main reason to add the metadata topic is to cope with subtopologies that are connected via intermediate topics (either user-defined via through() or internally created for data repartitioning). Without this handling, the behavior would be odd and user

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-29 Thread Neha Narkhede
Thanks for initiating this. I think this is a good first step towards unifying batch and stream processing in Kafka. I understood this capability to be simple yet very useful; it allows a Streams program to process a log, in batch, in arbitrary windows defined by the difference between the HW and
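
As a rough illustration of the idea of processing a log in batches bounded by a high watermark (HW), the sketch below uses plain in-memory lists instead of Kafka topics. Everything in it is an assumption made for the example: the class `StopAtHighWatermarkSketch`, its method names, and the reading that each run snapshots the end offsets at startup and resumes from the offsets committed by the previous run. It is not the KIP's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StopAtHighWatermarkSketch {
    // Snapshots the current end offset (list size) of every partition; these act as stop offsets.
    static Map<Integer, Integer> captureStopOffsets(Map<Integer, List<String>> log) {
        Map<Integer, Integer> stop = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> e : log.entrySet()) {
            stop.put(e.getKey(), e.getValue().size());
        }
        return stop;
    }

    // Processes each partition from its committed offset up to (excluding) its stop offset,
    // then records the new committed offset so the next run continues from there.
    static List<String> runBatch(Map<Integer, List<String>> log,
                                 Map<Integer, Integer> committed,
                                 Map<Integer, Integer> stop) {
        List<String> processed = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : stop.entrySet()) {
            int partition = e.getKey();
            int from = committed.getOrDefault(partition, 0);
            List<String> records = log.get(partition);
            for (int offset = from; offset < e.getValue(); offset++) {
                processed.add(records.get(offset));
            }
            committed.put(partition, Math.max(from, e.getValue()));
        }
        return processed;
    }
}
```

Records appended after the snapshot are left for the next run, which is what makes the batches incremental in this sketch.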

[DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-11-28 Thread Matthias J. Sax
Hi all, I want to start a discussion about KIP-95: https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams Looking forward to your feedback. -Matthias