Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Guozhang Wang
Thanks for sharing, Jan. I think it would help if you could share a sketch of your code to illustrate the implementation. As for the recent development, assuming you are referring to KIP-120 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API),

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Jan Filipiak
Hi, yes, the proposed solution is doable completely in the DSL (the only real constraint is not to have a parent key with lots of children), except for the lateral view, which is a pretty easy thing in the PAPI. Our own implementation is a mix of reusing DSL interfaces but using reflection against KTabl

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Eno Thereska
+1 on seeing what Jan did, I'm interested too. Thanks, Eno
> On 21 Feb 2017, at 19:15, Guozhang Wang wrote:
> Jan, Sure I would love to hear what you did for non-key joins. Last time we chatted there are discussions on the ordering issue, that we HAVE TO augment the join result stream

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Guozhang Wang
Jan, sure, I would love to hear what you did for non-key joins. Last time we chatted there were discussions on the ordering issue: that we HAVE TO augment the join result stream keys as a combination of both, which may not be elegant to use in the DSL. For your proposed solution, it seems you did not do

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Jan Filipiak
Just a little note here: if you can take all rows of the "children" table for each key into memory, you can get away with using group_by and making a list of them. With this aggregation the join is straightforward, and you can use a lateral view later to get to the same result. For this you could
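
The group_by / list / lateral view idea above can be sketched in plain Java, without any Kafka Streams API. This is only an illustration under stated assumptions: the class name GroupedChildJoin, the method names, and the use of in-memory maps as stand-ins for the two tables are all hypothetical and are not taken from Jan's implementation. The child table is modelled as childId -> parentId, and the parent table as parentId -> value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: in-memory maps stand in for the two tables.
public class GroupedChildJoin {

    // Step 1: "group_by" the child table on its foreign key and aggregate the
    // child ids of each parent into a list (the list must fit in memory per key).
    public static Map<String, List<String>> groupChildrenByParent(Map<String, String> childToParent) {
        // Sort the input so the resulting lists have a deterministic order.
        Map<String, String> sorted = new TreeMap<>(childToParent);
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            grouped.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return grouped;
    }

    // Steps 2 and 3: both sides are now keyed by parentId, so the join is a
    // plain key lookup; the "lateral view" then emits one row per child.
    public static List<String> joinAndExplode(Map<String, String> parents,
                                              Map<String, List<String>> grouped) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            String parentValue = parents.get(e.getKey());
            if (parentValue == null) {
                continue; // inner join: children without a parent are dropped
            }
            for (String childId : e.getValue()) {
                rows.add(childId + " -> " + parentValue);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> parents = Map.of("p1", "Alice", "p2", "Bob");
        Map<String, String> children = Map.of("c1", "p1", "c2", "p1", "c3", "p2", "c4", "p9");
        System.out.println(joinAndExplode(parents, groupChildrenByParent(children)));
    }
}
```

The constraint mentioned in the thread shows up directly here: every parent key carries its full list of children, so a parent with very many children makes that one aggregated value large.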

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Jan Filipiak
Hi, yes, the ticket is exactly about what you want to do. The lengthy discussion is mainly about what the key of the output KTable is. @guozhang, would you be interested in seeing what we did so far? Best, Jan On 21.02.2017 13:10, Frank Lyaruu wrote: I've read that JIRA (although I don't under

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Frank Lyaruu
I've read that JIRA (although I don't understand every single thing), and I got the feeling it is not exactly the same problem. I am aware of the Global Tables, and I've tried that first, but I seem unable to do what I need to do. I'm replicating a relational database, and on a one-to-many relatio

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Eno Thereska
Hi Frank, As far as I know, the design in that wiki has been superseded by the Global KTables design, which is now coming in 0.10.2. Hence, the JIRAs that are mentioned there (like KAFKA-3705). There are some extensive comments in https://issues.apache.org/jira/browse/KAFKA-3705

Implementing a non-key in Kafka Streams using the Processor API

2017-02-20 Thread Frank Lyaruu
Hi all, I'm trying to implement joining two Kafka tables using a 'remote' key, basically as described here: https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Non-key+KTable-KTable+Joins Under the "Implementation Details" there is one line I don't know how to do: 1. First of al