Hello,

Thanks again for your reply :)

1) In my example, when I send a record to the outer table and there is no
matching record in the inner table, I still receive data in the output
topic, and vice versa. I am trying it with the topics empty at the first
execution. How is this possible?

Why don't KTable joins support windowing strategies? I think I need
windowing for this use case, what do you think?
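
What I have in mind is something like a windowed KStream-KStream join. A
rough sketch of what I mean, written in Scala against a newer Kafka Streams
DSL than I am actually running, so the exact API will differ; the one-hour
window and the joined string value are just assumptions from my use case:

import java.time.Duration
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.apache.kafka.streams.kstream.JoinWindows

val builder = new StreamsBuilder()

// Read both topics as event streams instead of changelog tables
val stream1 = builder.stream[String, String]("topic1")
val stream2 = builder.stream[String, String]("topic2")

// Join only records whose timestamps fall within one hour of each other;
// in practice the joiner would build a ResultUnion with its own serde
val joined = stream1.join(stream2)(
  (v1, v2) => s"$v1|$v2",
  JoinWindows.of(Duration.ofHours(1))
)

joined.to("resultTopic")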

2) What does that mean? Although the log may not be compacted yet, there
should be no problem reading from it and executing a new stream process,
right? (like new joins, counts...).
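
For instance, I assume I could just build on top of "resultTopic" like this
(again only a sketch in Scala against a newer DSL than I am running; the
regrouping and the "countsTopic" name are made up for illustration):

import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes._

val builder = new StreamsBuilder()

// Read the (possibly not yet compacted) result topic back as a changelog
// table; the table keeps only the latest value per key either way
val results = builder.table[String, String]("resultTopic")

// Run a new computation on it, e.g. count how many keys currently hold
// each value
val countsPerValue = results
  .groupBy((key, value) => (value, key))
  .count()

countsPerValue.toStream.to("countsTopic")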

Thanks!!

2016-04-15 17:37 GMT+02:00 Guozhang Wang <wangg...@gmail.com>:

> 1) There are three types of KTable-KTable joins; they follow the same
> semantics as SQL joins:
>
> KTable.join(KTable): when a new record is received from the outer table and
> there is no matching record in the inner table, there is no output; and
> vice versa.
> KTable.leftJoin(KTable): when a new record is received from the outer table
> and there is no matching record in the inner table, output (a, null); in
> the other direction there is no output.
> KTable.outerJoin(KTable): when a new record is received from the outer /
> inner table and there is no matching record in the inner / outer table,
> output (a, null) or (null, b).
>
>
> 2) The result topic is also a changelog topic: although it will be log
> compacted on the key over time, if you consume it immediately the log may
> not have been compacted yet.
>
>
> Guozhang
>
> On Fri, Apr 15, 2016 at 2:11 AM, Guillermo Lammers Corral <
> guillermo.lammers.cor...@tecsisa.com> wrote:
>
> > Hi Guozhang,
> >
> > Thank you very much for your reply and sorry for the generic question,
> > I'll try to explain with some pseudocode.
> >
> > I have two KTable with a join:
> >
> > ktable1: KTable[String, String] = builder.table("topic1")
> > ktable2: KTable[String, String] = builder.table("topic2")
> >
> > result: KTable[String, ResultUnion] =
> > ktable1.join(ktable2, (data1, data2) => new ResultUnion(data1, data2))
> >
> > I send the result to a topic result.to("resultTopic").
> >
> > My questions are related to the following scenario:
> >
> > - The streaming app is up & running without data in the topics
> >
> > - I send data to "topic2", for example a key/value like that
> ("uniqueKey1",
> > "hello")
> >
> > - I see null values in topic "resultTopic", i.e. ("uniqueKey1", null)
> >
> > - If I send data to "topic1", for example a key/value pair like
> > ("uniqueKey1", "world"), then I see this value in topic "resultTopic":
> > ("uniqueKey1", ResultUnion("hello", "world"))
> >
> > Q: If we send data to one of the KTables that does not have the
> > corresponding data (by key) in the other one, is obtaining null values in
> > the final result topic the expected behavior?
> >
> > My next step would be to use Kafka Connect to persist the result data in
> > C* (I have not read the Connector docs yet...), is this the way to do it?
> > (I mean preparing the data in the topic).
> >
> > Q: On the other hand, just to try it out, I have a KTable that reads
> > messages from "resultTopic" and prints them. If the stream is a KTable, I
> > am wondering why it is getting all the values from the topic, even those
> > with the same key?
> >
> > Thanks in advance! Great job answering community!
> >
> > 2016-04-14 20:00 GMT+02:00 Guozhang Wang <wangg...@gmail.com>:
> >
> > > Hi Guillermo,
> > >
> > > 1) Yes, in your case the streams are really "changelog" streams, hence
> > > you should create them as KTables and do a KTable-KTable join.
> > >
> > > 2) Could you elaborate on "achieving this"? What behavior do you
> > > require in the application logic?
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Thu, Apr 14, 2016 at 1:30 AM, Guillermo Lammers Corral <
> > > guillermo.lammers.cor...@tecsisa.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am a newbie to Kafka Streams and I am using it to try to solve a
> > > > particular use case. Let me explain.
> > > >
> > > > I have two sources of data both like that:
> > > >
> > > > Key (string)
> > > > DateTime (hourly granularity)
> > > > Value
> > > >
> > > > I need to join the two sources by key and date (hour of day) to
> > > > obtain:
> > > >
> > > > Key (string)
> > > > DateTime (hourly granularity)
> > > > ValueSource1
> > > > ValueSource2
> > > >
> > > > I think that first I'd need to push the messages into Kafka topics
> > > > with the date as part of the key, because I'll group by key taking
> > > > the date into account. So maybe the key must be a new string like
> > > > key_timestamp. But, of course, that is not the main problem, it is
> > > > just an additional explanation.
> > > >
> > > > Ok, so data are in topics, here we go!
> > > >
> > > > - Multiple records are allowed per key, but only the latest value
> > > > for a record key will be considered. I should use two KTables with
> > > > some join strategy, right?
> > > >
> > > > - Data from both sources could arrive at any time. What can I do to
> > > > achieve this?
> > > >
> > > > Thanks in advance.
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>
