I am not sure if I fully understand what your question is?

Are you talking about stream-table or table-table join?

For (1), why do you `merge()`? The merge operator is defined on
KStreams, not KTable and a merge is also not a join?


-Matthias


On 7/15/20 3:27 AM, Dumitru-Nicolae Marasoui wrote:
> Hello kafka community,
> Writing black on white to be more visible,
> This is a thought on making join more clear to me and less prone to
> concurrency issues that would be risky, not knowing the underlying
> implementation of join:
> Waiting your feedback,
> Thanks,
> 
> 1. kafka streams 1:
> map topic1 in: key: the key as the join key between topic1 and topic2;
> value: topic1Or2UnionPayload,
> map topic2 in key: the key as the join key between topic1 and topic2;
> value: topic1Or2UnionPayload,
> merge mapped topics above into a single stream; the keys are identical for
> elements that need to be joined on both mapped topics/streams
> write result to a new topic
> 
> 2. kafka streams 2 (with join processor)
> input the topic result from above with both mapped messages coming on a
> single pipe, linearised by join key
> groupBy join key
> pipe each outcoming stream into a Processor that will store data for both
> sides of the join and as soon as it has both will start emitting joined
> records
> write joined records to the result topic (the safe join result)
> 
> On Wed, 15 Jul 2020 at 11:25, Dumitru-Nicolae Marasoui <
> [email protected]> wrote:
> 
>> Hello kafka community,
>> Refining the step 2 and some questions:
>> - is the indeterminism of the ktable join a real problem?
>> - how is the ktable join implemented?
>> - do you think the solution outlined is a step in the right direction?
>> - does the ktable join implement such a strategy in a future version upon
>> configuration/demand?
>> Thank you,
>> 1. kafka streams 1:
>>
>>    - map topic1 in: key: the key as the join key between topic1 and
>>    topic2; value: topic1Or2UnionPayload,
>>    - map topic2 in key: the key as the join key between topic1 and
>>    topic2; value: topic1Or2UnionPayload,
>>    - merge mapped topics above into a single stream; the keys are
>>    identical for elements that need to be joined on both mapped 
>> topics/streams
>>    - write result to a new topic
>>    -
>>
>> 2. transformation part 2: (kafka kafka streams+processor):
>>
>>    - input the topic result from above with both mapped messages coming
>>    on a single pipe, linearised by join key
>>    - groupBy join key
>>    - pipe each outcoming stream into a Processor that will store data for
>>    both sides of the join and as soon as it has both will start emitting
>>    joined records
>>    - write joined records to the result topic (the safe join result)
>>    -
>>
>>
>> On Tue, 14 Jul 2020 at 12:43, Dumitru-Nicolae Marasoui <
>> [email protected]> wrote:
>>
>>> Hello kafka community,
>>> As I understand it, a kafka-streams join that involves a kTable: “the
>>> KTable lookup is done on the current KTable state, and thus, out-of-order
>>> records can yield non-deterministic result” [1]
>>>
>>> Does the solution below involving an intermediate topic sound right
>>> to you?
>>> 1. kafka streams 1:
>>>
>>>    - map topic1 in: key: the key as the join key between topic1 and
>>>    topic2; value: topic1Or2UnionPayload,
>>>    - map topic2 in key: the key as the join key between topic1 and
>>>    topic2; value: topic1Or2UnionPayload,
>>>    - merge mapped topics above into a single stream; the keys are
>>>    identical for elements that need to be joined on both mapped 
>>> topics/streams
>>>    - write result to a new topic
>>>    -
>>>
>>> 2. transformation part 2: (kafka consumer/processor or kafka streams if a
>>> suitable stateful transformation can be applied):
>>>
>>>    - input the topic result from above with both mapped messages coming
>>>    on a single pipe, linearised by join key
>>>    - a state is need, likely a database (in case kafka-streams is
>>>    applicable, great, it already embeds one)
>>>    - when any two sides for the same join key are in the state, a pair
>>>    can be emitted downstream
>>>    -
>>>
>>> Does this make sense to you? Do you have any other experiences of
>>> possible approaches that you would like to share? Thank you
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics
>>>
>>>
>>> Dumitru-Nicolae Marasoui
>>>
>>> Software Engineer
>>>
>>>
>>>
>>> w kaluza.com <https://www.kaluza.com/>
>>>
>>> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter
>>> <https://twitter.com/Kaluza_tech>
>>>
>>> Kaluza Ltd. registered in England and Wales No. 08785057
>>>
>>> VAT No. 100119879
>>>
>>> Help save paper - do you need to print this email?
>>>
>>
>>
>> --
>>
>> Dumitru-Nicolae Marasoui
>>
>> Software Engineer
>>
>>
>>
>> w kaluza.com <https://www.kaluza.com/>
>>
>> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter
>> <https://twitter.com/Kaluza_tech>
>>
>> Kaluza Ltd. registered in England and Wales No. 08785057
>>
>> VAT No. 100119879
>>
>> Help save paper - do you need to print this email?
>>
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to