I am not sure if I fully understand what your question is? Are you talking about stream-table or table-table join?
For (1), why do you `merge()`? The merge operator is defined on KStreams, not KTable and a merge is also not a join? -Matthias On 7/15/20 3:27 AM, Dumitru-Nicolae Marasoui wrote: > Hello kafka community, > Writing black on white to be more visible, > This is a thought on making join more clear to me and less prone to > concurrency issues that would be risky, not knowing the underlying > implementation of join: > Waiting your feedback, > Thanks, > > 1. kafka streams 1: > map topic1 in: key: the key as the join key between topic1 and topic2; > value: topic1Or2UnionPayload, > map topic2 in key: the key as the join key between topic1 and topic2; > value: topic1Or2UnionPayload, > merge mapped topics above into a single stream; the keys are identical for > elements that need to be joined on both mapped topics/streams > write result to a new topic > > 2. kafka streams 2 (with join processor) > input the topic result from above with both mapped messages coming on a > single pipe, linearised by join key > groupBy join key > pipe each outcoming stream into a Processor that will store data for both > sides of the join and as soon as it has both will start emitting joined > records > write joined records to the result topic (the safe join result) > > On Wed, 15 Jul 2020 at 11:25, Dumitru-Nicolae Marasoui < > [email protected]> wrote: > >> Hello kafka community, >> Refining the step 2 and some questions: >> - is the indeterminism of the ktable join a real problem? >> - how is the ktable join implemented? >> - do you think the solution outlined is a step in the right direction? >> - does the ktable join implement such a strategy in a future version upon >> configuration/demand? >> Thank you, >> 1. kafka streams 1: >> >> - map topic1 in: key: the key as the join key between topic1 and >> topic2; value: topic1Or2UnionPayload, >> - map topic2 in key: the key as the join key between topic1 and >> topic2; value: topic1Or2UnionPayload, >> - merge mapped topics above into a single stream; the keys are >> identical for elements that need to be joined on both mapped >> topics/streams >> - write result to a new topic >> - >> >> 2. transformation part 2: (kafka kafka streams+processor): >> >> - input the topic result from above with both mapped messages coming >> on a single pipe, linearised by join key >> - groupBy join key >> - pipe each outcoming stream into a Processor that will store data for >> both sides of the join and as soon as it has both will start emitting >> joined records >> - write joined records to the result topic (the safe join result) >> - >> >> >> On Tue, 14 Jul 2020 at 12:43, Dumitru-Nicolae Marasoui < >> [email protected]> wrote: >> >>> Hello kafka community, >>> As I understand it, a kafka-streams join that involves a kTable: “the >>> KTable lookup is done on the current KTable state, and thus, out-of-order >>> records can yield non-deterministic result” [1] >>> >>> Does the solution below involving an intermediate topic sound right >>> to you? >>> 1. kafka streams 1: >>> >>> - map topic1 in: key: the key as the join key between topic1 and >>> topic2; value: topic1Or2UnionPayload, >>> - map topic2 in key: the key as the join key between topic1 and >>> topic2; value: topic1Or2UnionPayload, >>> - merge mapped topics above into a single stream; the keys are >>> identical for elements that need to be joined on both mapped >>> topics/streams >>> - write result to a new topic >>> - >>> >>> 2. transformation part 2: (kafka consumer/processor or kafka streams if a >>> suitable stateful transformation can be applied): >>> >>> - input the topic result from above with both mapped messages coming >>> on a single pipe, linearised by join key >>> - a state is need, likely a database (in case kafka-streams is >>> applicable, great, it already embeds one) >>> - when any two sides for the same join key are in the state, a pair >>> can be emitted downstream >>> - >>> >>> Does this make sense to you? Do you have any other experiences of >>> possible approaches that you would like to share? Thank you >>> >>> [1] >>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics >>> >>> >>> Dumitru-Nicolae Marasoui >>> >>> Software Engineer >>> >>> >>> >>> w kaluza.com <https://www.kaluza.com/> >>> >>> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter >>> <https://twitter.com/Kaluza_tech> >>> >>> Kaluza Ltd. registered in England and Wales No. 08785057 >>> >>> VAT No. 100119879 >>> >>> Help save paper - do you need to print this email? >>> >> >> >> -- >> >> Dumitru-Nicolae Marasoui >> >> Software Engineer >> >> >> >> w kaluza.com <https://www.kaluza.com/> >> >> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter >> <https://twitter.com/Kaluza_tech> >> >> Kaluza Ltd. registered in England and Wales No. 08785057 >> >> VAT No. 100119879 >> >> Help save paper - do you need to print this email? >> > >
signature.asc
Description: OpenPGP digital signature
