I'm using groupByKey, and it causes repartitioning.

I suppose I could aggregate by parent ID, if the data structure into which I 
aggregate is itself a map from child ID to the thing I really want to 
aggregate. Is that what you had in mind? I think it would work!
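
A minimal sketch of that idea, in plain Java so it runs without Kafka on the
classpath. The ChildItem type, its fields, and the use of a running sum are
assumptions made for illustration; the thread does not say what is actually
being aggregated. The aggregate method takes its parameters in the same order
as Kafka Streams' Aggregator.apply(key, value, aggregate), so in a real
topology something like it could be passed to groupByKey().aggregate(...) on
the original parent-keyed stream, with a serde for the map supplied separately.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParentAggregateSketch {

    /** Hypothetical inner item: one value for one child ID. */
    public static final class ChildItem {
        final String childId;
        final long amount;

        public ChildItem(String childId, long amount) {
            this.childId = childId;
            this.amount = amount;
        }
    }

    /**
     * Folds one incoming message (all the child items it carries) into the
     * per-parent aggregate, which is a map from child ID to a running total.
     * The record key stays the parent ID throughout, so nothing re-keys the
     * stream and no repartition is needed.
     */
    public static Map<String, Long> aggregate(String parentId,
                                              List<ChildItem> items,
                                              Map<String, Long> perChild) {
        for (ChildItem item : items) {
            // Add this item's amount to the total already held for its child ID.
            perChild.merge(item.childId, item.amount, Long::sum);
        }
        return perChild;
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        aggregate("parent-1",
                List.of(new ChildItem("child-a", 5), new ChildItem("child-b", 7)),
                state);
        aggregate("parent-1", List.of(new ChildItem("child-a", 3)), state);
        System.out.println(state.get("child-a") + " " + state.get("child-b")); // prints "8 7"
    }
}
```

One trade-off of this shape is that the aggregate for a parent grows with the
number of distinct child IDs under it, and the whole map is rewritten to the
state store on every update for that parent.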

Give or take a problem I've discovered with persistence following a crash in 
the middle of aggregation, which I'll post separately.

Tim Ward

-----Original Message-----
From: Boyang Chen <reluctanthero...@gmail.com>
Sent: 09 August 2019 23:31
To: users@kafka.apache.org
Subject: Re: How do I tell Kafka Streams not to repartition?

In case I wasn't clear: any operation that changes the record key will
result in a repartition. Since you don't want that, you should leave the
key as `parent id` and call groupByKey afterwards, so that the aggregation
happens at the `parent id` level.

On Fri, Aug 9, 2019 at 3:27 PM Boyang Chen <reluctanthero...@gmail.com>
wrote:

> Hey Tim,
>
> I think the functionality you need is groupByKey(), which avoids
> repartitioning. Feel free to check it out here:
> https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#aggregating.
> I recommend reading the whole thing, but feel free to just search for
> `groupByKey`.
>
> On Fri, Aug 9, 2019 at 7:14 AM Tim Ward <tim.w...@origamienergy.com>
> wrote:
>
>> I've got an input topic which is keyed by "parent ID". Each message
>> contains multiple items of data, each for a different "child ID".
>>
>> To process these items separately I flatMapValues() the stream to make a
>> new stream of the inner items of data, keyed by "child ID".
>>
>> Now, because I've changed the key, Kafka Streams thinks a repartition is
>> needed. But in fact it isn't, because all the inner items for a particular
>> "child ID" will be contained within messages keyed with the same "parent
>> ID".
>>
>> How do I tell Kafka Streams that there is no need to repartition in this
>> case, because all the data that should remain together in the same instance
>> of the application will do so without repartitioning? (I appreciate that
>> Streams can't know about the parent-child relationship unless I *do* tell
>> it in some way.)
>>
>> Tim Ward
>>
>> This email is from Origami Energy Limited. The contents of this email and
>> any attachment are confidential to the intended recipient(s). If you are
>> not an intended recipient: (i) do not use, disclose, distribute, copy or
>> publish this email or its contents; (ii) please contact Origami Energy
>> Limited immediately; and then (iii) delete this email. For more
>> information, our privacy policy is available here:
>> https://origamienergy.com/privacy-policy/. Origami Energy Limited
>> (company number 8619644) is a company registered in England with its
>> registered office at Ashcombe Court, Woolsack Way, Godalming, GU7 1LQ.
>>
>
