> need to map the keys, modify them >> and then do a join. This will always trigger a rebalance. There is no API atm to tell KS that partitioning is preserved.
Custom partitioner won't help for your case as far as I understand it. -Matthias On 12/17/17 9:48 PM, Sameer Kumar wrote: > Actually, I am doing joining after map. I need to map the keys, modify them > and then do a join. > > I was thinking of using always passing a partition key based on which > partition happens. > Step by step flow is:- > 1. Data is already partitoned by do userid. > 2. I do a map to joins impressions tied to a user with view notifications. > 3. I count valid impressions across different aggregations(i.e. across diff > dimension groups). > > Thanks, > -Sameer. > > On Mon, Dec 18, 2017 at 1:37 AM, Matthias J. Sax <matth...@confluent.io> > wrote: > >> Two comments: >> >> 1) As long, as you don't do an aggregation/join after a map(), there >> will be not repartitioning. Streams does repartitioning "lazy", ie, only >> if it's required. As long as you only chain filter/map etc, no >> repartitioning will be done. >> >> 2) Can't you use mapValue() instead of map()? If you use map() to only >> read the key but only modify the value (-> "data is still local") a >> custom partitioner won't help. Also, we are improving this in upcoming >> version 1.1 and allows read access to a key in mapValue() (cf. KIP-149 >> for details). >> >> Hope this helps. >> >> >> -Matthias >> >> On 12/17/17 8:20 AM, Sameer Kumar wrote: >>> I have multiple map and filter phases in my application dag and though I >> am >>> generating different keys at different points, the data is still local. >>> Re-partitioning for me here is adding unnecessary network shuffling, I >> want >>> to minimize it. >>> >>> -Sameer. >>> >>> On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io> >> wrote: >>> >>>> It's not recommended to write a custom partitioner because it's pretty >>>> difficult to write a correct one. There are many dependencies and you >>>> need deep knowledge of Kafka Streams internals to get it write. >>>> Otherwise, your custom partitioner breaks Kafka Streams. >>>> >>>> That is the reason why it's not documented... >>>> >>>> Not sure so, what you try to achieve in the first place. What do you >>>> mean by >>>> >>>>> I want to make sure that during map phase, the keys >>>>>> produced adhere to the customized partitioner. >>>> >>>> Maybe you achieve what you want differently. >>>> >>>> >>>> -Matthias >>>> >>>> On 12/15/17 1:19 AM, Sameer Kumar wrote: >>>>> Hi, >>>>> >>>>> I want to use the custom partitioner in streams, I couldnt find the >> same >>>> in >>>>> the documentation. I want to make sure that during map phase, the keys >>>>> produced adhere to the customized partitioner. >>>>> >>>>> -Sameer. >>>>> >>>> >>>> >>> >> >> >
signature.asc
Description: OpenPGP digital signature