Actually, I am doing joining after map. I need to map the keys, modify them and then do a join.
I was thinking of using always passing a partition key based on which partition happens. Step by step flow is:- 1. Data is already partitoned by do userid. 2. I do a map to joins impressions tied to a user with view notifications. 3. I count valid impressions across different aggregations(i.e. across diff dimension groups). Thanks, -Sameer. On Mon, Dec 18, 2017 at 1:37 AM, Matthias J. Sax <matth...@confluent.io> wrote: > Two comments: > > 1) As long, as you don't do an aggregation/join after a map(), there > will be not repartitioning. Streams does repartitioning "lazy", ie, only > if it's required. As long as you only chain filter/map etc, no > repartitioning will be done. > > 2) Can't you use mapValue() instead of map()? If you use map() to only > read the key but only modify the value (-> "data is still local") a > custom partitioner won't help. Also, we are improving this in upcoming > version 1.1 and allows read access to a key in mapValue() (cf. KIP-149 > for details). > > Hope this helps. > > > -Matthias > > On 12/17/17 8:20 AM, Sameer Kumar wrote: > > I have multiple map and filter phases in my application dag and though I > am > > generating different keys at different points, the data is still local. > > Re-partitioning for me here is adding unnecessary network shuffling, I > want > > to minimize it. > > > > -Sameer. > > > > On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io> > wrote: > > > >> It's not recommended to write a custom partitioner because it's pretty > >> difficult to write a correct one. There are many dependencies and you > >> need deep knowledge of Kafka Streams internals to get it write. > >> Otherwise, your custom partitioner breaks Kafka Streams. > >> > >> That is the reason why it's not documented... > >> > >> Not sure so, what you try to achieve in the first place. What do you > >> mean by > >> > >>> I want to make sure that during map phase, the keys > >>>> produced adhere to the customized partitioner. > >> > >> Maybe you achieve what you want differently. > >> > >> > >> -Matthias > >> > >> On 12/15/17 1:19 AM, Sameer Kumar wrote: > >>> Hi, > >>> > >>> I want to use the custom partitioner in streams, I couldnt find the > same > >> in > >>> the documentation. I want to make sure that during map phase, the keys > >>> produced adhere to the customized partitioner. > >>> > >>> -Sameer. > >>> > >> > >> > > > >