Actually, I am doing a join after the map. I need to map the keys, modify
them, and then do a join.

I was thinking of always passing a partition key on which the partitioning
is based.
The step-by-step flow is:
1. Data is already partitioned by userid.
2. I do a map to join impressions tied to a user with view notifications.
3. I count valid impressions across different aggregations (i.e. across
different dimension groups).
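
A minimal sketch of the idea in the steps above, in plain Java with no Kafka
dependency. Everything here is an assumption for illustration: the class
UserIdPartitioning, the composite key format "<userId>|<dimension>", and the
use of Arrays.hashCode as a stand-in hash (Kafka's own default partitioner
hashes differently). It only shows that if the partition is derived from the
userid part of the key, a remapped key stays on the same partition as the
original userid key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Kafka Streams; it only illustrates the idea.
final class UserIdPartitioning {

    // Assumed key format after the map step: "<userId>|<dimension>".
    // A key without the separator is treated as a bare userId.
    static String userIdOf(String key) {
        int sep = key.indexOf('|');
        return sep < 0 ? key : key.substring(0, sep);
    }

    // Derive the partition from the userId part only, so changing the
    // dimension part of the key never changes the partition.
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = userIdOf(key).getBytes(StandardCharsets.UTF_8);
        // Stand-in hash; mask the sign bit so the result is non-negative.
        return (Arrays.hashCode(bytes) & 0x7fffffff) % numPartitions;
    }
}
```

With this sketch, partitionFor("user42", 8) and
partitionFor("user42|country=IN", 8) return the same partition. Wiring
something like this into Streams (for example through its StreamPartitioner
interface) is a separate matter, and both sides of the join would have to
agree on the same scheme.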

Thanks,
-Sameer.

On Mon, Dec 18, 2017 at 1:37 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Two comments:
>
> 1) As long as you don't do an aggregation/join after a map(), there
> will be no repartitioning. Streams does repartitioning lazily, i.e., only
> if it's required. As long as you only chain filter/map etc., no
> repartitioning will be done.
>
> 2) Can't you use mapValues() instead of map()? If you use map() only to
> read the key and modify just the value (-> "data is still local"), a
> custom partitioner won't help. Also, we are improving this in the upcoming
> version 1.1, which will allow read access to the key in mapValues() (cf.
> KIP-149 for details).
>
> Hope this helps.
>
>
> -Matthias
>
> On 12/17/17 8:20 AM, Sameer Kumar wrote:
> > I have multiple map and filter phases in my application DAG and though I am
> > generating different keys at different points, the data is still local.
> > Re-partitioning for me here is adding unnecessary network shuffling; I want
> > to minimize it.
> >
> > -Sameer.
> >
> > On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io> wrote:
> >
> >> It's not recommended to write a custom partitioner because it's pretty
> >> difficult to write a correct one. There are many dependencies and you
> >> need deep knowledge of Kafka Streams internals to get it right.
> >> Otherwise, your custom partitioner breaks Kafka Streams.
> >>
> >> That is the reason why it's not documented...
> >>
> >> Not sure, though, what you are trying to achieve in the first place.
> >> What do you mean by
> >>
> >>> I want to make sure that during map phase, the keys
> >>> produced adhere to the customized partitioner.
> >>
> >> Maybe you can achieve what you want differently.
> >>
> >>
> >> -Matthias
> >>
> >> On 12/15/17 1:19 AM, Sameer Kumar wrote:
> >>> Hi,
> >>>
> >>> I want to use a custom partitioner in Streams, but I couldn't find it in
> >>> the documentation. I want to make sure that during map phase, the keys
> >>> produced adhere to the customized partitioner.
> >>>
> >>> -Sameer.
> >>>
> >>
> >>
> >
>
>
