Re: Using Custom Partitioner in Streams

Sameer Kumar Mon, 18 Dec 2017 21:36:52 -0800

I understand it now, even if we are able to attach custom partitioning. The
data shall still travel from stream nodes to broker on join topic, so
travel to network will still be there.


-Sameer.

On Tue, Dec 19, 2017 at 1:17 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> > need to map the keys, modify them
> >> and then do a join.
>
> This will always trigger a rebalance. There is no API atm to tell KS
> that partitioning is preserved.
>
> Custom partitioner won't help for your case as far as I understand it.
>
>
> -Matthias
>
> On 12/17/17 9:48 PM, Sameer Kumar wrote:
> > Actually, I am doing joining after map. I need to map the keys, modify
> them
> > and then do a join.
> >
> > I was thinking of using always passing a partition key based on which
> > partition happens.
> > Step by step flow is:-
> > 1. Data is already partitoned by do userid.
> > 2. I do a map to joins impressions tied to a user with view
> notifications.
> > 3. I count valid impressions across different aggregations(i.e. across
> diff
> > dimension groups).
> >
> > Thanks,
> > -Sameer.
> >
> > On Mon, Dec 18, 2017 at 1:37 AM, Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> >> Two comments:
> >>
> >> 1) As long, as you don't do an aggregation/join after a map(), there
> >> will be not repartitioning. Streams does repartitioning "lazy", ie, only
> >> if it's required. As long as you only chain filter/map etc, no
> >> repartitioning will be done.
> >>
> >> 2) Can't you use mapValue() instead of map()? If you use map() to only
> >> read the key but only modify the value (-> "data is still local") a
> >> custom partitioner won't help. Also, we are improving this in upcoming
> >> version 1.1 and allows read access to a key in mapValue() (cf. KIP-149
> >> for details).
> >>
> >> Hope this helps.
> >>
> >>
> >> -Matthias
> >>
> >> On 12/17/17 8:20 AM, Sameer Kumar wrote:
> >>> I have multiple map and filter phases in my application dag and though
> I
> >> am
> >>> generating different keys at different points, the data is still local.
> >>> Re-partitioning for me here is adding unnecessary network shuffling, I
> >> want
> >>> to minimize it.
> >>>
> >>> -Sameer.
> >>>
> >>> On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io>
> >> wrote:
> >>>
> >>>> It's not recommended to write a custom partitioner because it's pretty
> >>>> difficult to write a correct one. There are many dependencies and you
> >>>> need deep knowledge of Kafka Streams internals to get it write.
> >>>> Otherwise, your custom partitioner breaks Kafka Streams.
> >>>>
> >>>> That is the reason why it's not documented...
> >>>>
> >>>> Not sure so, what you try to achieve in the first place. What do you
> >>>> mean by
> >>>>
> >>>>> I want to make sure that during map phase, the keys
> >>>>>> produced adhere to the customized partitioner.
> >>>>
> >>>> Maybe you achieve what you want differently.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 12/15/17 1:19 AM, Sameer Kumar wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I want to use the custom partitioner in streams, I couldnt find the
> >> same
> >>>> in
> >>>>> the documentation. I want to make sure that during map phase, the
> keys
> >>>>> produced adhere to the customized partitioner.
> >>>>>
> >>>>> -Sameer.
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: Using Custom Partitioner in Streams

Reply via email to