Two comments:

1) As long, as you don't do an aggregation/join after a map(), there
will be not repartitioning. Streams does repartitioning "lazy", ie, only
if it's required. As long as you only chain filter/map etc, no
repartitioning will be done.

2) Can't you use mapValue() instead of map()? If you use map() to only
read the key but only modify the value (-> "data is still local") a
custom partitioner won't help. Also, we are improving this in upcoming
version 1.1 and allows read access to a key in mapValue() (cf. KIP-149
for details).

Hope this helps.


-Matthias

On 12/17/17 8:20 AM, Sameer Kumar wrote:
> I have multiple map and filter phases in my application dag and though I am
> generating different keys at different points, the data is still local.
> Re-partitioning for me here is adding unnecessary network shuffling, I want
> to minimize it.
> 
> -Sameer.
> 
> On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io> wrote:
> 
>> It's not recommended to write a custom partitioner because it's pretty
>> difficult to write a correct one. There are many dependencies and you
>> need deep knowledge of Kafka Streams internals to get it write.
>> Otherwise, your custom partitioner breaks Kafka Streams.
>>
>> That is the reason why it's not documented...
>>
>> Not sure so, what you try to achieve in the first place. What do you
>> mean by
>>
>>> I want to make sure that during map phase, the keys
>>>> produced adhere to the customized partitioner.
>>
>> Maybe you achieve what you want differently.
>>
>>
>> -Matthias
>>
>> On 12/15/17 1:19 AM, Sameer Kumar wrote:
>>> Hi,
>>>
>>> I want to use the custom partitioner in streams, I couldnt find the same
>> in
>>> the documentation. I want to make sure that during map phase, the keys
>>> produced adhere to the customized partitioner.
>>>
>>> -Sameer.
>>>
>>
>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to