But the direct Kafka stream's underlying RDD consists only of offset ranges,
so how does repartition work?

1. Does it first evaluate the transformation and then repartition?
2. Or does it first repartition and then transform? In that case the data
should not be transformed; only the offset ranges should be repartitioned
and shuffled.
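A toy Java model of what happens for `directKafkaStream.repartition(n).mapPartitions(f)` may help. This is not Spark code: the class, the `OffsetRange` record, and the `fetch`, `repartition`, and `mapPartition` methods are hypothetical stand-ins. It assumes each input partition is one Kafka offset range and that the task which owns a range fetches its messages first. The fetched records, not the ranges, are then bucketed for the shuffle, and the user's per-partition function runs only after the shuffle.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Spark code) of repartition(n).mapPartitions(f) on a direct Kafka stream. */
public class RepartitionOrder {

    /** Hypothetical stand-in for a Kafka offset range: [fromOffset, untilOffset). */
    public record OffsetRange(int kafkaPartition, long fromOffset, long untilOffset) {}

    /** Map side, step 1: the task that owns an offset range fetches the actual messages. */
    public static List<String> fetch(OffsetRange r) {
        List<String> msgs = new ArrayList<>();
        for (long o = r.fromOffset(); o < r.untilOffset(); o++) {
            msgs.add("p" + r.kafkaPartition() + "-msg" + o);
        }
        return msgs;
    }

    /** Map side, step 2: distribute the fetched records round-robin over n buckets.
     *  These buckets are what would be written as shuffle data. */
    public static List<List<String>> repartition(List<OffsetRange> ranges, int n) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
        for (OffsetRange r : ranges) {
            // Spark picks a pseudo-random starting bucket per input partition;
            // this model always starts at 0 so the result is deterministic.
            int target = 0;
            for (String msg : fetch(r)) {
                buckets.get(target).add(msg);
                target = (target + 1) % n;
            }
        }
        return buckets;
    }

    /** Reduce side: the per-partition function only sees records after the shuffle. */
    public static List<String> mapPartition(List<String> partition) {
        List<String> out = new ArrayList<>();
        for (String msg : partition) out.add(msg.toUpperCase());
        return out;
    }

    public static void main(String[] args) {
        List<OffsetRange> ranges = List.of(
                new OffsetRange(0, 100, 103),
                new OffsetRange(1, 200, 202));
        for (List<String> p : repartition(ranges, 2)) {
            System.out.println(mapPartition(p));
        }
    }
}
```

Two offset ranges go in, but all five messages cross the shuffle, which is why the shuffle read/write metrics grow with message volume rather than with the number of ranges.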



On Fri, Sep 4, 2015 at 10:24 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> Yes, not the offset ranges but the real data will be shuffled when you
> use repartition().
>
> Thanks
> Saisai
>
> On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
>
>> 1. Does repartitioning a direct Kafka stream shuffle only the offsets or
>> the actual Kafka messages across executors?
>>
>> Say I have a direct Kafka stream:
>>
>> directKafkaStream.repartition(numexecutors).mapPartitions(
>>     new FlatMapFunction<Iterator<Tuple2<byte[], byte[]>>, String>() {
>>         ...
>>     });
>>
>> Say originally I have 5 * numexecutors partitions in Kafka.
>>
>> Shouldn't only the offset ranges be shuffled to executors, not the actual
>> Kafka messages? But I am seeing a very large amount of shuffle data
>> read/write on the streaming UI. When I remove this repartition, shuffle
>> read/write becomes 0.
>>
>>
>
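Because the shuffled payload is the records themselves, the order of `repartition` and the per-partition function decides how many bytes cross the network. The toy Java accounting below is not Spark's metrics code; `ShuffleVolume`, its methods, and the 40-byte messages with a 4-byte field are all hypothetical. It assumes the function shrinks each message to a short extracted field and counts shuffle bytes as raw UTF-8 payload size, ignoring serialization overhead and compression.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Toy accounting (not Spark code) of shuffle volume for the two orderings. */
public class ShuffleVolume {

    /** Bytes written to the shuffle, modelled as the sum of UTF-8 payload sizes. */
    public static long shuffleBytes(List<String> records) {
        long total = 0;
        for (String r : records) total += r.getBytes(StandardCharsets.UTF_8).length;
        return total;
    }

    /** repartition(n).mapPartitions(f): the raw messages cross the shuffle. */
    public static long repartitionThenMap(List<String> messages) {
        return shuffleBytes(messages);
    }

    /** mapPartitions(f).repartition(n): f runs in the fetching task, only its output is shuffled. */
    public static long mapThenRepartition(List<String> messages, Function<String, String> f) {
        List<String> mapped = new ArrayList<>();
        for (String m : messages) mapped.add(f.apply(m));
        return shuffleBytes(mapped);
    }

    public static void main(String[] args) {
        // Hypothetical 40-byte messages whose first 4 characters are the only field needed.
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            messages.add(String.format("id%02d", i % 100) + "x".repeat(36));
        }
        Function<String, String> extractId = m -> m.substring(0, 4);
        System.out.println(repartitionThenMap(messages));            // 40000
        System.out.println(mapThenRepartition(messages, extractId)); // 4000
    }
}
```

If the function's output is not smaller than its input, the reordering buys nothing; the choice only pays off when the work done before the shuffle reduces the data.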
