But the Kafka stream has an underlying RDD which consists of offset ranges only, so how does repartition work?
1. Does it first evaluate the transformation and then repartition, or
2. does it first repartition and then transform? In that case the data should not be transformed; rather, only the offset ranges should be repartitioned and shuffled.

On Fri, Sep 4, 2015 at 10:24 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> Yes, it is not the offset ranges but the real data that will be shuffled
> when you use repartition().
>
> Thanks
> Saisai
>
> On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora <shushantaror...@gmail.com>
> wrote:
>
>> 1. Does repartitioning on a direct Kafka stream shuffle only the offsets or
>> the exact Kafka messages across executors?
>>
>> Say I have a direct Kafka stream:
>>
>> directKafkaStream.repartition(numexecutors).mapPartitions(new
>> FlatMapFunction<Iterator<Tuple2<byte[],byte[]>>, String>(){
>> ...
>> }
>>
>> Say originally I have 5*numexecutors partitions in Kafka.
>>
>> Now only the offset ranges should be shuffled to executors, not the exact
>> Kafka messages? But I am seeing a very large amount of shuffle data
>> read/write on the streaming UI. When I remove this repartition, shuffle
>> read/write becomes 0.
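
To picture the ordering discussed in this thread, here is a toy model in plain Java. It is only a sketch: nothing in it is Spark or Kafka API, and `OffsetRange`, `fetch`, and `repartition` are hypothetical names invented for illustration. It assumes what the quoted reply states, namely that a shuffle moves records rather than offset ranges, so each range has to be read before its messages can be redistributed. The round-robin placement is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these are NOT Spark or Kafka classes.
final class RepartitionSketch {

    // Metadata for one slice of a Kafka partition (hypothetical stand-in).
    static final class OffsetRange {
        final int partition;
        final long fromOffset;
        final long untilOffset;

        OffsetRange(int partition, long fromOffset, long untilOffset) {
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }
    }

    // Stand-in for reading from Kafka: turns a range into actual messages.
    static List<String> fetch(OffsetRange r) {
        List<String> messages = new ArrayList<>();
        for (long o = r.fromOffset; o < r.untilOffset; o++) {
            messages.add("p" + r.partition + "@" + o);
        }
        return messages;
    }

    // Models repartition(n): each range is fetched first, and the resulting
    // messages (not the ranges) are spread round-robin over n buckets.
    static List<List<String>> repartition(List<OffsetRange> ranges, int n) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        int next = 0;
        for (OffsetRange r : ranges) {
            for (String msg : fetch(r)) {
                buckets.get(next % n).add(msg);
                next++;
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<OffsetRange> ranges = Arrays.asList(
                new OffsetRange(0, 0, 3),
                new OffsetRange(1, 10, 12));
        // Every message passes through the "shuffle" before any later step sees it.
        for (List<String> bucket : repartition(ranges, 2)) {
            System.out.println(bucket.size() + " messages: " + bucket);
        }
    }
}
```

In this model, work placed before `repartition` would run on the fetched messages ahead of the shuffle, and work placed after it (like the `mapPartitions` call in the quoted code) would run on the shuffled buckets. That is consistent with shuffle read/write dropping to 0 once the repartition is removed.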