Kafka Direct Stream join without data shuffle

2015-09-02 Thread Chen Song
I have a stream obtained from Kafka using the direct approach, say inputStream. I need to:

1. Create another DStream, derivedStream, by applying map or mapPartitions to inputStream (with some data enrichment from a reference table).
2. Join derivedStream with inputStream.

In my use case, I don't need to shuffle data.

Re: Kafka Direct Stream join without data shuffle

2015-09-02 Thread Cody Koeninger
No, there isn't a partitioner for KafkaRDD (a KafkaRDD may not even be a pair RDD, for instance). It sounds to me like, if it's a self-join, you should be able to do it in a single mapPartitions operation.

On Wed, Sep 2, 2015 at 3:06 PM, Chen Song wrote:
> I have a stream
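
The single-pass idea in the reply above could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: REFERENCE_TABLE, enrich, and enrich_and_pair are hypothetical names, and the enrichment is assumed to be a simple per-record lookup. Because each record is paired with its own enriched value inside the partition, no join (and so no shuffle) is needed.

```python
# Hypothetical reference table used for enrichment (an assumption for this
# sketch); in a real job it might be a broadcast variable or an external lookup.
REFERENCE_TABLE = {"a": "alpha", "b": "beta"}


def enrich(record):
    """Hypothetical enrichment: look the record up in the reference table."""
    return REFERENCE_TABLE.get(record, "unknown")


def enrich_and_pair(records):
    """Process one partition in a single pass.

    Takes an iterator over the partition's records and yields
    (original, enriched) pairs, which is what a self-join of the input
    with its derived stream would produce, without moving any data.
    """
    for record in records:
        yield (record, enrich(record))


# Applied to a DStream, this would look something like (assumed usage):
#   joined = inputStream.mapPartitions(enrich_and_pair)
```

The function only consumes an iterator, so it can be tried locally on a plain list before being handed to mapPartitions.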