Kafka Direct Stream join without data shuffle

Chen Song Wed, 02 Sep 2015 13:06:25 -0700

I have a stream obtained from Kafka using the direct approach, say inputStream.
I need to:


1. Create another DStream, derivedStream, by applying map or mapPartitions to
inputStream (enriching the data with a reference table).
2. Join derivedStream with inputStream.
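Step 1 can be modeled without Spark as a partition-local transformation. The following is a minimal plain-Python sketch. It assumes a micro-batch is represented as a list of partitions, one list of (key, value) records per Kafka partition, and uses a hypothetical in-memory reference table named REFERENCE. It mimics what mapPartitions does: each output partition is computed only from the input partition at the same index, so the partition count and record placement stay the same.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]

# Hypothetical reference table used for enrichment (an assumption, not from the post).
REFERENCE: Dict[str, str] = {"u1": "gold", "u2": "silver"}


def enrich_partition(records: Iterable[Record]) -> Iterator[Record]:
    """Mimic a mapPartitions function: look each key up in the reference table."""
    for key, _value in records:
        yield key, REFERENCE.get(key, "unknown")


def map_partitions(
    batch: List[List[Record]],
    f: Callable[[Iterable[Record]], Iterator[Record]],
) -> List[List[Record]]:
    """Apply f to each partition independently; no record moves between partitions."""
    return [list(f(part)) for part in batch]


# One micro-batch of inputStream: partition i holds records from Kafka partition i.
input_batch: List[List[Record]] = [
    [("u1", "click"), ("u3", "view")],
    [("u2", "buy")],
]
derived_batch = map_partitions(input_batch, enrich_partition)
print(derived_batch)
```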

In my use case, I don't need to shuffle data. Each partition in
derivedStream only needs to be joined with the corresponding partition of
the parent inputStream it was generated from.
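Because partition i of derivedStream depends only on partition i of inputStream, the join can be done pairwise. Zip the two batches' partitions by index and join on key inside each pair. The sketch below is a plain-Python model of that idea, not Spark code. In Spark's Scala/Java API, RDD.zipPartitions pairs partitions in the same way and assumes both RDDs have the same number of partitions. The data and function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]
Joined = Tuple[str, Tuple[str, str]]


def join_partition(left: List[Record], right: List[Record]) -> List[Joined]:
    """Inner-join two co-located partitions on key with a local hash table (no shuffle)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, [])]


def zip_partitions_join(
    left_batch: List[List[Record]], right_batch: List[List[Record]]
) -> List[List[Joined]]:
    """Join partition i of one batch only with partition i of the other."""
    if len(left_batch) != len(right_batch):
        raise ValueError("both batches must have the same number of partitions")
    return [join_partition(l, r) for l, r in zip(left_batch, right_batch)]


# Hypothetical micro-batches: input records and their per-partition enrichment.
input_batch = [[("u1", "click"), ("u3", "view")], [("u2", "buy")]]
derived_batch = [[("u1", "gold"), ("u3", "unknown")], [("u2", "silver")]]

joined = zip_partitions_join(input_batch, derived_batch)
print(joined)
```

Since derivedStream is computed entirely from inputStream, one alternative is a single partition-local pass that emits each record together with its enrichment, which avoids the join altogether.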

My questions are:

1. Is there a Partitioner defined in KafkaRDD at all?
2. How would I preserve the partitioning scheme and avoid data shuffle?

-- 
Chen Song
