I have a stream obtained from Kafka with the direct approach, say inputStream. I
need to:
1. Create another DStream, derivedStream, with map or mapPartitions (doing
some data enrichment against a reference table) on inputStream.
2. Join derivedStream with inputStream.
In my use case, I don't need to shuffle data.
No, there isn't a partitioner for KafkaRDD (KafkaRDD may not even be a pair
RDD, for instance).
It sounds to me like, if it's a self-join, you should be able to do it in a
single mapPartitions operation.
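A rough sketch of the single-pass idea, in Python and not tied to your actual job: since derivedStream is computed record by record from inputStream, the "join" can be done by emitting the original and the enriched value together inside the per-partition function, so nothing has to leave the partition. The names REFERENCE_TABLE and enrich_and_pair are hypothetical, and the records are assumed to be (key, value) pairs.

```python
# Hypothetical reference table used for enrichment. In a real job this
# might be a broadcast variable or a lookup loaded once per partition.
REFERENCE_TABLE = {"u1": "alice", "u2": "bob"}


def enrich_and_pair(records):
    """Per-partition function: pair each record with its enriched value.

    `records` is an iterator over the (key, value) pairs of one partition.
    The derived value is computed from the record itself, so the output
    has the shape a self-join would produce, without any shuffle.
    """
    for key, value in records:
        derived = REFERENCE_TABLE.get(key)  # None when the key is not in the table
        yield key, (value, derived)


# Assumed usage with PySpark, where input_stream is a DStream of (key, value) pairs:
# joined = input_stream.mapPartitions(enrich_and_pair)
```

The output has the same (key, (left, right)) shape that a join would give, but it is built in one pass over each partition, so the lack of a partitioner on KafkaRDD does not matter here.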
On Wed, Sep 2, 2015 at 3:06 PM, Chen Song wrote:
> I have a stream