Hi All,

I have made 3 RDDs from 3 different datasets. All of them are grouped by
CustomerID: 2 RDDs have values of Iterable type and one has a single bean.
All RDDs use a Long CustomerId as the key. Below are the models for the
3 RDDs:
JavaPairRDD<Long, Iterable<TransactionInfo>>
JavaPairRDD<Long, Iterable<TransactionRaw>>
JavaPairRDD<Long, TransactionAggr>

Now I have to merge all 3 RDDs into a single one so that I can generate an
Excel report for each customer using the data in all 3 RDDs.
I tried the join transformation, but it seems to need RDDs of the same type,
and it only works on two RDDs at a time.
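For reference, Spark's `join` requires the same key type on both sides, not the same value type. Spark's Java API also has `JavaPairRDD.cogroup(other1, other2)`, which groups three pair RDDs with the same key type in one operation and yields, per key, a `Tuple3` of `Iterable`s. A key missing from one input gets an empty `Iterable` there. If the ungrouped pair RDDs are still available (an assumption), cogrouping those directly would also make the separate `groupByKey()` calls unnecessary. The sketch below only illustrates the shape of that result in plain Java, without Spark. The class names and the String payloads standing in for `TransactionInfo`, `TransactionRaw` and `TransactionAggr` are hypothetical.

```java
import java.util.*;

/**
 * Plain-Java illustration (no Spark) of a three-way cogroup by customer ID.
 * Hypothetical: String payloads stand in for TransactionInfo, TransactionRaw
 * and TransactionAggr. In Spark the comparable call would be
 * infoPairs.cogroup(rawPairs, aggrPairs) on ungrouped JavaPairRDDs.
 */
public class CoGroupSketch {

    /** Everything known about one customer, one list per source dataset. */
    static final class Grouped {
        final List<String> infos = new ArrayList<>();
        final List<String> raws = new ArrayList<>();
        final List<String> aggrs = new ArrayList<>();

        @Override
        public String toString() {
            return infos + " | " + raws + " | " + aggrs;
        }
    }

    /** Groups three keyed lists by key; a key absent from a source gets an empty list. */
    static Map<Long, Grouped> cogroup(List<Map.Entry<Long, String>> info,
                                      List<Map.Entry<Long, String>> raw,
                                      List<Map.Entry<Long, String>> aggr) {
        Map<Long, Grouped> out = new TreeMap<>();
        for (Map.Entry<Long, String> e : info) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped()).infos.add(e.getValue());
        }
        for (Map.Entry<Long, String> e : raw) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped()).raws.add(e.getValue());
        }
        for (Map.Entry<Long, String> e : aggr) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped()).aggrs.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> info = List.of(
                Map.entry(1L, "info-a"), Map.entry(1L, "info-b"), Map.entry(2L, "info-c"));
        List<Map.Entry<Long, String>> raw = List.of(Map.entry(1L, "raw-a"));
        List<Map.Entry<Long, String>> aggr = List.of(
                Map.entry(1L, "aggr-1"), Map.entry(2L, "aggr-2"));

        // One entry per customer, holding that customer's data from all three sources.
        cogroup(info, raw, aggr).forEach((id, g) -> System.out.println(id + " -> " + g));
    }
}
```

Each map entry then carries everything needed for one customer's report row, which is the same grouping a three-way `cogroup` gives per key.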
So my questions are:
1. Is there an efficient way to do this merge, so that I get all 3 datasets
per CustomerId?
2. If I merge the first two using the join transformation, do I need to run
groupByKey() before the join so that all data related to a single customer
is on one node?
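On question 2: as far as I understand Spark, `join` and `cogroup` already shuffle their inputs by key (using a hash partitioner unless the inputs already share a partitioner). This means records with the same CustomerId from every input meet in the same partition, so a `groupByKey()` beforehand is not required for co-location. The plain-Java sketch below mimics the hash-partitioning rule (non-negative `key.hashCode()` modulo the partition count) to show why the same ID always lands in the same place. The class name, partition count and IDs are hypothetical, and this is not Spark code.

```java
import java.util.*;

/**
 * Plain-Java sketch of hash partitioning by key, modeled on the rule Spark's
 * HashPartitioner uses (non-negative key.hashCode() modulo numPartitions).
 * Hypothetical: the partition count and customer IDs are made up.
 */
public class PartitionSketch {

    /** Partition index for a key; null keys go to partition 0, result is never negative. */
    static int partitionFor(Long key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        int raw = key.hashCode() % numPartitions;
        return raw < 0 ? raw + numPartitions : raw;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        long[] customerIds = {1L, 42L, 1_000_003L, -7L};
        for (long id : customerIds) {
            // The partition depends only on the key, not on which dataset the
            // record came from, so every dataset's records for a customer meet
            // in the same partition after the shuffle.
            System.out.println("customer " + id + " -> partition "
                    + partitionFor(id, numPartitions));
        }
    }
}
```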


Thanks
Shams
