Hi,

I've never done it before, but just yesterday I found out about the SparkContext.union method, which could help in your case.

def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext

Regards,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque <sham...@cashcare.in> wrote:
> Hi All,
>
> I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
> CustomerID; 2 RDDs have values of Iterable type and one has a single
> bean. All RDDs have an id of Long type as CustomerId. Below are the
> models for the 3 RDDs:
> JavaPairRDD<Long, Iterable<TransactionInfo>>
> JavaPairRDD<Long, Iterable<TransactionRaw>>
> JavaPairRDD<Long, TransactionAggr>
>
> Now I have to merge all 3 RDDs into a single one so that I can generate
> an Excel report per customer using the data in the 3 RDDs.
> I tried using the join transformation, but it needs RDDs of the same
> type and it works for only two RDDs.
> So my questions are:
> 1. Is there any way to do my merging task efficiently, so that I can get
> all 3 datasets by CustomerId?
> 2. If I merge the first two using the join transformation, do I need to
> run groupByKey() before the join so that all data related to a single
> customer will be on one node?
>
> Thanks
> Shams
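
A follow-up note on question 1 in the quoted message. Judging by its signature, union takes RDDs of one element type T and concatenates them, so on its own it would not line up three differently typed datasets per customer. What the question describes looks more like a key-wise grouping; in the Java API, JavaPairRDD has a cogroup method that accepts other pair RDDs and returns, per key, one Iterable for each input. I have not tried it on this data, so the snippet below is only a plain-Java sketch (no Spark involved) of what such a key-wise merge produces. The names CogroupSketch, Grouped and cogroup3 are made up for illustration, and Strings stand in for TransactionInfo, TransactionRaw and TransactionAggr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Spark) of a key-wise merge of three keyed datasets.
// All names here are hypothetical and exist only for this illustration.
class CogroupSketch {

    // Holds everything known about one customer id across the three inputs.
    static final class Grouped<A, B, C> {
        final List<A> first = new ArrayList<>();
        final List<B> second = new ArrayList<>();
        final List<C> third = new ArrayList<>();
    }

    // For every key present in ANY input, collect the values from all three.
    // A key missing from one input simply gets an empty list for that input.
    static <A, B, C> Map<Long, Grouped<A, B, C>> cogroup3(
            Map<Long, List<A>> a, Map<Long, List<B>> b, Map<Long, C> c) {
        Map<Long, Grouped<A, B, C>> out = new TreeMap<>();
        a.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).first.addAll(v));
        b.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).second.addAll(v));
        c.forEach((k, v) -> out.computeIfAbsent(k, x -> new Grouped<>()).third.add(v));
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for the three datasets, keyed by customer id.
        Map<Long, List<String>> infos = new TreeMap<>();
        infos.put(1L, Arrays.asList("info-a", "info-b"));

        Map<Long, List<String>> raws = new TreeMap<>();
        raws.put(1L, Arrays.asList("raw-a"));
        raws.put(2L, Arrays.asList("raw-b"));

        Map<Long, String> aggrs = new TreeMap<>();
        aggrs.put(2L, "aggr-2");

        // One entry per customer, carrying the data from all three inputs.
        cogroup3(infos, raws, aggrs).forEach((id, g) ->
                System.out.println(id + " -> " + g.first + " " + g.second + " " + g.third));
    }
}
```

Unlike an inner join, this keeps customers that appear in only some of the datasets, which is probably what a per-customer report wants. Whether the Spark equivalent is efficient enough for your data is something I can't say without trying it.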