Hi,

I have not tried it myself, but just yesterday I came across the
SparkContext.union method, which might help in your case.

def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
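
A minimal sketch of how SparkContext.union might be used here. Note that
the signature requires every RDD to share the same element type T, so the
sketch assumes the three datasets are first mapped to a common type. The
Record trait, its case classes, the sample values and the local master
setting are hypothetical and not from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical common supertype so that all three RDDs share one element type.
sealed trait Record
case class Info(value: String) extends Record
case class Raw(value: String) extends Record
case class Aggr(total: Double) extends Record

object UnionSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local run, only for illustration.
    val conf = new SparkConf().setAppName("union-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Each RDD is keyed by customer id and typed as (Long, Record).
      val infos: RDD[(Long, Record)] =
        sc.parallelize(Seq[(Long, Record)]((1L, Info("a")), (2L, Info("b"))))
      val raws: RDD[(Long, Record)] =
        sc.parallelize(Seq[(Long, Record)]((1L, Raw("x")), (2L, Raw("y"))))
      val aggrs: RDD[(Long, Record)] =
        sc.parallelize(Seq[(Long, Record)]((1L, Aggr(10.0)), (2L, Aggr(20.0))))

      // union concatenates the RDDs; it does not group or join by key.
      val merged: RDD[(Long, Record)] = sc.union(Seq(infos, raws, aggrs))

      // Grouping afterwards collects all records of a customer under one key.
      val byCustomer: RDD[(Long, Iterable[Record])] = merged.groupByKey()

      byCustomer.collect().foreach { case (id, records) =>
        println(s"$id -> ${records.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Since union only concatenates, the groupByKey step is what brings each
customer's records together; whether this suits the report better than
chained joins depends on the data.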

Regards,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski


On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque <sham...@cashcare.in> wrote:
> Hi All,
>
> I have built 3 RDDs from 3 different datasets. All of them are keyed by
> CustomerId (of type Long); two have values of Iterable type and one has a
> single bean. The types of the 3 RDDs are:
> JavaPairRDD<Long, Iterable<TransactionInfo>>
> JavaPairRDD<Long, Iterable<TransactionRaw>>
> JavaPairRDD<Long, TransactionAggr>
>
> Now I have to merge these 3 RDDs into a single one so that I can generate
> an Excel report per customer using the data from all 3 RDDs.
> I tried the join transformation, but it needs RDDs of the same type and
> it works for only two RDDs.
> So my questions are:
> 1. Is there any way to do this merge efficiently, so that I can get
> all 3 datasets by CustomerId?
> 2. If I merge the first two using the join transformation, do I need to run
> groupByKey() before the join so that all data related to a single customer
> is on one node?
>
>
> Thanks
> Shams
