Re: how to get RDD from two different RDDs with cross column

2015-09-21 Thread Zhiliang Zhu
Hi Romi, Yes, you understand it correctly. The keys of rdd1 overlap with the keys of rdd2: many keys appear in both, some keys are in rdd1 but not in rdd2, and some keys are in rdd2 but not in rdd1. The keys of rdd3 would then be the same as the keys of rdd1, and rdd3 will

how to get RDD from two different RDDs with cross column

2015-09-21 Thread Zhiliang Zhu
Dear Romi, Priya, Sujt, Shivaram and all, I have spent many days thinking about this issue without finding a good enough solution... I would appreciate any help. There is an RDD rdd1 and another RDD rdd2 (rdd2 can be a PairRDD, or a DataFrame with two columns

Re: how to get RDD from two different RDDs with cross column

2015-09-21 Thread Romi Kuntsman
Hi, If I understand correctly: rdd1 contains keys (of type StringDate), rdd2 contains keys and values, and rdd3 contains all the keys and the values from rdd2? I think you should make both rdd1 and rdd2 PairRDDs and then use an outer join. Does that make sense? On Mon, Sep 21, 2015 at 8:37 PM Zhiliang
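
The outer-join suggestion above can be illustrated with a small sketch. This is plain Python rather than Spark, so that it runs without a cluster; it only emulates the semantics of a left outer join from rdd1's keys onto rdd2's key/value pairs. The sample dates and values, the helper name `left_outer_join_keys`, the assumption that each key occurs at most once in rdd2, and the use of `None` for keys missing from rdd2 are all made up for illustration; the thread does not say what value rdd3 should hold in that case.

```python
from typing import Hashable, Iterable, List, Optional, Tuple, TypeVar

V = TypeVar("V")


def left_outer_join_keys(
    keys: Iterable[Hashable],
    pairs: Iterable[Tuple[Hashable, V]],
) -> List[Tuple[Hashable, Optional[V]]]:
    """Emulate a left outer join: keep every key from `keys`, and attach
    the matching value from `pairs`, or None when there is no match.

    Assumption: each key appears at most once in `pairs`.
    """
    lookup = dict(pairs)  # key -> value, standing in for rdd2
    return [(k, lookup.get(k)) for k in keys]


# Hypothetical sample data: rdd1 holds only keys, rdd2 holds (key, value) pairs.
rdd1_keys = ["2015-09-01", "2015-09-02", "2015-09-03"]
rdd2_pairs = [("2015-09-02", 1.5), ("2015-09-03", 2.5), ("2015-09-04", 4.0)]

# rdd3 has exactly the keys of rdd1; keys found only in rdd2 are dropped.
rdd3 = left_outer_join_keys(rdd1_keys, rdd2_pairs)
print(rdd3)
```

On real PairRDDs the corresponding operation would presumably be `leftOuterJoin`, after first mapping rdd1's keys to (key, placeholder) pairs so that rdd1 is itself a PairRDD. Keys that exist only in rdd2 are discarded, which matches the statement that rdd3 has the same keys as rdd1.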