I have multiple RDD[(String, String)] that store (docId, docText) pairs, e.g.
rdd1: ("id1", "Long text 1"), ("id2", "Long text 2"), ("id3", "Long text 3")
rdd2: ("id1", "Long text 1 A"), ("id2", "Long text 2 A")
rdd3: ("id1", "Long text 1 B")
Then I want to merge all of the RDDs. If there are duplicate docIds, a later RDD
should overwrite an earlier one. In the above case, rdd2 will overwrite rdd1 for
"id1" and "id2", then rdd3 will overwrite rdd2 for "id1". The final merged RDD
should be
rddFinal: ("id1", "Long text 1 B"), ("id2", "Long text 2 A"), ("id3", "Long text 3")
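To make the intended semantics concrete, here is a rough sketch of one way I could imagine doing it: tag every record with the position of its source RDD, union everything, and keep the record with the highest position for each docId. The helper name mergeLatestWins is made up for illustration, and it assumes the RDDs are passed in a non-empty Seq ordered from oldest to newest. I have not measured it, so I do not know whether it is efficient enough.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
// On older Spark releases the pair-RDD implicits may also need:
// import org.apache.spark.SparkContext._

// Hypothetical helper: rdds must be non-empty and ordered oldest to newest.
def mergeLatestWins(sc: SparkContext,
                    rdds: Seq[RDD[(String, String)]]): RDD[(String, String)] = {
  // Tag each record with the index of the RDD it came from.
  val tagged = rdds.zipWithIndex.map { case (rdd, idx) =>
    rdd.map { case (docId, text) => (docId, (idx, text)) }
  }
  sc.union(tagged)
    // For each docId keep the record from the RDD with the highest index.
    .reduceByKey((a, b) => if (a._1 >= b._1) a else b)
    // Drop the index tag again.
    .mapValues(_._2)
}
```

With the example above, mergeLatestWins(sc, Seq(rdd1, rdd2, rdd3)) should produce the rddFinal contents shown, although the order of elements within the RDD is not guaranteed.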
Note that I have many such RDDs and each RDD has lots of elements. How can I
do it efficiently?
Ningjun