sortByKey() is probably the easiest way:
import org.apache.spark.SparkContext._

joinedRdd
  .map { case (word, (file1Counts, file2Counts)) =>
    (file1Counts, (word, file1Counts, file2Counts))
  }
  .sortByKey()
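A fuller sketch of the same idea, assuming a SparkContext named sc (e.g. in spark-shell) and using the sample counts from your message; the names file1, file2, and sorted are hypothetical:

// Build the two word-count RDDs from the example data in the thread.
val file1 = sc.parallelize(Seq(("aaa", 4), ("bbb", 6), ("ddd", 3)))
val file2 = sc.parallelize(Seq(("ddd", 2), ("bbb", 6), ("ttt", 5)))

// join keeps only words present in both files: RDD[(String, (Int, Int))]
val joinedRdd = file1.join(file2)

// Re-key by the file1 count, sort (descending here; drop the flag for
// ascending), then strip the temporary key with .values.
val sorted = joinedRdd
  .map { case (word, (c1, c2)) => (c1, (word, c1, c2)) }
  .sortByKey(ascending = false)
  .values

sorted.collect().foreach(println)

With the sample data this should print (bbb,6,6) before (ddd,3,2), since bbb occurs 6 times in file1 and ddd only 3.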
On Mon, Feb 23, 2015 at 10:41 AM, Anupama Joshi wrote:
Hi ,
To simplify my problem -
I have 2 files from which I am reading words.
The output is like:
file 1
aaa 4
bbb 6
ddd 3
file 2
ddd 2
bbb 6
ttt 5
If I do file1.join(file2)
I get
(ddd, (3, 2))
(bbb, (6, 6))
If I want to sort the output by the number of occurrences of the word in file1, how do I achieve this?