Spark joins using row id

Rohit Verma Sat, 12 Nov 2016 03:12:08 -0800

For datasets structured as 

ds1
rowN col1
1       A
2       B
3       C
4       C
…


and

ds2
rowN col2
1       X
2       Y
3       Z
…

I want to do a left join:

Dataset<Row> joined = ds1.join(ds2, "rowN", "left_outer");

I read somewhere, on SO or on this mailing list, that if Spark is aware that the
datasets are sorted, it will use some optimizations for joins.
Is it possible to make this join more efficient/faster?
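
To illustrate why sorted inputs could help, here is a minimal plain-Java sketch of a left outer join done as a single merge pass over two inputs already sorted by rowN. It is only an illustration of the general idea, not Spark's implementation. The class name SortedLeftJoin and the method leftJoin are hypothetical, and the sketch assumes rowN is unique in ds2 and uses only the sample rows shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Spark.
class SortedLeftJoin {

    // Left outer join of two inputs that are both sorted ascending by rowN.
    // Each input row is {rowN, value}; each output row is {rowN, col1, col2},
    // with col2 == null when the right input has no matching rowN.
    // Assumption: rowN is unique within the right input.
    static List<String[]> leftJoin(List<String[]> left, List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        int j = 0; // cursor into the right input; it only ever moves forward
        for (String[] l : left) {
            int key = Integer.parseInt(l[0]);
            // Skip right rows whose rowN is smaller than the current left rowN.
            while (j < right.size() && Integer.parseInt(right.get(j)[0]) < key) {
                j++;
            }
            boolean match = j < right.size()
                    && Integer.parseInt(right.get(j)[0]) == key;
            out.add(new String[] {l[0], l[1], match ? right.get(j)[1] : null});
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> ds1 = Arrays.asList(
                new String[] {"1", "A"}, new String[] {"2", "B"},
                new String[] {"3", "C"}, new String[] {"4", "C"});
        List<String[]> ds2 = Arrays.asList(
                new String[] {"1", "X"}, new String[] {"2", "Y"},
                new String[] {"3", "Z"});
        for (String[] row : leftJoin(ds1, ds2)) {
            System.out.println(
                    String.join(" ", row[0], row[1], String.valueOf(row[2])));
        }
    }
}
```

Because both cursors only move forward, each input is read once and no hashing or re-sorting is needed. That is the kind of saving I am hoping Spark can exploit if it knows the data is sorted on rowN.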

Rohit
