For datasets structured as

ds1
rowN  col1
1     A
2     B
3     C
4     C
…

and

ds2
rowN  col2
1     X
2     Y
3     Z
…

I want to do a left join:

    Dataset<Row> joined = ds1.join(ds2, "rowN", "left_outer");

I read somewhere, on SO or on this mailing list, that if Spark is aware that the datasets are sorted, it will use some optimizations for joins. Is it possible to make this join more efficient/faster?

Rohit