I am loading data from two different databases and joining it in Spark. The data is indexed in each database, so it is efficient to retrieve it ordered by a key. Can I tell Spark that my data arrives already ordered on that key, so that when I join the two datasets they are combined with minimal shuffling, i.e. via a sort-merge join?
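For concreteness, the kind of join I'm hoping Spark can do on pre-sorted inputs is the classic merge join. Here's a plain-Python sketch (not Spark code, just to illustrate the single-pass mechanics I'd like to avoid a shuffle for):

```python
def merge_join(left, right):
    """Inner-join two (key, value) lists that are already sorted by key.

    Because both inputs arrive key-ordered, one forward pass suffices:
    no hashing, no re-sorting, no shuffle.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk = right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit all right-side matches for this key, then advance left
            # (handles duplicate keys on the right without losing rows).
            k = j
            while k < len(right) and right[k][0] == lk:
                out.append((lk, lv, right[k][1]))
                k += 1
            i += 1
    return out

a = [(1, "a"), (2, "b"), (4, "d")]
b = [(2, "x"), (3, "y"), (4, "z")]
print(merge_join(a, b))  # [(2, 'b', 'x'), (4, 'd', 'z')]
```

My question is whether Spark can be told the JDBC sources already satisfy this ordering, rather than sorting (and shuffling) them itself before the merge.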
I know that Flink supports this, but its JDBC support is pretty lacking in general. Thanks, Ken
