I am loading data from two different databases and joining it in Spark. The
data is indexed in the databases, so it is efficient to retrieve it ordered
by a key. Can I tell Spark that my data is arriving already ordered on that
key, so that the join can be executed as a merge join with little or no
shuffling?
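In case it helps, here is a minimal sketch of what I'm doing (the connection URLs, table names, key column, and bounds below are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-merge-join").getOrCreate()

// Each JDBC read is split into range partitions on the indexed key,
// so the database can serve every partition via the index, in key order.
def readTable(url: String, table: String) =
  spark.read.format("jdbc")
    .option("url", url)
    .option("dbtable", table)
    .option("partitionColumn", "id")
    .option("lowerBound", 0L)
    .option("upperBound", 1000000L)
    .option("numPartitions", 16)
    .load()

val left  = readTable("jdbc:postgresql://host1/db1", "orders")
val right = readTable("jdbc:mysql://host2/db2", "shipments")

// Spark plans a plain equi-join like this as a sort-merge join, but as far
// as I can tell it does not know the inputs are already sorted on "id", so
// it shuffles and re-sorts both sides anyway.
val joined = left.join(right, "id")
```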

I know that Flink supports this, but its JDBC support is pretty lacking in
general.


Thanks,

Ken
