I am loading data from two different databases and joining it in Spark. The data is indexed in each database, so it is efficient to retrieve it ordered by a key. Can I tell Spark that my data arrives already ordered on that key, so that when I join the two datasets they are combined with minimal shuffling, i.e. via a sort-merge join?
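For concreteness, the kind of join I'm hoping Spark can do on pre-sorted inputs is the classic merge join. Here's a plain-Python sketch (not Spark code, just to illustrate the single-pass mechanics I'd like to avoid a shuffle for):

```python
def merge_join(left, right):
    """Inner-join two (key, value) lists that are already sorted by key.

    Because both inputs arrive key-ordered, one forward pass suffices:
    no hashing, no re-sorting, no shuffle.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk = right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit all right-side matches for this key, then advance left
            # (handles duplicate keys on the right without losing rows).
            k = j
            while k < len(right) and right[k][0] == lk:
                out.append((lk, lv, right[k][1]))
                k += 1
            i += 1
    return out

a = [(1, "a"), (2, "b"), (4, "d")]
b = [(2, "x"), (3, "y"), (4, "z")]
print(merge_join(a, b))  # [(2, 'b', 'x'), (4, 'd', 'z')]
```

My question is whether Spark can be told the JDBC sources already satisfy this ordering, rather than sorting (and shuffling) them itself before the merge.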
I know that Flink supports this, but its JDBC support is pretty lacking in general. Thanks, Ken
