This all depends on whether you give the JDBC driver information about the underlying RDBMS table. Assuming there is a numeric unique ID column on the underlying table, you can use it to partition the load.
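As a minimal sketch (the connection URL, table name, credentials, and bounds below are illustrative, not from this thread): with the `DataFrameReader.jdbc` overload that takes a partition column, Spark splits the range `[lowerBound, upperBound]` into `numPartitions` WHERE-clause ranges, one per task, so `numPartitions = 10` means up to 10 concurrent JDBC connections reading the table in parallel.

```scala
import java.util.Properties

// Assumes an existing sqlContext (Spark 1.4+) and the MySQL JDBC
// driver on the classpath; names below are placeholders.
val props = new Properties()
props.setProperty("user", "myuser")
props.setProperty("password", "mypassword")

val df = sqlContext.read.jdbc(
  url = "jdbc:mysql://dbhost:3306/mydb",
  table = "mytable",
  columnName = "id",     // partition column: numeric, ideally unique and indexed
  lowerBound = 1L,       // min value of the partition column
  upperBound = 1000000L, // max value of the partition column
  numPartitions = 10,    // up to 10 parallel queries / JDBC connections
  connectionProperties = props)
```

The bounds only control how the ranges are split; rows outside them are still read, just all by the first or last partition, so skewed bounds can defeat the parallelism.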
Have a look at this: http://metricbrew.com/get-data-from-databases-with-apache-spark-jdbc/

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 4 April 2016 at 09:09, Zhang, Jingyu <jingyu.zh...@news.com.au> wrote:

> Hi All,
>
> I want to read MySQL from Spark. Please let me know how many threads will
> be used to read the RDBMS after setting numPartitions = 10 in Spark JDBC.
> What is the best practice for reading a large dataset from an RDBMS into
> Spark?
>
> Thanks,
>
> Jingyu