Hi Nicolas,

Without the Hive Thrift server, if you try to run a select * on a table that has around 10,000 partitions, Spark will give you some surprises. Presto works fine in these scenarios, and I am sure the Spark community will soon learn from its algorithms.

Regards,
Gourav

On Sun, Oct 15, 2017 at 3:43 PM, Nicolas Paris <nipari...@gmail.com> wrote:

> > I do not think that SPARK will automatically determine the partitions.
> > Actually it does not automatically determine the partitions. In case a
> > table has a few million records, it all goes through the driver.
>
> Hi Gourav
>
> Actually, the Spark JDBC data source is able to deal directly with partitions.
> Spark creates a JDBC connection for each partition.
>
> All the details are explained in this post:
> http://www.gatorsmile.io/numpartitionsinjdbc/
>
> Also, an example with the Greenplum database:
> http://engineering.pivotal.io/post/getting-started-with-greenplum-spark/
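
To make the "one JDBC connection per partition" point concrete, here is a rough sketch in plain Python of how a partitioned JDBC read can be split into per-partition WHERE clauses. It is a simplified illustration, not Spark's actual code: the function name `jdbc_partition_predicates` and the column `id` are made up for the example, and the stride arithmetic is an approximation. The four inputs mirror the Spark JDBC options `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions`.

```python
from typing import List


def jdbc_partition_predicates(column: str, lower: int, upper: int,
                              num_partitions: int) -> List[str]:
    """Build one WHERE clause per partition (hypothetical helper, simplified)."""
    if num_partitions < 2:
        raise ValueError("need at least two partitions to split the range")
    if lower >= upper:
        raise ValueError("lower bound must be smaller than upper bound")

    # Width of each range; an approximation of how the bounds are divided.
    stride = (upper - lower) // num_partitions
    if stride == 0:
        raise ValueError("more partitions requested than values in the range")

    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # First partition is open below and also collects NULL keys.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open above, so rows beyond `upper` are kept.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


if __name__ == "__main__":
    for clause in jdbc_partition_predicates("id", 0, 1000, 4):
        print(clause)
```

Each clause would be appended to the query sent over its own connection, so the rows are fetched by the executors in parallel rather than through a single connection. Note that the first and last ranges are open-ended: the bounds only decide how the range is cut up, they do not filter rows out. If the partitioning options are left out, the read runs as a single partition.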