Re: Hive From Spark: Jdbc VS sparkContext

Gourav Sengupta Sun, 15 Oct 2017 10:24:53 -0700

Hi Nicolas,

Without the Hive Thrift server, if you try to run a SELECT * on a table
which has around 10,000 partitions, Spark will give you some surprises.
Presto works fine in these scenarios, and I am sure the Spark community will
soon learn from Presto's algorithms.



Regards,
Gourav

On Sun, Oct 15, 2017 at 3:43 PM, Nicolas Paris <nipari...@gmail.com> wrote:

> > I do not think that SPARK will automatically determine the partitions.
> > Actually it does not automatically determine the partitions. In case a
> > table has a few million records, it all goes through the driver.
>
> Hi Gourav
>
> Actually, Spark's JDBC data source is able to deal directly with partitions:
> Spark creates a JDBC connection for each partition.
>
> All the details are explained in this post:
> http://www.gatorsmile.io/numpartitionsinjdbc/
>
> There is also an example with the Greenplum database:
> http://engineering.pivotal.io/post/getting-started-with-greenplum-spark/
>

