That is not correct, IMHO. If I am not wrong, Spark will still load the data
in the executors; it runs some stats on the data itself to identify
partitions....
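To make the partitioning point concrete: when Spark reads over JDBC with the partitioning options (partitionColumn, lowerBound, upperBound, numPartitions), each executor issues its own range query, so the data never has to funnel through a single connection. Below is a small sketch, in plain Python, of how such range predicates can be derived from those bounds. This is an illustration of the idea, not Spark's exact implementation; the function name `partition_predicates` is made up for this example.

```python
# Sketch of how a JDBC source can split one table read into parallel
# partition queries from a numeric column, given lowerBound, upperBound
# and numPartitions (the parameters spark.read.jdbc() accepts).
# Not Spark's actual code -- a simplified model of the technique.

def partition_predicates(column, lower, upper, num_partitions):
    """Return one WHERE-clause predicate per partition."""
    stride = (upper - lower) // num_partitions
    preds = []
    bound = lower
    for i in range(num_partitions):
        # First partition has no lower bound, last has no upper bound,
        # so rows outside [lower, upper) are still covered.
        lo = f"{bound} <= {column}" if i > 0 else None
        bound += stride
        hi = f"{column} < {bound}" if i < num_partitions - 1 else None
        if lo and hi:
            preds.append(f"{lo} AND {hi}")
        else:
            preds.append(lo or hi)
    return preds

print(partition_predicates("id", 0, 100, 4))
# → ['id < 25', '25 <= id AND id < 50', '50 <= id AND id < 75', '75 <= id']
```

Each predicate becomes the WHERE clause of one partition's query, so the four reads can run concurrently on different executors.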

On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 <guopengfei19...@126.com> wrote:

>
> > On 4 Oct 2017, at 02:08, Nicolas Paris <nipari...@gmail.com> wrote:
> >
> > Hi
> >
> > I am wondering about the differences between accessing Hive tables in two
> > different ways:
> > - with JDBC access
> > - with sparkContext
> >
> > I would say that JDBC is better, since it goes through Hive, which is based
> > on MapReduce / Tez and therefore works on disk.
> > Using Spark RDDs can lead to memory errors on very large datasets.
> >
> >
> > Does anybody know, or can anyone point me to relevant documentation?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
> JDBC will load the data into the driver node, which may slow things down
> and may cause an OOM.
>
>


-- 
Best Regards,
Ayan Guha
