That is not correct, IMHO. If I am not wrong, Spark will still load the data in the executors, running some stats on the data itself to identify partitions....
On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 <guopengfei19...@126.com> wrote:

> > On Oct 4, 2017, at 2:08 AM, Nicolas Paris <nipari...@gmail.com> wrote:
> >
> > Hi
> >
> > I wonder about the differences between accessing Hive tables in two
> > different ways:
> > - with jdbc access
> > - with sparkContext
> >
> > I would say that jdbc is better, since it uses Hive, which is based on
> > map-reduce / Tez and therefore works on disk.
> > Using Spark RDDs can lead to memory errors on very huge datasets.
> >
> > Does anybody know, or can anyone point me to relevant documentation?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
> The jdbc will load data into the driver node; this may slow down the
> speed, and may OOM.

--
Best Regards,
Ayan Guha
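For anyone comparing the two access paths being discussed, here is a minimal sketch in Scala. The table name, host, and port are hypothetical placeholders, not anything from the thread; this is an illustration under those assumptions, not a definitive setup:

```scala
import org.apache.spark.sql.SparkSession

// enableHiveSupport() gives Spark direct access to the Hive metastore,
// so table reads are planned as parallel scans of the underlying files.
val spark = SparkSession.builder()
  .appName("hive-access-comparison")
  .enableHiveSupport()
  .getOrCreate()

// (1) Via the metastore: each executor reads its own split of the table's
//     files in parallel; nothing funnels through a single JVM.
val viaMetastore = spark.table("default.my_table") // hypothetical table name

// (2) Via HiveServer2's JDBC endpoint: Hive (MR/Tez) executes the query and
//     the whole result set is pulled back over one JDBC connection into the
//     driver -- the slowdown / driver-OOM risk mentioned above.
val viaJdbc = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://hiveserver-host:10000/default") // assumed host/port
  .option("dbtable", "my_table")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()
```

Note that (2) requires the Hive JDBC driver on the classpath, and that Spark's generic JDBC source was designed for RDBMS endpoints, so behavior against HiveServer2 can vary by version; for large tables, path (1) is the usual choice for exactly the parallelism reason above.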