On 3 Oct 2017 at 20:08, Nicolas Paris wrote:
> I wonder what the differences are between accessing Hive tables in two different ways:
> - with jdbc access
> - with sparkContext
Well, there is also a third way to access Hive data from Spark:
- with direct file access (ORC format here)
For example:
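// spark-shell session: `sc` is the SparkContext the shell provides
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// let the ORC reader skip data that cannot match the WHERE clause
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
// read the ORC files straight from HDFS
val people = sqlContext.read.format("orc").load("hdfs://cluster//orc_people")
// expose the DataFrame to SQL under the name "people"
people.createOrReplaceTempView("people")
sqlContext.sql("SELECT count(1) FROM people WHERE ...").show()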
This method looks much faster than both of the others:
- with jdbc access
- with sparkContext
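
A rough way to check that impression is to time the same count over all three access paths. The following is only a sketch under assumptions: the table name `default.people`, the HiveServer2 URL `jdbc:hive2://hiveserver:10000/default`, and the helper names `timed` and `compareAccessPaths` are hypothetical, and it presumes a Spark 2.x session built with Hive support and the Hive JDBC driver on the classpath. Reading Hive through Spark's generic JDBC source is not guaranteed to work cleanly on every version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Runs `body` once and returns (result, elapsed milliseconds).
def timed[A](body: => A): (A, Long) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1000000L)
}

def compareAccessPaths(spark: SparkSession): Unit = {
  // 1. Through the Hive metastore (session needs enableHiveSupport()).
  //    "default.people" is a hypothetical table name.
  val viaMetastore: DataFrame = spark.table("default.people")

  // 2. Through HiveServer2 over JDBC. URL and table are hypothetical.
  val viaJdbc: DataFrame = spark.read.format("jdbc")
    .option("url", "jdbc:hive2://hiveserver:10000/default")
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .option("dbtable", "people")
    .load()

  // 3. Direct read of the ORC files, same path as in the snippet above.
  val viaFiles: DataFrame =
    spark.read.format("orc").load("hdfs://cluster//orc_people")

  // Force a full scan of each source and report the wall-clock time.
  Seq("metastore" -> viaMetastore, "jdbc" -> viaJdbc, "orc files" -> viaFiles)
    .foreach { case (label, df) =>
      val (rows, ms) = timed(df.count())
      println(s"$label: $rows rows in $ms ms")
    }
}
```

In spark-shell this would be called as `compareAccessPaths(spark)`. A single run mostly measures warm-up (JVM, file listing, connection setup), so each path should be run several times before comparing numbers.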
Does anyone have experience with this?