On 6/6/15 9:06 AM, James Pirz wrote:
I am pretty new to Spark, and I am using Spark 1.3.1. I am trying to use Spark SQL to run some SQL scripts on the cluster. I realized that for better performance it is a good idea to use Parquet files. I have two questions regarding that:

1) If I want to use Spark SQL against *partitioned & bucketed* tables with Parquet format in Hive, does the provided Spark binary on the Apache website support that, or do I need to build a new Spark binary with some additional flags? (I found a note <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables> in the documentation about enabling Hive support, but I could not fully work out what the correct way to build is, if I do need to build.)
Yes, Hive support is now enabled by default in the binaries on the website. However, Spark SQL doesn't support bucketed tables yet.
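In case it helps, here is a minimal sketch of querying a partitioned Parquet Hive table with the stock 1.3.1 binary; the table name "web_logs" and its partition column "dt" are just placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveParquetQuery"))
val hiveContext = new HiveContext(sc)

// "web_logs" and its partition column "dt" are placeholder names;
// the table is assumed to be defined in the Hive metastore.
val df = hiveContext.sql(
  "SELECT status, COUNT(*) AS cnt FROM web_logs WHERE dt = '2015-06-01' GROUP BY status")
df.show()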

2) Does running Spark SQL against tables in Hive degrade performance? Is it better to load the Parquet files directly into HDFS, or is having Hive in the picture harmless?
If you're using Parquet, then it should be fine since by default Spark SQL uses its own native Parquet support to read Parquet Hive tables.
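(As far as I know, that native path for Hive Parquet tables is controlled by spark.sql.hive.convertMetastoreParquet, which is true by default.) You can also skip the metastore entirely and point Spark SQL at the Parquet files in HDFS. A minimal sketch, with a made-up path:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Load Parquet files straight from HDFS; the path below is a placeholder.
val logs = sqlContext.parquetFile("hdfs:///data/web_logs/")
logs.registerTempTable("web_logs")

sqlContext.sql("SELECT status, COUNT(*) AS cnt FROM web_logs GROUP BY status").show()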

Thnx

