I should add that Spark 1.5.0+ bundles Hive 1.2.1 by default when you build with -Phive; a rough sketch of the two build variants is below.
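For illustration, a minimal sketch of both builds, assuming a Spark 1.5.x source tree (the Hadoop profile and --name values here are illustrative, not prescriptive):

    # Spark 1.5.0+: enabling -Phive bundles Hive 1.2.1 into the assembly
    ./make-distribution.sh --name hadoop2-with-hive --tgz \
        -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver

    # Hive-on-Spark instead wants a Spark build *without* the Hive jars,
    # so leave -Phive (and -Phive-thriftserver) out
    ./make-distribution.sh --name hadoop2-without-hive --tgz \
        -Pyarn -Phadoop-2.6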
So this page <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started> should read something like: "Note that you must have a version of Spark which does *not* include the Hive jars if you use Spark 1.4.1 or earlier. Alternatively, you can use Spark 1.5.0+, whose -Phive build includes the Hive jars."

2015-11-19 5:12 GMT+08:00 Gopal Vijayaraghavan <gop...@apache.org>:

> > I wanted to know why is it necessary to remove the Hive jars from the
> > Spark build as mentioned on this
>
> Because SparkSQL was originally based on Hive and still uses the Hive AST
> to parse SQL.
>
> The org.apache.spark.sql.hive package contains the parser, which has hard
> references to Hive's internal AST, which is unfortunately auto-generated
> code (HiveParser.TOK_TABNAME etc.).
>
> Every time Hive makes a release, those constants change in value, and they
> are private API because of the lack of backwards compat, which SparkSQL
> violates.
>
> So Hive-on-Spark forces mismatched versions of Hive classes, because it's
> a circular dependency of Hive(v1) -> Spark -> Hive(v2) due to the basic
> laws of causality:
>
> Spark cannot depend on a version of Hive that is unreleased, and a
> Hive-on-Spark release cannot depend on a version of Spark that is
> unreleased.
>
> Cheers,
> Gopal
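As a footnote to Gopal's point about auto-generated code: you can dump those token constants straight out of a Hive release's jar and compare across versions (a minimal sketch; the hive-exec jar path and version are illustrative):

    # Print the ANTLR-generated parser constants; running this against two
    # different hive-exec releases shows values like TOK_TABNAME shifting
    javap -classpath hive-exec-1.2.1.jar -constants \
        org.apache.hadoop.hive.ql.parse.HiveParser | grep TOK_TABNAME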