I should add that Spark 1.5.0+ bundles Hive 1.2.1 by default when you build with -Phive; a rough sketch of the two build variants is below.
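For illustration, a minimal sketch of both builds, assuming a Spark 1.5.x source tree (the Hadoop profile and --name values here are illustrative, not prescriptive):

    # Spark 1.5.0+: enabling -Phive bundles Hive 1.2.1 into the assembly
    ./make-distribution.sh --name hadoop2-with-hive --tgz \
        -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver

    # Hive-on-Spark instead wants a Spark build *without* the Hive jars,
    # so leave -Phive (and -Phive-thriftserver) out
    ./make-distribution.sh --name hadoop2-without-hive --tgz \
        -Pyarn -Phadoop-2.6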
So this page <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started> should read something like: "Note that you must have a version of Spark which does *not* include the Hive jars if you use Spark 1.4.1 or earlier. Alternatively, you can use Spark 1.5.0+, whose -Phive build includes the Hive jars."

2015-11-19 5:12 GMT+08:00 Gopal Vijayaraghavan <gop...@apache.org>:

> > I wanted to know why is it necessary to remove the Hive jars from the
> > Spark build as mentioned on this
>
> Because SparkSQL was originally based on Hive and still uses the Hive AST
> to parse SQL.
>
> The org.apache.spark.sql.hive package contains the parser, which has hard
> references to Hive's internal AST, which is unfortunately auto-generated
> code (HiveParser.TOK_TABNAME etc.).
>
> Every time Hive makes a release, those constants change in value, and they
> are private API because of the lack of backwards compat, which SparkSQL
> violates.
>
> So Hive-on-Spark forces mismatched versions of Hive classes, because it's
> a circular dependency of Hive(v1) -> Spark -> Hive(v2) due to the basic
> laws of causality:
>
> Spark cannot depend on a version of Hive that is unreleased, and a
> Hive-on-Spark release cannot depend on a version of Spark that is
> unreleased.
>
> Cheers,
> Gopal
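As a footnote to Gopal's point about auto-generated code: you can dump those token constants straight out of a Hive release's jar and compare across versions (a minimal sketch; the hive-exec jar path and version are illustrative):

    # Print the ANTLR-generated parser constants; running this against two
    # different hive-exec releases shows values like TOK_TABNAME shifting
    javap -classpath hive-exec-1.2.1.jar -constants \
        org.apache.hadoop.hive.ql.parse.HiveParser | grep TOK_TABNAME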