Thanks Xuefu! -- Lefty
On Mon, Nov 23, 2015 at 1:09 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:

> Hive on Spark is supposed to work with any version of Hive (1.1+) and a version of Spark built without Hive. Thus, to make HoS work reliably and also simplify matters, I think it still makes sense to require that the spark-assembly jar not contain Hive jars. Otherwise, you have to make sure that your Hive version matches the "other" Hive version that's included in Spark.
>
> In CDH 5.x, the Spark version is 1.5, and we still build the Spark jar without Hive.
>
> Therefore, I don't see a need to update the doc.
>
> --Xuefu
>
> On Sun, Nov 22, 2015 at 9:23 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:
>
>> Gopal, can you confirm the doc change that Jone Zhang suggests? The second sentence confuses me: "You can choose Spark1.5.0+ which build include the Hive jars."
>>
>> Thanks.
>>
>> -- Lefty
>>
>> On Thu, Nov 19, 2015 at 8:33 PM, Jone Zhang <joyoungzh...@gmail.com> wrote:
>>
>>> I should add that Spark 1.5.0+ uses Hive 1.2.1 by default when you build with -Phive.
>>>
>>> So this page <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started> should read like below:
>>> "Note that you must have a version of Spark which does *not* include the Hive jars if you use Spark 1.4.1 or earlier. You can choose Spark1.5.0+ which build include the Hive jars."
>>>
>>> 2015-11-19 5:12 GMT+08:00 Gopal Vijayaraghavan <gop...@apache.org>:
>>>
>>>> > I wanted to know why is it necessary to remove the Hive jars from the
>>>> > Spark build as mentioned on this
>>>>
>>>> Because SparkSQL was originally based on Hive and still uses Hive's AST to parse SQL.
>>>>
>>>> The org.apache.spark.sql.hive package contains the parser, which has hard references to Hive's internal AST, which is unfortunately auto-generated code (HiveParser.TOK_TABNAME etc.).
>>>>
>>>> Every time Hive makes a release, those constants change in value, and they are private API because of the lack of backward compatibility, which SparkSQL violates.
>>>>
>>>> So Hive-on-Spark ends up with mismatched versions of Hive classes, because it's a circular dependency of Hive(v1) -> Spark -> Hive(v2), due to the basic laws of causality.
>>>>
>>>> Spark cannot depend on a version of Hive that is unreleased, and a Hive-on-Spark release cannot depend on a version of Spark that is unreleased.
>>>>
>>>> Cheers,
>>>> Gopal
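For reference, a minimal sketch of the kind of hard reference Gopal describes. This is not actual Spark source; the class and method names below are made up for illustration, while HiveParser and ASTNode are real Hive classes from the hive-exec module.

    // Illustration only -- not Spark code. Shows why hard references to Hive's
    // auto-generated parser constants couple the caller to one Hive release.
    import org.apache.hadoop.hive.ql.parse.ASTNode;
    import org.apache.hadoop.hive.ql.parse.HiveParser;

    public class TokenCouplingSketch {

        // HiveParser.TOK_TABNAME is a static final int generated by ANTLR, so its
        // value is inlined into this class at compile time. If the AST was produced
        // by a different Hive release whose generated parser assigns a different
        // value to TOK_TABNAME, this comparison silently stops matching.
        static boolean isTableNameNode(ASTNode node) {
            return node.getType() == HiveParser.TOK_TABNAME;
        }
    }

Because the constant's value is baked in at compile time, swapping in a different Hive version at runtime (for example, the Hive classes bundled into a -Phive Spark assembly) does not resolve the mismatch; the token ids simply disagree.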