Thanks Xuefu! -- Lefty
On Mon, Nov 23, 2015 at 1:09 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:

> Hive on Spark is supposed to work with any version of Hive (1.1+) and a version of Spark built without Hive. Thus, to make HoS work reliably and also simplify matters, I think it still makes sense to require that the spark-assembly jar not contain Hive jars. Otherwise, you have to make sure that your Hive version matches the "other" Hive version that's included in Spark.
>
> In CDH 5.x, the Spark version is 1.5, and we still build the Spark jar without Hive.
>
> Therefore, I don't see a need to update the doc.
>
> --Xuefu
>
> On Sun, Nov 22, 2015 at 9:23 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:
>
>> Gopal, can you confirm the doc change that Jone Zhang suggests? The second sentence confuses me: "You can choose Spark1.5.0+ which build include the Hive jars."
>>
>> Thanks.
>>
>> -- Lefty
>>
>> On Thu, Nov 19, 2015 at 8:33 PM, Jone Zhang <joyoungzh...@gmail.com> wrote:
>>
>>> I should add that Spark 1.5.0+ uses Hive 1.2.1 by default when you build with -Phive.
>>>
>>> So this page <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started> should read like below:
>>> "Note that you must have a version of Spark which does *not* include the Hive jars if you use Spark 1.4.1 or earlier. You can choose Spark1.5.0+ which build include the Hive jars."
>>>
>>> 2015-11-19 5:12 GMT+08:00 Gopal Vijayaraghavan <gop...@apache.org>:
>>>
>>>> > I wanted to know why is it necessary to remove the Hive jars from the
>>>> > Spark build as mentioned on this
>>>>
>>>> Because SparkSQL was originally based on Hive and still uses Hive's AST to parse SQL.
>>>>
>>>> The org.apache.spark.sql.hive package contains the parser, which has hard references to Hive's internal AST, which is unfortunately auto-generated code (HiveParser.TOK_TABNAME etc.).
>>>>
>>>> Every time Hive makes a release, those constants change in value, and they are private API because of the lack of backward compatibility, which SparkSQL violates.
>>>>
>>>> So Hive-on-Spark ends up with mismatched versions of Hive classes, because it's a circular dependency of Hive(v1) -> Spark -> Hive(v2), due to the basic laws of causality.
>>>>
>>>> Spark cannot depend on a version of Hive that is unreleased, and a Hive-on-Spark release cannot depend on a version of Spark that is unreleased.
>>>>
>>>> Cheers,
>>>> Gopal
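For reference, a minimal sketch of the kind of hard reference Gopal describes. This is not actual Spark source; the class and method names below are made up for illustration, while HiveParser and ASTNode are real Hive classes from the hive-exec module.

    // Illustration only -- not Spark code. Shows why hard references to Hive's
    // auto-generated parser constants couple the caller to one Hive release.
    import org.apache.hadoop.hive.ql.parse.ASTNode;
    import org.apache.hadoop.hive.ql.parse.HiveParser;

    public class TokenCouplingSketch {

        // HiveParser.TOK_TABNAME is a static final int generated by ANTLR, so its
        // value is inlined into this class at compile time. If the AST was produced
        // by a different Hive release whose generated parser assigns a different
        // value to TOK_TABNAME, this comparison silently stops matching.
        static boolean isTableNameNode(ASTNode node) {
            return node.getType() == HiveParser.TOK_TABNAME;
        }
    }

Because the constant's value is baked in at compile time, swapping in a different Hive version at runtime (for example, the Hive classes bundled into a -Phive Spark assembly) does not resolve the mismatch; the token ids simply disagree.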