I thought the fix had been pushed to the apache master; ref. commit "[SPARK-2848] Shade Guava in uber-jars" by Marcelo Vanzin on 8/20. So my previous email was based on my own build of the apache master, which turned out not to be working yet.
Marcelo: Please correct me if I got that commit wrong.

Thanks,
Du

On 8/22/14, 11:41 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

> SPARK-2420 is fixed. I don't think it will be in 1.1, though - it might
> be too risky at this point.
>
> I'm not familiar with spark-sql.
>
> On Fri, Aug 22, 2014 at 11:25 AM, Andrew Lee <alee...@hotmail.com> wrote:
>> Hopefully there could be some progress on SPARK-2420. It looks like
>> shading may be the preferred solution over downgrading.
>>
>> Any idea when this will happen? Could it happen in Spark 1.1.1 or Spark
>> 1.1.2?
>>
>> By the way, regarding bin/spark-sql: is this more of a debugging tool for
>> Spark jobs integrating with Hive? How do people use spark-sql? I'm trying
>> to understand the rationale and motivation behind this script. Any idea?
>>
>>> Date: Thu, 21 Aug 2014 16:31:08 -0700
>>> Subject: Re: Hive From Spark
>>> From: van...@cloudera.com
>>> To: l...@yahoo-inc.com.invalid
>>> CC: user@spark.apache.org; u...@spark.incubator.apache.org; pwend...@gmail.com
>>>
>>> Hi Du,
>>>
>>> I don't believe the Guava change has made it to the 1.1 branch. The
>>> Guava doc says "hashInt" was added in 12.0, so what's probably
>>> happening is that you have an old version of Guava in your classpath
>>> before the Spark jars. (Hadoop ships with Guava 11, so that may be the
>>> source of your problem.)
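A quick way to check Marcelo's diagnosis in a given build is to ask the JVM which jar actually provides the Guava HashFunction class and whether it has the hashInt method added in 12.0. The sketch below is hypothetical (the object name and standalone entry point are not from this thread); it only uses standard Java reflection.

// Minimal diagnostic sketch (hypothetical, not from the thread): report which
// jar provides com.google.common.hash.HashFunction at runtime and whether it
// carries hashInt(int), which only exists in Guava 12.0 and later.
object GuavaClasspathCheck {
  def main(args: Array[String]): Unit = {
    val cls = Class.forName("com.google.common.hash.HashFunction")
    // The code source is the jar that won the classpath ordering.
    println("HashFunction loaded from: " + cls.getProtectionDomain.getCodeSource.getLocation)
    // False here means an older Guava (e.g. Hadoop's 11.x) is shadowing the
    // version Spark was built against.
    println("has hashInt(int): " + cls.getMethods.exists(_.getName == "hashInt"))
  }
}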
>>> On Thu, Aug 21, 2014 at 4:23 PM, Du Li <l...@yahoo-inc.com.invalid> wrote:
>>> > Hi,
>>> >
>>> > This guava dependency conflict problem should have been fixed as of
>>> > yesterday according to https://issues.apache.org/jira/browse/SPARK-2420
>>> >
>>> > However, I just got
>>> > java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>> > from the following code snippet and "mvn3 test" on Mac. I built the latest
>>> > version of Spark (1.1.0-SNAPSHOT) and installed the jar files to the local
>>> > Maven repo. In my pom file I explicitly excluded Guava from almost all
>>> > possible dependencies, such as spark-hive_2.10-1.1.0-SNAPSHOT and
>>> > hadoop-client. This snippet is abstracted from a larger project, so the
>>> > pom.xml includes many dependencies although not all are required by this
>>> > snippet. The pom.xml is attached.
>>> >
>>> > Does anybody know how to fix it?
>>> >
>>> > Thanks,
>>> > Du
>>> > -------
>>> >
>>> > package com.myself.test
>>> >
>>> > import org.scalatest._
>>> > import org.apache.hadoop.io.{NullWritable, BytesWritable}
>>> > import org.apache.spark.{SparkContext, SparkConf}
>>> > import org.apache.spark.SparkContext._
>>> >
>>> > // A record whose payload round-trips through a Hadoop BytesWritable.
>>> > class MyRecord(name: String) extends Serializable {
>>> >   def getWritable(): BytesWritable = {
>>> >     new BytesWritable(Option(name).getOrElse("\\N").toString.getBytes("UTF-8"))
>>> >   }
>>> >
>>> >   final override def equals(that: Any): Boolean = {
>>> >     if (!that.isInstanceOf[MyRecord])
>>> >       false
>>> >     else {
>>> >       val other = that.asInstanceOf[MyRecord]
>>> >       this.getWritable == other.getWritable
>>> >     }
>>> >   }
>>> > }
>>> >
>>> > class MyRecordTestSuite extends FunSuite {
>>> >   // construct a MyRecord by Consumer.schema
>>> >   val rec: MyRecord = new MyRecord("James Bond")
>>> >
>>> >   test("generated SequenceFile should be readable from spark") {
>>> >     val path = "./testdata/"
>>> >
>>> >     val conf = new SparkConf(false).setMaster("local")
>>> >       .setAppName("test data exchange with Hive")
>>> >     conf.set("spark.driver.host", "localhost")
>>> >     val sc = new SparkContext(conf)
>>> >     // Write the record out as a SequenceFile, then read it back.
>>> >     val rdd = sc.makeRDD(Seq(rec))
>>> >     rdd.map((x: MyRecord) => (NullWritable.get(), x.getWritable()))
>>> >       .saveAsSequenceFile(path)
>>> >
>>> >     val bytes = sc.sequenceFile(path, classOf[NullWritable],
>>> >       classOf[BytesWritable]).first._2
>>> >     assert(rec.getWritable() == bytes)
>>> >
>>> >     sc.stop()
>>> >     System.clearProperty("spark.driver.port")
>>> >   }
>>> > }
>>> >
>>> >
>>> > From: Andrew Lee <alee...@hotmail.com>
>>> > Reply-To: "user@spark.apache.org" <user@spark.apache.org>
>>> > Date: Monday, July 21, 2014 at 10:27 AM
>>> > To: "user@spark.apache.org" <user@spark.apache.org>,
>>> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
>>> > Subject: RE: Hive From Spark
>>> >
>>> > Hi All,
>>> >
>>> > Currently, if you are running the Spark HiveContext API with Hive 0.12, it
>>> > won't work, due to the following two libraries which are not consistent with
>>> > Hive 0.12 and Hadoop. (Hive libs align with Hadoop libs and, as a common
>>> > practice, they should be consistent to be interoperable.)
>>> >
>>> > These are under discussion in the two JIRA tickets:
>>> >
>>> > https://issues.apache.org/jira/browse/HIVE-7387
>>> > https://issues.apache.org/jira/browse/SPARK-2420
>>> >
>>> > When I ran the command by tweaking the classpath and build for Spark
>>> > 1.0.1-rc3, I was able to create a table through HiveContext; however, when I
>>> > fetched the data, it broke due to incompatible API calls in Guava. This is
>>> > critical since it needs to map the columns to the RDD schema.
>>> >
>>> > Hive and Hadoop are using an older version of the Guava libraries (11.0.1),
>>> > while Spark Hive is using Guava 14.0.1+. The community isn't willing to
>>> > downgrade to 11.0.1, which is the current version for Hadoop 2.2 and Hive 0.12.
>>> > Be aware of the protobuf version as well in Hive 0.12 (it uses protobuf 2.4).
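For reference, the kind of Guava exclusion Du describes in his pom.xml might look roughly like this in an sbt build. This is a hypothetical sketch with assumed coordinates and versions, not the attached pom.xml; Andrew's failing spark-shell session follows below.

// Hypothetical sbt fragment (assumed artifact names and versions): exclude
// Guava from the dependencies that drag in conflicting copies, then pin the
// one version that should end up on the classpath.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive" % "1.1.0-SNAPSHOT" exclude("com.google.guava", "guava"),
  "org.apache.hadoop" % "hadoop-client" % "2.2.0" exclude("com.google.guava", "guava"),
  // Pin one Guava explicitly so only this copy goes into the assembly.
  "com.google.guava" % "guava" % "14.0.1"
)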
>>> > scala> import org.apache.spark.SparkContext
>>> > import org.apache.spark.SparkContext
>>> >
>>> > scala> import org.apache.spark.sql.hive._
>>> > import org.apache.spark.sql.hive._
>>> >
>>> > scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>> > hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@34bee01a
>>> >
>>> > scala> hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>> > res0: org.apache.spark.sql.SchemaRDD =
>>> > SchemaRDD[0] at RDD at SchemaRDD.scala:104
>>> > == Query Plan ==
>>> > <Native command: executed by Hive>
>>> >
>>> > scala> hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
>>> > res1: org.apache.spark.sql.SchemaRDD =
>>> > SchemaRDD[3] at RDD at SchemaRDD.scala:104
>>> > == Query Plan ==
>>> > <Native command: executed by Hive>
>>> >
>>> > scala> // Queries are expressed in HiveQL
>>> >
>>> > scala> hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)
>>> > java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>> >   at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>>> >   at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>>> >   at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>>> >   at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>>> >   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>> >   at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>>> >   at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>>> >   at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>>> >   at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>>> >   at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
>>> >   at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
>>> >   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
>>> >   at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
>>> >   at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
>>> >   at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
>>> >   at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
>>> >   at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
>>> >   at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>>> >   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
>>> >   at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:60)
>>> >   at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:70)
>>> >   at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>>> >   at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>>> >   at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:280)
>>> >   at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:69)
>>> >   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>> >   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>> >   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>> >   at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:316)
>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:316)
>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:319)
>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:319)
>>> >   at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:420)
>>> >   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>> >   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>> >   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>> >   at $iwC$$iwC$$iwC.<init>(<console>:28)
>>> >   at $iwC$$iwC.<init>(<console>:30)
>>> >   at $iwC.<init>(<console>:32)
>>> >   at <init>(<console>:34)
>>> >   at .<init>(<console>:38)
>>> >   at .<clinit>(<console>)
>>> >   at .<init>(<console>:7)
>>> >   at .<clinit>(<console>)
>>> >   at $print(<console>)
>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >   at java.lang.reflect.Method.invoke(Method.java:606)
>>> >   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
>>> >   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
>>> >   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
>>> >   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
>>> >   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
>>> >   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
>>> >   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
>>> >   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
>>> >   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
>>> >   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
>>> >   at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
>>> >   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
>>> >   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>>> >   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>>> >   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>> >   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
>>> >   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
>>> >   at org.apache.spark.repl.Main$.main(Main.scala:31)
>>> >   at org.apache.spark.repl.Main.main(Main.scala)
>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >   at java.lang.reflect.Method.invoke(Method.java:606)
>>> >   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>>> >   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>>> >   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> >
>>> >
>>> >> From: hao.ch...@intel.com
>>> >> To: user@spark.apache.org; u...@spark.incubator.apache.org
>>> >> Subject: RE: Hive From Spark
>>> >> Date: Mon, 21 Jul 2014 01:14:19 +0000
>>> >>
>>> >> JiaJia, I've checked out the latest 1.0 branch and then done the following steps:
>>> >> SPARK_HIVE=true sbt/sbt clean assembly
>>> >> cd examples
>>> >> ../bin/run-example sql.hive.HiveFromSpark
>>> >>
>>> >> It works well on my local machine.
>>> >>
>>> >> Your log output shows "Invalid method name: 'get_table'", which suggests an
>>> >> incompatible jar version or something wrong between the Hive metastore
>>> >> service and the client. Can you double-check the jar versions of the Hive
>>> >> metastore service or Thrift?
>>> >>
>>> >>
>>> >> -----Original Message-----
>>> >> From: JiajiaJing [mailto:jj.jing0...@gmail.com]
>>> >> Sent: Saturday, July 19, 2014 7:29 AM
>>> >> To: u...@spark.incubator.apache.org
>>> >> Subject: RE: Hive From Spark
>>> >>
>>> >> Hi Cheng Hao,
>>> >>
>>> >> Thank you very much for your reply.
>>> >>
>>> >> Basically, the program runs on Spark 1.0.0 and Hive 0.12.0.
>>> >>
>>> >> Some setup of the environment is done by running "SPARK_HIVE=true
>>> >> sbt/sbt assembly/assembly", including the jar in all the workers, and
>>> >> copying hive-site.xml to Spark's conf dir.
>>> >>
>>> >> The program is then run as: "./bin/run-example
>>> >> org.apache.spark.examples.sql.hive.HiveFromSpark"
>>> >>
>>> >> It's good to know that this example runs well on your machine. Could you
>>> >> please give me some insight into what you have done as well?
>>> >>
>>> >> Thank you very much!
>>> >>
>>> >> Jiajia
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110p10215.html
>>> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
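For context, the sql.hive.HiveFromSpark example discussed above boils down to roughly the following. This is a minimal sketch against the 1.0/1.1-era API (hql was the HiveQL entry point at the time, as in the spark-shell session earlier in the thread), and it assumes hive-site.xml is on Spark's conf path.

// Rough sketch of what the HiveFromSpark example exercises (assumed
// 1.0/1.1-era API): create a HiveContext, create a table, load the bundled
// sample data, and read it back.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveFromSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveFromSparkSketch"))
    val hiveContext = new HiveContext(sc)

    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
    hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)

    sc.stop()
  }
}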
>>> --
>>> Marcelo
>
> --
> Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org