Hi, I usually run Hive 2 on the Spark 1.3.1 engine (as opposed to using the default MR or Tez). I tried to make Hive 2 work with Tez 0.8.2 but that did not get very far.
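For background, switching Hive's execution engine is just a session-level (or hive-site.xml) setting. Roughly what I do is below; the master and the spark.home path are specific to my setup, so treat them as placeholders:

  hive> set hive.execution.engine=spark;
  hive> set spark.master=local;          -- yarn-client or a spark:// URL would also work
  hive> set spark.home=/usr/lib/spark;   -- Spark installation that holds the assembly jar
  hive> select count(1) from sales_staging;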
Anyway, I will keep trying to make it work. Today I compiled Spark 1.6.1 from source, excluding the Hadoop libraries, as I did before for the Spark 1.3.1 engine. I created the spark-assembly-1.6.1-hadoop2.4.0.jar file and followed the same process that works for Spark 1.3.1. This is an example with Hive 2 on Spark 1.3.1:

  Starting Spark Job = 0
  Query Hive on Spark job[0] stages:
  0
  1
  Status: Running (Hive on Spark job[0])
  Job Progress Format
  CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
  2016-05-21 22:53:45,512 Stage-0_0: 1(+1)/22    Stage-1_0: 0/1
  2016-05-21 22:53:47,517 Stage-0_0: 2(+1)/22    Stage-1_0: 0/1

However, when I use the Spark 1.6.1 assembly file, I get the following error:

  hive> select count(1) from sales_staging;
  Query ID = hduser_20160521224219_dc9aae02-92bd-4279-87e2-98a6458db783
  Total jobs = 1
  Launching Job 1 out of 1
  In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
  In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
  In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
          at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
          at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:991)
          at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:419)
          at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:205)
          at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:145)
          at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:117)
          at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.execute(LocalHiveSparkClient.java:130)
          at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:64)
          at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:112)
          at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158)
          at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101)
          at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1840)
          at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584)
          at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1361)
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
          at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
          at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
          at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778)
          at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717)
          at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Could not initialize class org.apache.spark.rdd.RDDOperationScope$

I am not sure if anyone has tried this?
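For completeness, the build step I used was roughly the "Spark without Hive" pattern from the Hive-on-Spark docs; the exact profiles and Hadoop version below are just what I assume for my cluster, so adjust to taste:

  # run from the Spark 1.6.1 source tree; Hadoop classes are kept out via the hadoop-provided profile
  ./make-distribution.sh --name hadoop2-without-hive --tgz \
      -Pyarn -Phadoop-2.4 -Phadoop-provided -Dhadoop.version=2.4.0 -DskipTests
  # the resulting lib/spark-assembly-1.6.1-hadoop2.4.0.jar is then made visible to Hive
  # (copied/linked into $HIVE_HOME/lib, or pointed at via spark.home), same as for 1.3.1

One thing I note is that "Could not initialize class" means the static initializer of RDDOperationScope$ failed rather than the class being missing from the jar, which makes me suspect a dependency clash on the classpath, but I have not pinned it down.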
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com