I am trying out Hive on Spark with Hive 2.0.0 and Spark 1.4.1, executing org.apache.hadoop.hive.ql.Driver from a Java application.
My situation is as follows:

1. I built the Spark 1.4.1 assembly jar without Hive.
2. I uploaded the Spark assembly jar to the Hadoop cluster.
3. I run the Java application from the Eclipse IDE on my client machine.

The application works and successfully submits an MR job to the YARN cluster when I use hiveConf.set("hive.execution.engine", "mr"), but it throws exceptions with the Spark engine. After tracing the Hive source code, I came to this conclusion: in my situation, the SparkClientImpl class generates the spark-submit shell command and executes it. That command sets --class to RemoteDriver.class.getName() and the application jar to SparkContext.jarOfClass(this.getClass()).get(), which is why my application throws the exception. Is that right? And what do I need to do to run the application with the Spark engine successfully from my client machine? Thanks a lot!

Java application code:

```java
import org.apache.hadoop.hive.cli.CliSessionState;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.CommandNeedRetryException;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse;
import org.apache.hadoop.hive.ql.session.SessionState;

public class TestHiveDriver {
    private static HiveConf hiveConf;
    private static Driver driver;
    private static CliSessionState ss;

    public static void main(String[] args) {
        String sql = "select * from hadoop0263_0 as a join hadoop0263_0 as b on (a.key = b.key)";

        ss = new CliSessionState(new HiveConf(SessionState.class));
        hiveConf = new HiveConf(Driver.class);

        // Cluster endpoints
        hiveConf.set("fs.default.name", "hdfs://storm0:9000");
        hiveConf.set("yarn.resourcemanager.address", "storm0:8032");
        hiveConf.set("yarn.resourcemanager.scheduler.address", "storm0:8030");
        hiveConf.set("yarn.resourcemanager.resource-tracker.address", "storm0:8031");
        hiveConf.set("yarn.resourcemanager.admin.address", "storm0:8033");
        hiveConf.set("mapreduce.framework.name", "yarn");
        hiveConf.set("mapreduce.jobhistory.address", "storm0:10020");

        // Metastore connection
        hiveConf.set("javax.jdo.option.ConnectionURL", "jdbc:mysql://storm0:3306/stana_metastore");
        hiveConf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        hiveConf.set("javax.jdo.option.ConnectionUserName", "root");
        hiveConf.set("javax.jdo.option.ConnectionPassword", "123456");

        // Spark engine settings
        hiveConf.setBoolean("hive.auto.convert.join", false);
        hiveConf.set("spark.yarn.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
        hiveConf.set("spark.home", "target/spark");
        hiveConf.set("hive.execution.engine", "spark");
        hiveConf.set("hive.dbname", "default");

        driver = new Driver(hiveConf);
        SessionState.start(hiveConf);

        CommandProcessorResponse res = null;
        try {
            res = driver.run(sql);
        } catch (CommandNeedRetryException e) {
            e.printStackTrace();
        }
        System.out.println("Response Code:" + res.getResponseCode());
        System.out.println("Error Message:" + res.getErrorMessage());
        System.out.println("SQL State:" + res.getSQLState());
    }
}
```

Exception from the Spark engine:

```
16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with argv: /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit --properties-file /var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-submit.7697089826296920539.properties --class org.apache.hive.spark.client.RemoteDriver /Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar --remote-host MacBook-Pro.local --remote-port 51331 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
16/03/10 18:33:09 INFO SparkClientImpl: 	 client token: N/A
16/03/10 18:33:09 INFO SparkClientImpl: 	 diagnostics: N/A
16/03/10 18:33:09 INFO SparkClientImpl: 	 ApplicationMaster host: N/A
16/03/10 18:33:09 INFO SparkClientImpl: 	 ApplicationMaster RPC port: -1
16/03/10 18:33:09 INFO SparkClientImpl: 	 queue: default
16/03/10 18:33:09 INFO SparkClientImpl: 	 start time: 1457180833494
16/03/10 18:33:09 INFO SparkClientImpl: 	 final status: UNDEFINED
16/03/10 18:33:09 INFO SparkClientImpl: 	 tracking URL: http://storm0:8088/proxy/application_1457002628102_0043/
16/03/10 18:33:09 INFO SparkClientImpl: 	 user: stana
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client: Application report for application_1457002628102_0043 (state: FAILED)
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
16/03/10 18:33:10 INFO SparkClientImpl: 	 client token: N/A
16/03/10 18:33:10 INFO SparkClientImpl: 	 diagnostics: Application application_1457002628102_0043 failed 1 times due to AM Container for appattempt_1457002628102_0043_000001 exited with exitCode: -1000
16/03/10 18:33:10 INFO SparkClientImpl: For more detailed output, check application tracking page:http://storm0:8088/proxy/application_1457002628102_0043/Then, click on links to logs of each attempt.
16/03/10 18:33:10 INFO SparkClientImpl: Diagnostics: java.io.FileNotFoundException: File file:/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar does not exist
16/03/10 18:33:10 INFO SparkClientImpl: Failing this attempt. Failing the application.
16/03/10 18:33:10 INFO SparkClientImpl: 	 ApplicationMaster host: N/A
16/03/10 18:33:10 INFO SparkClientImpl: 	 ApplicationMaster RPC port: -1
16/03/10 18:33:10 INFO SparkClientImpl: 	 queue: default
16/03/10 18:33:10 INFO SparkClientImpl: 	 start time: 1457180833494
16/03/10 18:33:10 INFO SparkClientImpl: 	 final status: FAILED
16/03/10 18:33:10 INFO SparkClientImpl: 	 tracking URL: http://storm0:8088/cluster/app/application_1457002628102_0043
16/03/10 18:33:10 INFO SparkClientImpl: 	 user: stana
16/03/10 18:33:10 INFO SparkClientImpl: Exception in thread "main" org.apache.spark.SparkException: Application application_1457002628102_0043 finished with failed status
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.yarn.Client.run(Client.scala:920)
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
16/03/10 18:33:10 INFO SparkClientImpl: 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16/03/10 18:33:10 INFO SparkClientImpl: 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
16/03/10 18:33:10 INFO SparkClientImpl: 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16/03/10 18:33:10 INFO SparkClientImpl: 	at java.lang.reflect.Method.invoke(Method.java:606)
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
16/03/10 18:33:10 INFO SparkClientImpl: 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO ShutdownHookManager: Shutdown hook called
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO ShutdownHookManager: Deleting directory /private/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-5b92ce20-b6f8-4832-8b15-5e98bd0e0705
16/03/10 18:33:10 WARN SparkClientImpl: Error while waiting for client to connect.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited before connecting back
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
	at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:117) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239) [hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479) [hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319) [hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255) [hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301) [hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41) [test-classes/:?]
Caused by: java.lang.RuntimeException: Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited before connecting back
	at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179) ~[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_67]
16/03/10 18:33:10 WARN SparkClientImpl: Child process exited with code 1.
FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
16/03/10 18:33:10 ERROR Driver: FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
	at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:121)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
	at org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
```
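To make the suspected failure mode concrete, here is a minimal, self-contained sketch of how I understand the spark-submit command line gets assembled. This is illustrative only (the class and method names below are mine, not Hive's actual SparkClientImpl internals); the paths and values are copied from the log above. The point is that the application jar passed after --class is resolved from the client's own classpath, a path that does not exist on the cluster nodes, which matches the FileNotFoundException in the YARN diagnostics:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Hive source code) of the spark-submit argv
// that SparkClientImpl appears to build, based on the log above.
public class SparkSubmitArgvSketch {

    static List<String> buildArgv(String sparkHome, String remoteDriverClass,
                                  String appJar, String remoteHost, int remotePort) {
        List<String> argv = new ArrayList<>();
        argv.add(sparkHome + "/bin/spark-submit");
        argv.add("--class");
        argv.add(remoteDriverClass);
        // The jar is resolved from the client JVM's classpath, so this is a
        // client-local path that YARN nodes cannot localize.
        argv.add(appJar);
        argv.add("--remote-host");
        argv.add(remoteHost);
        argv.add("--remote-port");
        argv.add(String.valueOf(remotePort));
        return argv;
    }

    public static void main(String[] args) {
        String jar = "/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar";
        List<String> argv = buildArgv("target/spark",
                "org.apache.hive.spark.client.RemoteDriver",
                jar, "MacBook-Pro.local", 51331);
        System.out.println(String.join(" ", argv));
        // On a cluster node this path does not exist; checking it locally
        // shows it only resolves on the client machine.
        System.out.println("jar exists on this machine: " + new File(jar).exists());
    }
}
```

If this reading is right, the jar handed to spark-submit would need to be at a location the cluster can see (or the client would need the hive-exec jar at the same path), but I am not sure what the supported way to arrange that is.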