Please check and attach the application master log.

2015-11-02 8:03 GMT+08:00 Jagat Singh <jagatsi...@gmail.com>:
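To grab the application master log, something along these lines should work on the sandbox (the application ID below is a placeholder; use the one shown in the Resource Manager UI or in the `yarn application -list` output):

```shell
# List YARN applications to find the ID of the Hive-on-Spark app
yarn application -list -appStates ALL

# Dump the aggregated container logs for that application
# (includes the ApplicationMaster log); the ID here is a placeholder
yarn logs -applicationId application_1446000000000_0001 > am-log.txt
```

If log aggregation is disabled, the same logs are reachable through the Resource Manager web UI under the application's ApplicationMaster container.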
> Hi,
>
> I am trying to run Hive on Spark on the HDP 2.3 virtual machine,
> following this wiki:
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>
> I have replaced all occurrences of hdp.version with 2.3.0.0-2557.
>
> I start Hive with the following:
>
> set hive.execution.engine=spark;
> set spark.master=yarn-client;
> set spark.executor.memory=512m;
>
> I run the query:
>
> select count(*) from sample_07;
>
> The query starts and fails with the following error.
>
> In the console:
>
> Status: Running (Hive on Spark job[0])
> Job Progress Format
> CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
> 2015-11-01 23:40:26,411 Stage-0_0: 0(+1)/1 Stage-1_0: 0/1
> state = FAILED
> Status: Failed
> FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> hive> select count(*) from sample_07;
>
> In the logs:
>
> 15/11/01 23:55:36 [stdout-redir-1]: INFO client.SparkClientImpl: 2015-11-01 23:55:36,313 INFO - [pool-1-thread-1:] ~ Failed to run job b8649c92-1504-43c7-8100-020b866e58da (RemoteDriver:389)
> java.util.concurrent.ExecutionException: Exception thrown by job
>     at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:311)
>     at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
>     at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:382)
>     at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sandbox.hortonworks.com): java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> 15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
> 15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.BaseProtocol$JobResult (3851 bytes)
> 15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Received RPC message: type=CALL id=2 payload=org.apache.hive.spark.client.BaseProtocol$JobResult
> 15/11/01 23:55:36 [RPC-Handler-3]: INFO client.SparkClientImpl: Received result for b8649c92-1504-43c7-8100-020b866e58da
> 15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
> 15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$NullMessage (2 bytes)
> state = FAILED
> 15/11/01 23:55:36 [main]: INFO status.SparkJobMonitor: state = FAILED
> Status: Failed
> 15/11/01 23:55:36 [main]: ERROR status.SparkJobMonitor: Status: Failed
>
> In the Resource Manager, however, the application shows as SUCCEEDED:
>
> [image: Inline image 1]
>
> How can I debug this?
>
> Thanks,