So, I think I’ve made some progress, but it is still not working:
- I’ve fixed the RPC issue by putting my hive-site.xml file in the spark/conf directory on all Spark nodes (the exact copy commands are at the end of this message)
- I’ve downgraded to Spark 2.0.2
- But I’m now getting this error in the HiveServer2 logs:
Query Hive on Spark job[0] stages: [0, 1]
Status: Running (Hive on Spark job[0])
2017-11-28T10:23:12,064 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: Query Hive on Spark job[0] stages: [0, 1]
2017-11-28T10:23:12,064 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: Status: Running (Hive on Spark job[0])
2017-11-28T10:23:12,064 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT   STATUS    TOTAL   COMPLETED   RUNNING   PENDING   FAILED
--------------------------------------------------------------------------------------
          Stage-0        0   RUNNING      60           0        60         0        0
          Stage-1        0   PENDING       1           0         0         1        0
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT   STATUS    TOTAL   COMPLETED   RUNNING   PENDING   FAILED
--------------------------------------------------------------------------------------
          Stage-0        0   RUNNING      60           0        59         1       28
          Stage-1        0   PENDING       1           0         0         1        0
[…] SessionState: Updating thread name to 999d42f8-8b89-4659-9674-2298863915b2 HiveServer2-Handler-Pool: Thread-66
--------------------------------------------------------------------------------------
          STAGES: 00/02    [>>--------------------------] 0%    ELAPSED TIME: 4.05 s
--------------------------------------------------------------------------------------
2017-11-28T10:23:14,092 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: 2017-11-28 10:23:14,091 Stage-0_0: 0(+59,-28)/60 Stage-1_0: 0/1
2017-11-28T10:23:14,569 INFO [RPC-Handler-3] client.SparkClientImpl: Received result for 43d316a4-f785-41e6-93e8-43495ae509b8
2017-11-28T10:23:14,886 INFO [HiveServer2-Handler-Pool: Thread-66] session.SessionState: Updating thread name to 999d42f8-8b89-4659-9674-2298863915b2 HiveServer2-Handler-Pool: Thread-66
2017-11-28T10:23:14,886 INFO [HiveServer2-Handler-Pool: Thread-66] session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: Thread-66
Job failed with java.lang.NullPointerException
2017-11-28T10:23:15,092 ERROR [HiveServer2-Background-Pool: Thread-85] SessionState: Job failed with java.lang.NullPointerException
java.util.concurrent.ExecutionException: Exception thrown by job
        at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:272)
        at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 0.0 failed 4 times, most recent failure: Lost task 37.3 in stage 0.0 (TID 177, 22.0.87.35): java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:350)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:678)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:245)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
… => should I paste the rest of the stack trace?
There is no error on the Spark side; it seems to be waiting for more input from the Hive side.
But now I don’t know where or how to go further.
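For reference, this is roughly how I pushed the configuration to the workers; the host names and paths below are only examples of my layout:

  # copy the Hive config next to each Spark installation (spark/conf)
  for host in spark-worker-1 spark-worker-2 spark-worker-3; do
    scp /opt/apache-hive-2.3.2-bin/conf/hive-site.xml ${host}:/opt/spark/conf/
  done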
Stéphane
From: Sahil Takiar [mailto:[email protected]]
Sent: Monday, November 27, 2017 18:20
To: [email protected]
Subject: Re: Can't have Hive running with Spark
Right now we only support Spark 2.0.0; the issue you are facing is probably due to a version mismatch.
You may find this JIRA useful (SPARK-16292); you want to make sure you are building the Spark distribution correctly.
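For reference, the Hive-on-Spark getting started guide builds such a "without Hive" distribution with something along these lines (adjust the profiles to your Hadoop version):

  ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz \
    "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"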
On Mon, Nov 27, 2017 at 8:48 AM, <[email protected]> wrote:
Hello all
I’m trying to get Hive running on top of a Spark cluster.
- Hive version: 2.3.2, installed with the embedded Derby database (local mode)
- Spark version: 2.2.0, installed in standalone cluster mode (no YARN, no Mesos)
- Hadoop version: 2.7.4
- OS: Red Hat 7
There is something special here: I don’t run it on top of Hadoop, but on top of Elasticsearch, thanks to the elasticsearch-hadoop bridge. The reason I’m using Derby and plain standalone mode for Spark is that I’m currently in a kind of discovery phase.
What is working nicely:
- Spark on ES: I can submit Python scripts that query my Elasticsearch database
- Hive on ES: it works with engine=mr; I’d like to have it with engine=spark (both paths are sketched just below)
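Roughly, the working path and the one I am after look like this; the master URL, connector jar and table name are only placeholders from my tests:

  # works today: a Python script that queries ES through the elasticsearch-hadoop connector
  spark-submit --master spark://spark-master:7077 \
    --jars /opt/jars/elasticsearch-hadoop-5.6.4.jar \
    my_es_query.py

  # what I would like: the same data, but queried from Hive with the Spark engine
  hive -e "SET hive.execution.engine=spark; SELECT COUNT(*) FROM my_es_table;"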
What I can see is that when I launch my Hive query, things first look normal from the HiveServer2 point of view:
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl: {
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "action" : "CreateSubmissionResponse",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "message" : "Driver successfully submitted as driver-20171127164308-0002",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "serverSparkVersion" : "2.2.0",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "submissionId" : "driver-20171127164308-0002",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "success" : true
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl: }
But actually, on the Spark side, I get the following error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
        at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
I’ve set the hive.spark.client.rpc.server.address property on all the Spark nodes, where I’ve also installed the Hive binaries and pushed the hive-site.xml. I’ve also set HIVE_CONF_DIR and HIVE_HOME on all nodes, but it doesn’t work.
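Concretely, each Spark node looks roughly like this (the paths and the host value are only examples of my setup):

  export HIVE_HOME=/opt/apache-hive-2.3.2-bin
  export HIVE_CONF_DIR=$HIVE_HOME/conf
  # hive-site.xml copied into spark/conf, containing among others:
  #   <property>
  #     <name>hive.spark.client.rpc.server.address</name>
  #     <value>hive-server-host</value>
  #   </property>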
I’m a little bit lost now; I don’t see what else I could do ☹
Your help is appreciated.
Thanks a lot,
Stéphane
--
Sahil Takiar
Software Engineer
[email protected]<mailto:[email protected]> | (510) 673-0309