So, I think I’ve made some progress, but it is still not working:
- I’ve fixed the RPC issue by putting my hive-site.xml file in the spark/conf directory on all Spark nodes (the exact copy commands are at the end of this message)
- I’ve downgraded to Spark 2.0.2
- But I’m now getting this error in the HiveServer2 logs:
Query Hive on Spark job[0] stages: [0, 1]
Status: Running (Hive on Spark job[0])
2017-11-28T10:23:12,064 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: Query Hive on Spark job[0] stages: [0, 1]
2017-11-28T10:23:12,064 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: Status: Running (Hive on Spark job[0])
2017-11-28T10:23:12,064 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT   STATUS    TOTAL   COMPLETED   RUNNING   PENDING   FAILED
--------------------------------------------------------------------------------------
          Stage-0        0   RUNNING      60           0        60         0        0
          Stage-1        0   PENDING       1           0         0         1        0
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT   STATUS    TOTAL   COMPLETED   RUNNING   PENDING   FAILED
--------------------------------------------------------------------------------------
          Stage-0        0   RUNNING      60           0        59         1       28
          Stage-1        0   PENDING       1           0         0         1        0
[…] SessionState: Updating thread name to 999d42f8-8b89-4659-9674-2298863915b2 HiveServer2-Handler-Pool: Thread-66
--------------------------------------------------------------------------------------
          STAGES: 00/02    [>>--------------------------] 0%    ELAPSED TIME: 4.05 s
--------------------------------------------------------------------------------------
2017-11-28T10:23:14,092 INFO [HiveServer2-Background-Pool: Thread-85] SessionState: 2017-11-28 10:23:14,091 Stage-0_0: 0(+59,-28)/60 Stage-1_0: 0/1
2017-11-28T10:23:14,569 INFO [RPC-Handler-3] client.SparkClientImpl: Received result for 43d316a4-f785-41e6-93e8-43495ae509b8
2017-11-28T10:23:14,886 INFO [HiveServer2-Handler-Pool: Thread-66] session.SessionState: Updating thread name to 999d42f8-8b89-4659-9674-2298863915b2 HiveServer2-Handler-Pool: Thread-66
2017-11-28T10:23:14,886 INFO [HiveServer2-Handler-Pool: Thread-66] session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: Thread-66
Job failed with java.lang.NullPointerException
2017-11-28T10:23:15,092 ERROR [HiveServer2-Background-Pool: Thread-85] SessionState: Job failed with java.lang.NullPointerException
java.util.concurrent.ExecutionException: Exception thrown by job
        at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:272)
        at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 0.0 failed 4 times, most recent failure: Lost task 37.3 in stage 0.0 (TID 177, 22.0.87.35): java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:350)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:678)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:245)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
… => should I paste the rest of the stack trace?
There is no error on the Spark side; it seems to be waiting for more input from the Hive side.
But now I don’t know where or how to go further.
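For reference, this is roughly how I pushed the configuration to the workers; the host names and paths below are only examples of my layout:

  # copy the Hive config next to each Spark installation (spark/conf)
  for host in spark-worker-1 spark-worker-2 spark-worker-3; do
    scp /opt/apache-hive-2.3.2-bin/conf/hive-site.xml ${host}:/opt/spark/conf/
  done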
Stéphane
From: Sahil Takiar [mailto:[email protected]]
Sent: Monday, November 27, 2017 18:20
To: [email protected]
Subject: Re: Can't have Hive running with Spark
Right now we only support Spark 2.0.0; the issue you are facing is probably due to a version mismatch.
You may find this JIRA useful (SPARK-16292); you want to make sure you are building the Spark distribution correctly.
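For reference, the Hive-on-Spark getting started guide builds such a "without Hive" distribution with something along these lines (adjust the profiles to your Hadoop version):

  ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz \
    "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"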
On Mon, Nov 27, 2017 at 8:48 AM, <[email protected]> wrote:
Hello all
I’m trying to get Hive running on top of a Spark cluster.
- Hive version: 2.3.2, installed with the embedded Derby database (local mode)
- Spark version: 2.2.0, installed in standalone cluster mode (no YARN, no Mesos)
- Hadoop version: 2.7.4
- OS: Red Hat 7
There is something special here: I don’t run it on top of Hadoop, but on top of Elasticsearch, thanks to the elasticsearch-hadoop bridge. The reason I’m using Derby and plain standalone mode for Spark is that I’m currently in a kind of discovery phase.
What is working nicely:
- Spark on ES: I can submit Python scripts that query my Elasticsearch database
- Hive on ES: it works with engine=mr; I’d like to have it with engine=spark (both paths are sketched just below)
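Roughly, the working path and the one I am after look like this; the master URL, connector jar and table name are only placeholders from my tests:

  # works today: a Python script that queries ES through the elasticsearch-hadoop connector
  spark-submit --master spark://spark-master:7077 \
    --jars /opt/jars/elasticsearch-hadoop-5.6.4.jar \
    my_es_query.py

  # what I would like: the same data, but queried from Hive with the Spark engine
  hive -e "SET hive.execution.engine=spark; SELECT COUNT(*) FROM my_es_table;"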
What I can see is that when I launch my Hive query, things first look normal from the HiveServer2 point of view:
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl: {
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "action" : "CreateSubmissionResponse",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "message" : "Driver successfully submitted as driver-20171127164308-0002",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "serverSparkVersion" : "2.2.0",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "submissionId" : "driver-20171127164308-0002",
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl:   "success" : true
2017-11-27T16:43:08,808 INFO [stderr-redir-1] client.SparkClientImpl: }
But actually, on the Spark side, I get the following error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
        at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
I’ve set the hive.spark.client.rpc.server.address property on all the Spark nodes, where I’ve also installed the Hive binaries and pushed the hive-site.xml. I’ve also set HIVE_CONF_DIR and HIVE_HOME on all nodes, but it doesn’t work.
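Concretely, each Spark node looks roughly like this (the paths and the host value are only examples of my setup):

  export HIVE_HOME=/opt/apache-hive-2.3.2-bin
  export HIVE_CONF_DIR=$HIVE_HOME/conf
  # hive-site.xml copied into spark/conf, containing among others:
  #   <property>
  #     <name>hive.spark.client.rpc.server.address</name>
  #     <value>hive-server-host</value>
  #   </property>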
I’m a little bit lost now; I don’t see what else I could do ☹
Your help is appreciated.
Thanks a lot,
Stéphane
--
Sahil Takiar
Software Engineer
[email protected]<mailto:[email protected]> | (510) 673-0309