You can try increasing the value of hive.spark.client.connect.timeout. I'd also suggest taking a look at the HoS Remote Driver logs. The driver gets launched in a YARN container (assuming you are running Spark in yarn-client mode), so you just have to find the logs for that container.
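A minimal sketch of the timeout change, assuming you set it per-session (the values are illustrative guesses, and the related hive.spark.client.server.connect.timeout is not mentioned above but may also be worth a look):

    SET hive.spark.client.connect.timeout=30000ms;
    SET hive.spark.client.server.connect.timeout=300000ms;

and, if the driver really did land in a YARN container, something along these lines to find its logs:

    yarn application -list -appStates ALL
    yarn logs -applicationId <applicationId>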
--Sahil

On Tue, Sep 26, 2017 at 9:17 PM, Stephen Sprague <[email protected]> wrote:

> i _seem_ to be getting closer. Maybe its just wishful thinking. Here's
> where i'm at now.
>
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: 17/09/26 21:10:38 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: {
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "action" : "CreateSubmissionResponse",
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "message" : "Driver successfully submitted as driver-20170926211038-0003",
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "serverSparkVersion" : "2.2.0",
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "submissionId" : "driver-20170926211038-0003",
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "success" : true
> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: }
> 2017-09-26T21:10:45,701 DEBUG [IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr: closed
> 2017-09-26T21:10:45,702 DEBUG [IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr: stopped, remaining connections 0
> 2017-09-26T21:12:06,719 ERROR [2337b36e-86ca-47cd-b1ae-f0b32571b97e main] client.SparkClientImpl: Timed out waiting for client to connect.
> *Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.*
> *Please check YARN or Spark driver's logs for further information.*
> java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
>   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
>   at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:108) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:101) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:97) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:73) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:126) [hive-exec-2.3.0.jar:2.3.0]
>   at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236) [hive-exec-2.3.0.jar:2.3.0]
>
> i'll dig some more tomorrow.
>
> On Tue, Sep 26, 2017 at 8:23 PM, Stephen Sprague <[email protected]> wrote:
>
>> oh.
>> i missed Gopal's reply. oy... that sounds foreboding. I'll keep you
>> posted on my progress.
>>
>> On Tue, Sep 26, 2017 at 4:40 PM, Gopal Vijayaraghavan <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
>>>
>>> I get inexplicable errors with Hive-on-Spark unless I do a three step build.
>>>
>>> Build Hive first, use that version to build Spark, use that Spark version to rebuild Hive.
>>>
>>> I have to do this to make it work because Spark contains Hive jars and Hive contains Spark jars in the class-path.
>>>
>>> And specifically I have to edit the pom.xml files, instead of passing in params with -Dspark.version, because the installed pom files don't get replacements from the build args.
>>>
>>> Cheers,
>>> Gopal

--
Sahil Takiar
Software Engineer at Cloudera
[email protected] | (510) 673-0309
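Gopal's three-step build, as a rough shell sketch (directory layout, Maven profiles, and the exact flags are assumptions, not taken from this thread; the Hive-on-Spark docs also recommend a Spark build that does not bundle the Hive jars, hence no -Phive below):

    # 1. build Hive first (flags are assumptions)
    cd hive && mvn clean package -DskipTests -Pdist

    # 2. build Spark against that Hive; per Gopal, edit Spark's pom.xml to point at
    #    the Hive version just built rather than relying on -D overrides
    cd ../spark && ./dev/make-distribution.sh --name hive-on-spark -Pyarn -Phadoop-2.7 -DskipTests

    # 3. rebuild Hive against that Spark, again by editing spark.version in Hive's pom.xml
    cd ../hive && mvn clean package -DskipTests -Pdist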
