ok.. getting further. It seems I now have to deploy Hive to all nodes in the cluster - I don't think I had to do that before, but it's not a big deal to do it now.
For me: HIVE_HOME=/usr/lib/apache-hive-2.3.0-bin/ and SPARK_HOME=/usr/lib/spark-2.2.0-bin-hadoop2.6 on all three nodes now. I started the Spark master on the namenode and the Spark slaves (2) on two datanodes of the cluster. So far so good. Now I run my usual test command:

   $ hive --hiveconf hive.root.logger=DEBUG,console -e 'set hive.execution.engine=spark; select date_key, count(*) from fe_inventory.merged_properties_hist group by 1 order by 1;'

I get a little further now and find the stderr from the Spark Web UI (nice), and it reports this:

   17/09/27 20:47:35 INFO WorkerWatcher: Successfully connected to spark://Worker@172.19.79.127:40145
   Exception in thread "main" java.lang.reflect.InvocationTargetException
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:483)
           at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
           at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
   Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
           at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:47)
           at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134)
           at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:516)
           ... 6 more

Searching around the internet I find this is probably a compatibility issue. I know, I know - no surprise here. So I guess I just got to the point where everybody else is... build Spark without Hive. Let me see what happens next.

On Wed, Sep 27, 2017 at 7:41 PM, Stephen Sprague <sprag...@gmail.com> wrote:

> thanks. I haven't had a chance to dig into this again today but i do
> appreciate the pointer. I'll keep you posted.
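For reference, the "build Spark without Hive" step is the one described in the Hive on Spark getting-started docs: produce a Spark distribution that does not bundle its own Hive jars, so the only Hive classes on the classpath come from Hive 2.3.0 itself. A rough sketch (the tag, Hadoop profile, and distribution name below are illustrative assumptions - match them to your cluster, not a verified recipe):

```shell
# Build a Spark 2.2.0 distribution with Hive (and Hadoop) jars left out,
# so they are "provided" by the cluster instead of bundled.
git clone https://github.com/apache/spark.git
cd spark
git checkout v2.2.0     # assumed tag matching SPARK_HOME above

# Profile names here are assumptions based on the Spark 2.x build;
# note there is deliberately no -Phive profile.
./dev/make-distribution.sh --name hadoop2-without-hive --tgz \
    -Pyarn -Phadoop-2.6 -Phadoop-provided -Pparquet-provided
```

The resulting tarball would then be unpacked as the new SPARK_HOME on all three nodes.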
>
> On Wed, Sep 27, 2017 at 10:14 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>
>> You can try increasing the value of hive.spark.client.connect.timeout.
>> Would also suggest taking a look at the HoS Remote Driver logs. The driver
>> gets launched in a YARN container (assuming you are running Spark in
>> yarn-client mode), so you just have to find the logs for that container.
>>
>> --Sahil
>>
>> On Tue, Sep 26, 2017 at 9:17 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> i _seem_ to be getting closer. Maybe it's just wishful thinking.
>>> Here's where i'm at now.
>>>
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: 17/09/26 21:10:38 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: {
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "action" : "CreateSubmissionResponse",
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "message" : "Driver successfully submitted as driver-20170926211038-0003",
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "serverSparkVersion" : "2.2.0",
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "submissionId" : "driver-20170926211038-0003",
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "success" : true
>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: }
>>> 2017-09-26T21:10:45,701 DEBUG [IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr: closed
>>> 2017-09-26T21:10:45,702 DEBUG [IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr: stopped, remaining connections 0
>>> 2017-09-26T21:12:06,719 ERROR [2337b36e-86ca-47cd-b1ae-f0b32571b97e main] client.SparkClientImpl: Timed out waiting for client to connect.
>>> Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
>>> Please check YARN or Spark driver's logs for further information.
>>> java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
>>>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
>>>         at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:108) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:101) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:97) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:73) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:126) [hive-exec-2.3.0.jar:2.3.0]
>>>         at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236) [hive-exec-2.3.0.jar:2.3.0]
>>>
>>> i'll dig some more tomorrow.
>>>
>>> On Tue, Sep 26, 2017 at 8:23 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> oh. i missed Gopal's reply. oy... that sounds foreboding. I'll keep
>>>> you posted on my progress.
>>>>
>>>> On Tue, Sep 26, 2017 at 4:40 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
>>>>> spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
>>>>> Failed to create spark client.
>>>>>
>>>>> I get inexplicable errors with Hive-on-Spark unless I do a three-step
>>>>> build.
>>>>>
>>>>> Build Hive first, use that version to build Spark, use that Spark
>>>>> version to rebuild Hive.
>>>>>
>>>>> I have to do this to make it work because Spark contains Hive jars and
>>>>> Hive contains Spark jars in the class-path.
>>>>>
>>>>> And specifically I have to edit the pom.xml files, instead of passing
>>>>> in params with -Dspark.version, because the installed pom files don't get
>>>>> replacements from the build args.
>>>>>
>>>>> Cheers,
>>>>> Gopal
>>
>> --
>> Sahil Takiar
>> Software Engineer at Cloudera
>> takiar.sa...@gmail.com | (510) 673-0309
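Gopal's three-step build from the quoted thread above can be sketched roughly as follows. This is only an outline under assumptions (source checkouts, Maven goals, and profile names are illustrative), and per Gopal the version cross-references must be edited directly in the pom.xml files rather than passed as -D build args:

```shell
# Step 1: build Hive, installing its artifacts (hive-exec etc.)
# into the local Maven repository.
cd hive
mvn clean install -DskipTests

# Step 2: build a no-Hive Spark distribution against that Hive.
# Per Gopal, edit the Hive version in Spark's pom.xml by hand -
# installed pom files don't pick up replacements from build args.
cd ../spark
./dev/make-distribution.sh --name hadoop2-without-hive --tgz \
    -Pyarn -Phadoop-2.6 -Phadoop-provided -Pparquet-provided

# Step 3: rebuild Hive against that Spark (again editing
# spark.version in Hive's pom.xml rather than using -Dspark.version).
cd ../hive
mvn clean package -DskipTests
```

The point of the circular build is that Spark ships Hive jars and Hive ships Spark jars on each other's classpaths, so both sides have to agree on the exact versions they were compiled against.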