YARN has to be started explicitly. It is usually part of Hadoop and is
started along with Hadoop; Spark only ships the YARN client (as far as I know).
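
If YARN is running but the submit still fails, a quick sanity check is to
confirm the ResourceManager is reachable and that the Hadoop/YARN client
configuration is visible to the Spark client PIO invokes. A minimal sketch,
assuming the usual HDP client config locations (adjust to your layout):

    # check that the ResourceManager is up and NodeManagers have registered
    yarn node -list

    # make the cluster configuration visible to Spark's YARN client
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export YARN_CONF_DIR=/etc/hadoop/conf

    # then submit the training job through YARN
    pio train -- --master yarn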



From: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
<clifford.mil...@phoenix-opsgroup.com>
Reply: user@predictionio.apache.org <user@predictionio.apache.org>
<user@predictionio.apache.org>
Date: May 29, 2018 at 6:45:43 PM
To: user@predictionio.apache.org <user@predictionio.apache.org>
<user@predictionio.apache.org>
Subject:  Re: PIO 0.12.1 with HDP Spark on YARN

That's the command that I'm using, but it gives me the exception that I
listed in the previous email.  I've installed a Spark standalone cluster
and am using that for training for now, but I would like to use Spark on
YARN eventually.

Are you using HDP? If so, what version of HDP are you using?  I'm using
*HDP-2.6.2.14.*



On Tue, May 29, 2018 at 8:55 PM, suyash kharade <suyash.khar...@gmail.com>
wrote:

> I use 'pio train -- --master yarn'
> It works for me to train universal recommender
>
> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
> clifford.mil...@phoenix-opsgroup.com> wrote:
>
>> To add more details to this: when I attempt to execute my training job
>> using the command 'pio train -- --master yarn', I get the exception that
>> I've included below.  Can anyone tell me how to correctly submit the
>> training job, or what setting I need to change to make this work?  I've made
>> no custom code changes and am simply using PIO 0.12.1 with the
>> SimilarProduct Recommender.
>>
>>
>>
>> [ERROR] [SparkContext] Error initializing SparkContext.
>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request
>> executors before the AM has registered!
>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>         at scala.Option.foreach(Option.scala:257)
>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>>
>>
>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>> clifford.mil...@phoenix-opsgroup.com> wrote:
>>
>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>> detection problem, but I'm still not able to submit Spark jobs unless they
>>> are strictly local.  How are you submitting jobs to the HDP Spark?
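>>>
>>> Concretely, the edit that fixed the version detection for me was roughly
>>> this (paths and version string are the ones from my HDP install quoted
>>> below; back up the file first):
>>>
>>>     cd /usr/hdp/2.6.2.14-5/spark2
>>>     cp RELEASE RELEASE.bak
>>>     # replace the vendor-suffixed version with the plain Spark version
>>>     sed -i 's/2\.1\.1\.2\.6\.2\.14-5/2.1.1/' RELEASE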
>>>
>>> Thanks,
>>>
>>> --Cliff.
>>>
>>>
>>>
>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>> suyash.khar...@gmail.com> wrote:
>>>
>>>> Hi Miller,
>>>>     I faced the same issue.
>>>>     It gives that error because the RELEASE file has a '-' in the version.
>>>>     Insert a simple version in the RELEASE file, something like 2.6.
>>>>
>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>> clifford.mil...@phoenix-opsgroup.com> wrote:
>>>>
>>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As
>>>>> part of that installation I created some HDP (Ambari) managed clients.  I
>>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>> eventserver &', I get the following error.*
>>>>>
>>>>> ####
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5: integer expression expected
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version requirement of 1.3.0. Aborting.
>>>>>
>>>>> ####
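>>>>>
>>>>> (For context, PIO was pointed at the HDP components through conf/pio-env.sh;
>>>>> a minimal sketch of the relevant entries, with the standard HDP client paths
>>>>> shown as assumptions rather than the exact values used here:
>>>>>
>>>>>     SPARK_HOME=/usr/hdp/2.6.2.14-5/spark2
>>>>>     HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>>     HBASE_CONF_DIR=/etc/hbase/conf
>>>>> )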
>>>>>
>>>>> *If I then go to /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE file
>>>>> with an empty file, I can then start the Eventserver, which gives me the
>>>>> following message:*
>>>>>
>>>>> ###
>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>>>>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>>> are using at least 1.3.0.
>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>> cannot be used because libhadoop cannot be loaded.
>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>> ####
>>>>>
>>>>> *I can then send events to the Eventserver.  After sending the events
>>>>> listed in the SimilarProduct Recommender example, I am unable to train
>>>>> using the cluster.  If I use 'pio train' then it successfully trains
>>>>> locally.  If I attempt to use the command "pio train -- --master yarn"
>>>>> then I get the following:*
>>>>>
>>>>> #######
>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
>>>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
>>>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
>>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>>>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>
>>>>> ########
>>>>>
>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>> training?*
>>>>>
>>>>> *Thanks,*
>>>>>
>>>>> *--Cliff.*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Suyash K
>>>>
>>>
>>>
>>>
>>>
>>>
>
>
> --
> Regards,
> Suyash K
>
