I am using HDP 2.6.4.

On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <
clifford.mil...@phoenix-opsgroup.com> wrote:

> That's the command that I'm using, but it gives me the exception that I
> listed in the previous email.  I've installed a Spark standalone cluster
> and am using that for training for now, but I would like to use Spark on
> YARN eventually.
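>
> (For the standalone cluster I just pass the master URL the same way after
> the '--' separator; the host name below is a placeholder, not my actual
> master:
>
>   pio train -- --master spark://spark-master-host:7077
> )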
>
> Are you using HDP? If so, what version of HDP are you using?  I'm using
> *HDP-2.6.2.14.*
>
>
>
> On Tue, May 29, 2018 at 8:55 PM, suyash kharade <suyash.khar...@gmail.com>
> wrote:
>
>> I use 'pio train -- --master yarn'.
>> It works for me to train the Universal Recommender.
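>>
>> Everything after the bare '--' is handed to spark-submit, so extra Spark
>> options can go on the same line.  A rough sketch (the memory and executor
>> numbers are only placeholders, tune them for your cluster):
>>
>>   pio train -- --master yarn --deploy-mode client \
>>     --driver-memory 4g --executor-memory 4g --num-executors 2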
>>
>> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
>> clifford.mil...@phoenix-opsgroup.com> wrote:
>>
>>> To add more details to this: when I attempt to execute my training job
>>> using the command 'pio train -- --master yarn', I get the exception that
>>> I've included below.  Can anyone tell me how to correctly submit the
>>> training job, or what setting I need to change to make this work?  I've
>>> made no custom code changes and am simply using PIO 0.12.1 with the
>>> SimilarProduct Recommender.
>>>
>>>
>>>
>>> [ERROR] [SparkContext] Error initializing SparkContext.
>>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
>>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>         at scala.Option.foreach(Option.scala:257)
>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>>
>>>
>>>
>>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>>> clifford.mil...@phoenix-opsgroup.com> wrote:
>>>
>>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>>> detection problem, but I'm still not able to submit Spark jobs unless
>>>> they are strictly local.  How are you submitting jobs to the HDP Spark?
>>>>
>>>> Thanks,
>>>>
>>>> --Cliff.
>>>>
>>>>
>>>>
>>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>>> suyash.khar...@gmail.com> wrote:
>>>>
>>>>> Hi Miller,
>>>>>     I faced the same issue.
>>>>>     It gives that error because the RELEASE file has a '-' in the version string.
>>>>>     Put a plain version in the RELEASE file instead, something like 2.6.
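>>>>>
>>>>>     For example (the path and version string here are just copied from
>>>>>     your log, so adjust them to whatever your RELEASE file actually
>>>>>     contains, and keep a backup first):
>>>>>
>>>>>       cp /usr/hdp/2.6.2.14-5/spark2/RELEASE /tmp/RELEASE.bak
>>>>>       sudo sed -i 's/2\.1\.1\.2\.6\.2\.14-5/2.1.1/' /usr/hdp/2.6.2.14-5/spark2/RELEASE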
>>>>>
>>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>>> clifford.mil...@phoenix-opsgroup.com> wrote:
>>>>>
>>>>>> *I've installed an HDP cluster with HBase and Spark on YARN.  As part
>>>>>> of that installation I created some HDP (Ambari)-managed clients.  I
>>>>>> installed PIO on one of these clients and configured PIO to use the
>>>>>> HDP-installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>>> eventserver &', I get the following error:*
>>>>>>
>>>>>> ####
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5: integer expression expected
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
>>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version requirement of 1.3.0.
>>>>>> Aborting.
>>>>>>
>>>>>> ####
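>>>>>>
>>>>>> (Those semver.sh messages are just bash refusing to do arithmetic on
>>>>>> the HDP-style version string.  A minimal, purely illustrative
>>>>>> reproduction:
>>>>>>
>>>>>>   v="2.2.6.2.14-5"
>>>>>>   [[ "$v" -ge 1 ]]   # -> syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
>>>>>> )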
>>>>>>
>>>>>> *If I then go to /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>>>>> file with an empty file, I can then start the Event Server, which gives
>>>>>> me the following message:*
>>>>>>
>>>>>> ####
>>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a known problem with certain vendors (e.g. Cloudera). Please make sure you are using at least 1.3.0.
>>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>>> ####
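>>>>>>
>>>>>> (Concretely, the swap was along these lines; the path is taken from the
>>>>>> log above and the backup name is arbitrary:
>>>>>>
>>>>>>   sudo mv /usr/hdp/2.6.2.14-5/spark2/RELEASE /usr/hdp/2.6.2.14-5/spark2/RELEASE.orig
>>>>>>   sudo touch /usr/hdp/2.6.2.14-5/spark2/RELEASE
>>>>>> )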
>>>>>>
>>>>>> *I can then send events to the Event Server.  After sending the events
>>>>>> listed in the SimilarProduct Recommender example, I am unable to train
>>>>>> using the cluster.  If I use 'pio train' then it successfully trains
>>>>>> locally.  If I attempt to use the command "pio train -- --master yarn"
>>>>>> then I get the following:*
>>>>>>
>>>>>> #######
>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>>>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
>>>>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
>>>>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
>>>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>>>>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>
>>>>>> ########
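>>>>>>
>>>>>> (In case it is relevant, PIO is pointed at the HDP installs through
>>>>>> conf/pio-env.sh; mine looks roughly like the lines below, using the
>>>>>> standard HDP config locations, which may differ on other clusters:
>>>>>>
>>>>>>   SPARK_HOME=/usr/hdp/2.6.2.14-5/spark2
>>>>>>   HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>>>   HBASE_CONF_DIR=/etc/hbase/conf
>>>>>> )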
>>>>>>
>>>>>> *What is the correct way to get PIO to use the YARN-based Spark for
>>>>>> training?*
>>>>>>
>>>>>> *Thanks,*
>>>>>>
>>>>>> *--Cliff.*
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Suyash K
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>> --
>> Regards,
>> Suyash K
>>
>
>
>
>
>


-- 
Regards,
Suyash K
