*I've installed an HDP cluster with HBase and Spark on YARN. As part of
that installation I created some HDP (Ambari) managed clients. I installed
PIO on one of these clients and configured PIO to use the HDP-installed
Hadoop, HBase, and Spark (roughly as sketched below). When I run the
command 'pio eventserver &', I get the error shown after the sketch.*
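
*For context, this is approximately how conf/pio-env.sh points PIO at the
HDP installs; the paths are from my cluster and the storage sections are
trimmed, so treat it as illustrative rather than exact:*

####
# conf/pio-env.sh (abridged; paths as on my HDP 2.6.2.14-5 client)
SPARK_HOME=/usr/hdp/2.6.2.14-5/spark2

# Ambari-managed client configs
HADOOP_CONF_DIR=/etc/hadoop/conf
HBASE_CONF_DIR=/etc/hbase/conf

# Event data goes to the HDP HBase (other storage settings omitted)
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/hdp/2.6.2.14-5/hbase
####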

####
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5: integer expression expected
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version requirement of 1.3.0.
Aborting.

####
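
*The root cause, as far as I can tell, is that HDP appends a vendor build
suffix to the Spark version string, so semver.sh ends up handing
'2.2.6.2.14-5' to a numeric test that expects a plain integer. A minimal
bash reproduction (my assumption about what semver.sh does internally,
based on the messages above):*

####
# bash rejects non-integer operands in numeric tests, matching the
# "integer expression expected" message above:
$ [ "2.2.6.2.14-5" -ge 1 ]
bash: [: 2.2.6.2.14-5: integer expression expected

# A possible (untested) workaround: strip the vendor suffix and keep
# only the first three version components before comparing.
$ echo "2.1.1.2.6.2.14-5" | cut -d. -f1-3
2.1.1
####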

*If I then go to /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE file
with an empty one (exact commands after the log below), I can then start
the Eventserver, which gives me the following message:*

####
/usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a known problem with certain vendors (e.g. Cloudera). Please make sure you are using at least 1.3.0.
[INFO] [Management$] Creating Event Server at 0.0.0.0:7070
[WARN] [DomainSocketFactory] The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[INFO] [HttpListener] Bound to /0.0.0.0:7070
[INFO] [EventServerActor] Bound received. EventServer is ready.
####
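
*For reference, the RELEASE workaround above amounts to the following
(paths from my install; I kept a backup of the original file):*

####
cd /usr/hdp/2.6.2.14-5/spark2
sudo mv RELEASE RELEASE.orig   # keep the original vendor file
sudo touch RELEASE             # empty file triggers the vendor message seen above
####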

*I can then send events to the Eventserver. After sending the events
listed in the SimilarProduct Recommender example, however, I am unable to
train using the cluster. If I use 'pio train' then it successfully trains
locally, but if I attempt to use the command 'pio train -- --master yarn'
then I get the following:*

####
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
####
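
*My reading of the trace (an assumption, not verified against the Spark
2.1.1 source): Client.setupLaunchEnv hands an optional, comma-separated
list of KEY=VALUE pairs to setEnvFromInputString, and an entry without an
'=' (for example a bare word) leaves the split array with a single
element, so indexing element 1 throws. The scala.Option.foreach frame
suggests the input is an optional setting such as Spark's deprecated
SPARK_YARN_USER_ENV, so I have been checking for malformed entries along
these lines:*

####
# Hypothetical check: hunt for env-style settings that could reach
# setEnvFromInputString with an entry missing its '=' sign.
grep -rn "SPARK_YARN_USER_ENV" /usr/hdp/2.6.2.14-5/spark2/conf /etc/spark2/conf 2>/dev/null

# Well-formed input is comma-separated KEY=VALUE pairs, e.g.
#   export SPARK_YARN_USER_ENV="HDP_VERSION=2.6.2.14-5,FOO=bar"
# An entry with no '=' at all would produce a single-element split,
# which matches ArrayIndexOutOfBoundsException: 1.
####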

*What is the correct way to get PIO to use the YARN-based Spark for
training?*

*Thanks,*

*--Cliff.*
