Hi,

We are using Spark on YARN, and after following the URL below, we were able
to submit jobs to YARN from a remote machine (i.e. the PredictionIO
server).

http://theckang.com/2015/remote-spark-jobs-on-yarn/

1. Copied core-site.xml and yarn-site.xml from the YARN cluster onto the
remote machine (i.e. the PredictionIO server).
2. Set the HADOOP_CONF_DIR environment variable in spark-env.sh (of the
locally installed Spark copy) on the remote machine so that Spark can
locate core-site.xml and yarn-site.xml.
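For reference, the two steps above look roughly like this on our machine (the hostname `yarn-master` and the directory `/usr/local/hadoop/conf` are placeholders; use whatever matches your cluster):

```shell
# Step 1: copy the Hadoop client configs from the YARN cluster onto the
# remote PredictionIO server (source host and target dir are examples).
scp yarn-master:/etc/hadoop/conf/core-site.xml /usr/local/hadoop/conf/
scp yarn-master:/etc/hadoop/conf/yarn-site.xml /usr/local/hadoop/conf/

# Step 2: point the locally installed Spark at those files by adding this
# line to $SPARK_HOME/conf/spark-env.sh.
export HADOOP_CONF_DIR=/usr/local/hadoop/conf
```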


Now, when I try to train the model with the command below, I get a new
error.

pio train -- --master yarn-cluster

*Console Error Logs:*
java.io.FileNotFoundException: File does not exist:
hdfs://ip-172-31-45-33.us-west-2.compute.internal:8020/user/root/.sparkStaging/application_1476763882145_0069/hbase-site.xml

*Yarn Error Logs:*
[ERROR] [CreateWorkflow$] Error reading from file: File
file:/home/user/PredictionIO/SimilarProductRecommendation/engine.json does
not exist. Aborting workflow


I also tried passing the file path explicitly, but no luck.

pio train -- --master yarn-cluster  --files
file:/home/user/PredictionIO/SimilarProductRecommendation/engine.json


Thanks,
Amal
