[ https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649062#comment-14649062 ]
Min Wu commented on SPARK-8646:
-------------------------------

Hi, I hit the same issue when running a PySpark program in yarn-client mode with Spark 1.4.1 from BigInsights 4.1 (Ambari). Because the assembly jar no longer contains the Python scripts for pyspark and py4j, I set the Spark home via SparkContext.setSparkHome() to the spark-client location (this is an Ambari Hadoop cluster, so spark-client contains the python folder with the py4j and pyspark scripts). The API documentation says this path is applied to the slave nodes, so I assumed it would also apply to Spark on YARN, but it does not work: the worker nodes always get their PYTHONPATH from the cached assembly jar. After checking the SparkContext code, it seems sparkHome is stored in SparkConf as "spark.home", so I think it should also be distributed to all executors so that PySpark can use this parameter to locate the PYTHONPATH.

> PySpark does not run on YARN if master not provided in command line
> -------------------------------------------------------------------
>
>                 Key: SPARK-8646
>                 URL: https://issues.apache.org/jira/browse/SPARK-8646
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.4.0
>         Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in my own code.
>            Reporter: Juliet Hougland
>            Assignee: Lianhui Wang
>             Fix For: 1.5.0
>
>         Attachments: executor.log, pi-test.log, spark1.4-SPARK_HOME-set-PYTHONPATH-set.log, spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log, spark1.4-SPARK_HOME-set.log, spark1.4-verbose.log, verbose-executor.log
>
>
> Running PySpark jobs results in a "no module named pyspark" error when run in yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|https://issues.apache.org/jira/browse/SPARK-6869]
> This does not represent a binary-compatible change to Spark. Scripts that worked on previous Spark versions (i.e. commands that use spark-submit) should continue to work without modification between minor versions.
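For reference, a minimal PySpark sketch of the workaround Min Wu describes above; the spark-client path and application name are placeholders, and whether "spark.home" actually reaches the YARN executors is exactly the open question in this report:

    from pyspark import SparkConf, SparkContext

    # Point the driver at the local client install that still ships the
    # pyspark/py4j Python sources; SparkConf stores this value as "spark.home".
    conf = (SparkConf()
            .setMaster("yarn-client")
            .setAppName("pythonpath-check")                  # placeholder name
            .setSparkHome("/usr/iop/current/spark-client"))  # placeholder path

    sc = SparkContext(conf=conf)
    # If the executors cannot import pyspark, this trivial job fails on the
    # worker side with "no module named pyspark".
    print(sc.parallelize(range(10)).map(lambda x: x * x).collect())
    sc.stop()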