Hi Ian, sorry for the late reply. I was able to reproduce the same error with Spark 1.4.1 & Hadoop 2.6.0. It turned out to be a bug in Zeppelin: after some digging, I realized that the `spark.yarn.isPython` property was only introduced in Spark 1.5.0. I just made a PR ( https://github.com/apache/incubator-zeppelin/pull/736 ) to fix it. It would be really appreciated if you could try it and see if it works. Thank you for reporting the bug!
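
In case it helps while testing, here is a minimal sketch of how you could confirm both conditions from a pyspark paragraph. It is illustrative only: it assumes the `sc` that Zeppelin injects, and the property name comes from Spark itself.

# Minimal sketch, assuming Zeppelin's injected SparkContext `sc`.
# spark.yarn.isPython only exists in Spark >= 1.5.0, so check the
# version before trusting the property.
major, minor = (int(x) for x in sc.version.split(".")[:2])
if (major, minor) >= (1, 5):
    print(sc.getConf().get("spark.yarn.isPython", "not set"))
else:
    print("Spark %s predates spark.yarn.isPython" % sc.version)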
Regards,
Mina

On Thu, Feb 18, 2016 at 2:39 AM, Ian Maloney <rachmaninovquar...@gmail.com> wrote:

> Hi Mina,
>
> Thanks for the response. I recloned the master from GitHub and built using:
> mvn clean package -DskipTests -Pspark-1.4 -Phadoop-2.6 -Pyarn -Ppyspark
>
> I did that locally, then scp'ed it to a node in a cluster running HDP 2.3
> (Spark 1.4.1 & Hadoop 2.7.1).
>
> I added the two config files from below and started the Zeppelin daemon.
> Inspecting the spark.yarn.isPython config in the Spark UI showed it to be
> "true".
>
> The pyspark interpreter gives the same error as before. Are there any
> other configs I should check? I'm beginning to wonder if it's related to
> something in Hortonworks' distribution of Spark or YARN.
>
> On Tuesday, February 16, 2016, mina lee <mina...@apache.org> wrote:
>
>> Hi Ian,
>>
>> The log stack looks quite similar to
>> https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been
>> fixed since v0.5.6.
>> This happens when pyspark.zip and py4j-*.zip are not distributed to the
>> yarn worker nodes.
>>
>> If you are building from source code, can you please double check that
>> you pulled the latest master?
>> And also, to be sure, can you confirm that you can see
>> spark.yarn.isPython set to true in the Spark UI (YARN's
>> ApplicationMaster UI) > Environment > Spark Properties?
>>
>> On Sat, Feb 13, 2016 at 1:04 AM, Ian Maloney <
>> rachmaninovquar...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've been trying unsuccessfully to configure the pyspark interpreter on
>>> Zeppelin. I can use pyspark from the CLI and can use the Spark
>>> interpreter from Zeppelin without issue. Here are the lines which aren't
>>> commented out in my zeppelin-env.sh file:
>>>
>>> export MASTER=yarn-client
>>>
>>> export ZEPPELIN_PORT=8090
>>>
>>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950
>>> -Dspark.yarn.queue=default"
>>>
>>> export SPARK_HOME=/usr/hdp/current/spark-client/
>>>
>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>
>>> export PYSPARK_PYTHON=/usr/bin/python
>>>
>>> export
>>> PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
>>>
>>> Running a simple pyspark script in the interpreter gives this error:
>>>
>>> Py4JJavaError: An error occurred while calling
>>> z:org.apache.spark.api.python.PythonRDD.runJob.
>>> : org.apache.spark.SparkException: Job aborted due to stage
>>> failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost
>>> task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname):
>>> org.apache.spark.SparkException:
>>> Error from python worker:
>>> /usr/bin/python: No module named pyspark
>>> PYTHONPATH was:
>>> /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
>>>
>>> More details can be found here:
>>>
>>> https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html
>>>
>>> Thanks,
>>>
>>> Ian
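
P.S. If the PR does not resolve it, one quick way to see what the Python workers on the YARN nodes actually get is a probe like the sketch below (illustrative, not a confirmed fix; it assumes Zeppelin's `sc`). If pyspark really is not being shipped to the workers, the probe fails with the same "No module named pyspark" error, which is itself the confirmation.

# Hedged diagnostic: ask a YARN executor which Python it runs and
# what its sys.path contains, using a single one-partition task.
def probe(_):
    import sys
    return sys.executable, sys.path

print(sc.parallelize([0], numSlices=1).map(probe).collect())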