Hi Ian, sorry for the late reply.
I was able to reproduce the same error with Spark 1.4.1 & Hadoop
2.6.0. It turned out to be a bug in Zeppelin.
After some searching, I realized that the `spark.yarn.isPython` property was
only introduced in Spark 1.5.0. I just made a PR (
https://github.com/apache/incubator-zeppelin/pull/736) to fix it. It would
be really appreciated if you could try it and see if it works. Thank you for
reporting the bug!
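
In the meantime, here is a quick way to check whether the fix took effect,
as a minimal sketch run from a %pyspark paragraph (it assumes Zeppelin
already provides `sc`, and `executor_pythonpath` is just an illustrative
helper name). It reports the PYTHONPATH a YARN executor's Python worker
sees; with the patch applied, pyspark.zip and py4j-*.zip should appear
there, while without it the job itself fails with the same "No module named
pyspark" error:

def executor_pythonpath(_):
    # Runs on the executor, not the driver.
    import os
    return os.environ.get("PYTHONPATH", "<not set>")

# Single-partition job so the answer comes from a single executor.
print(sc.parallelize([0], 1).map(executor_pythonpath).collect())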

Regards,
Mina

On Thu, Feb 18, 2016 at 2:39 AM, Ian Maloney <rachmaninovquar...@gmail.com>
wrote:

> Hi Mina,
>
> Thanks for the response. I re-cloned master from GitHub and built using:
> mvn clean package -DskipTests -Pspark-1.4 -Phadoop-2.6 -Pyarn -Ppyspark
>
> I did that locally, then scp'd the build to a node in a cluster running HDP
> 2.3 (Spark 1.4.1 & Hadoop 2.7.1).
>
> I added the two config files from below and started the Zeppelin daemon.
> Inspecting the spark.yarn.isPython property in the Spark UI showed it to be
> "true".
>
> The pyspark interpreter gives the same error as before. Are there any
> other configs I should check? I'm beginning to wonder if it's related to
> something in Hortonworks' distribution of Spark or YARN.
>
>
>
> On Tuesday, February 16, 2016, mina lee <mina...@apache.org> wrote:
>
>> Hi Ian,
>>
>> The stack trace looks quite similar to
>> https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been fixed
>> since v0.5.6.
>> This happens when pyspark.zip and py4j-*.zip are not distributed to the
>> YARN worker nodes.
>>
>> If you are building from source, can you please double-check that you
>> pulled the latest master?
>> And also, to be sure, can you confirm that you can see
>> spark.yarn.isPython set to true in the Spark UI (YARN's ApplicationMaster
>> UI) > Environment > Spark Properties?
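>>
>> If the UI is awkward to reach, a rough equivalent from a %pyspark
>> paragraph (just a sketch, assuming `sc` is available there and that
>> sc.getConf() behaves the same on your Spark 1.4.1) is:
>>
>> # Prints the property the YARN backend uses to decide whether to ship
>> # pyspark.zip / py4j-*.zip to the executors.
>> print(sc.getConf().get("spark.yarn.isPython", "<not set>"))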
>>
>> On Sat, Feb 13, 2016 at 1:04 AM, Ian Maloney <
>> rachmaninovquar...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've been trying, unsuccessfully, to configure the pyspark interpreter in
>>> Zeppelin. I can use pyspark from the CLI and can use the Spark interpreter
>>> from Zeppelin without issue. Here are the lines that aren't commented out
>>> in my zeppelin-env.sh file:
>>>
>>> export MASTER=yarn-client
>>>
>>> export ZEPPELIN_PORT=8090
>>>
>>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950
>>> -Dspark.yarn.queue=default"
>>>
>>> export SPARK_HOME=/usr/hdp/current/spark-client/
>>>
>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>
>>> export PYSPARK_PYTHON=/usr/bin/python
>>>
>>> export
>>> PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
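>>>
>>> As a driver-side sanity check (just a sketch; run it with the same
>>> /usr/bin/python that PYSPARK_PYTHON points to), the imports below should
>>> succeed if the PYTHONPATH above is sufficient. I'm not certain the
>>> build directory carries py4j on this distribution; on some setups the
>>> py4j-*-src.zip under ${SPARK_HOME}/python/lib also needs to be on
>>> PYTHONPATH:
>>>
>>> # Each import raises ImportError if its package is missing from
>>> # PYTHONPATH.
>>> import pyspark
>>> import py4j
>>> print(pyspark.__file__)
>>> print(py4j.__file__)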
>>>
>>> Running a simple pyspark script in the interpreter gives this error:
>>>
>>> Py4JJavaError: An error occurred while calling
>>> z:org.apache.spark.api.python.PythonRDD.runJob.
>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3
>>> in stage 1.0 (TID 5, some_yarn_node.networkname):
>>> org.apache.spark.SparkException:
>>> Error from python worker:
>>>   /usr/bin/python: No module named pyspark
>>> PYTHONPATH was:
>>> /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
>>>
>>> More details can be found here:
>>>
>>> https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html
>>>
>>> Thanks,
>>>
>>> Ian
>>>
>>>
>>
