Hi,

we've upgraded from 0.8 to 0.9, and I observe that with the same interpreter
settings, PySpark no longer works, failing with:

    java.io.IOException: Fail to launch python process.

    Traceback (most recent call last):

      File "/tmp/1615477929423-0/zeppelin_python.py", line 20, in <module>

        from py4j.java_gateway import java_import, JavaGateway, GatewayClient

    ModuleNotFoundError: No module named 'py4j'

Comparing logs, I see that for 0.8:

 INFO [2021-03-11 15:40:13,565] ({pool-3-thread-5}
PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec:
/mnt/conda/envs/zeppelin-pyspark-python3/bin/python

 INFO [2021-03-11 15:40:13,585] ({pool-3-thread-5}
PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH:
/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip:/mnt/zeppelin-0.8.3-SNAPSHOT/../interpreter/lib/python

Whereas 0.9 logs say:

 INFO [2021-03-11 15:52:09,428]
({FIFOScheduler-interpreter_293940413-Worker-1}
PythonInterpreter.java[setupPythonEnv]:212) - PYTHONPATH:
/tmp/1615477929423-0

 INFO [2021-03-11 15:52:09,428]
({FIFOScheduler-interpreter_293940413-Worker-1}
PythonInterpreter.java[createGatewayServerAndStartScript]:147) - Launching
Python Process Command: /mnt/conda/envs/zeppelin-pyspark-python3/bin/python
/tmp/1615477929423-0/zeppelin_python.py 10.4.2.199 37753


In other words, it looks like 0.9 no longer adds the pyspark and py4j zips to
PYTHONPATH. Looking at the history, I see a major refactoring in this area:


https://github.com/apache/zeppelin/commit/0a97446a70f6294a3efb071bb9a70601f885840b

But I can't quite understand whether this change in behaviour is intentional,
and what additional options I might need to set. Does anybody have any
suggestions?
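For reference, the workaround I'm currently experimenting with (untested as a
permanent fix, and assuming the zip locations from the 0.8 log above still
apply) is to re-add those entries explicitly in conf/zeppelin-env.sh:

```shell
# Untested workaround: re-add the Spark Python libraries that 0.8 used to
# put on PYTHONPATH automatically. Paths are taken from my 0.8 log above;
# adjust the py4j version to match your Spark installation.
export PYTHONPATH="/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
```

This makes the `from py4j.java_gateway import ...` line in zeppelin_python.py
resolve again, but I'd prefer to understand whether 0.9 expects this to be
configured some other way.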

-- 
Vladimir Prus
http://vladimirprus.com
