Hi,
we've upgraded from 0.8 to 0.9, and I observe that with the same
interpreter settings, PySpark no longer works, failing with:
java.io.IOException: Fail to launch python process.
Traceback (most recent call last):
  File "/tmp/1615477929423-0/zeppelin_python.py", line 20, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'
Comparing logs, I see that 0.8 logs:
INFO [2021-03-11 15:40:13,565] ({pool-3-thread-5}
PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec:
/mnt/conda/envs/zeppelin-pyspark-python3/bin/python
INFO [2021-03-11 15:40:13,585] ({pool-3-thread-5}
PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH:
/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip:/mnt/zeppelin-0.8.3-SNAPSHOT/../interpreter/lib/python
Whereas 0.9 logs say:
INFO [2021-03-11 15:52:09,428]
({FIFOScheduler-interpreter_293940413-Worker-1}
PythonInterpreter.java[setupPythonEnv]:212) - PYTHONPATH:
/tmp/1615477929423-0
INFO [2021-03-11 15:52:09,428]
({FIFOScheduler-interpreter_293940413-Worker-1}
PythonInterpreter.java[createGatewayServerAndStartScript]:147) - Launching
Python Process Command: /mnt/conda/envs/zeppelin-pyspark-python3/bin/python
/tmp/1615477929423-0/zeppelin_python.py 10.4.2.199 37753
In other words, it looks like 0.9 no longer adds the pyspark and py4j
zips to PYTHONPATH.
Looking at the history, I see a major refactoring in this area:
https://github.com/apache/zeppelin/commit/0a97446a70f6294a3efb071bb9a70601f885840b
But I can't quite understand whether this change in behaviour is
intentional, or what additional options I might need to set. Does
anybody have any suggestions?
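In the meantime, the workaround I'm experimenting with is to restore in
zeppelin-env.sh what 0.8 used to add automatically (the paths below are
from our installation and the 0.8 log above; the py4j version in the
zip name will differ between Spark releases, so treat this as a sketch,
not a confirmed fix):

```shell
# zeppelin-env.sh: manually prepend the Spark Python libs to PYTHONPATH,
# mirroring what the 0.8 PySparkInterpreter did in setupPySparkEnv.
export SPARK_HOME=/usr/lib/spark
export PYTHONPATH="${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
```

With that in place, `import py4j` from the interpreter's python should
at least resolve, though I don't know if this conflicts with whatever
0.9 now does in PythonInterpreter.setupPythonEnv.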
--
Vladimir Prus
http://vladimirprus.com