Hi all,

I’m trying to switch from the pyspark interpreter to the python interpreter
and ran into weird py4j errors like “KeyError: 'x'” or “invalid command”
when creating a Spark session.

A little digging reveals that Zeppelin has its own py4j stuffed into the
PYTHONPATH of the python interpreter. The value of PYTHONPATH I see from
within a python interpreter notebook is

'/usr/lib/zeppelin/interpreter/python/py4j-0.9.2/src:/usr/lib/spark/python:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip'

The latter parts under /usr/lib/spark were added to PYTHONPATH by me in
zeppelin-env.sh. However, no matter how I arrange the order of paths in
PYTHONPATH, Zeppelin somehow manages to put its py4j first, preventing the
python interpreter from finding the right py4j that ships with Spark. I
suspect Zeppelin manipulates PYTHONPATH after loading zeppelin-env.sh.
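One stopgap I’m considering is reordering sys.path at the top of each
notebook so Spark’s py4j wins the import race (the paths below are the ones
from my install; adjust as needed), though a proper fix in Zeppelin itself
would be nicer:

```python
import sys

# Paths from my install; the Spark py4j source zip must come before
# Zeppelin's bundled py4j-0.9.2 for the right version to be imported.
spark_paths = [
    "/usr/lib/spark/python",
    "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip",
]

# Move Spark's paths to the front of sys.path so its py4j shadows
# Zeppelin's copy. Run this before any py4j/pyspark import.
for p in reversed(spark_paths):
    if p in sys.path:
        sys.path.remove(p)
    sys.path.insert(0, p)

print(sys.path[:2])
```

This only helps if it runs before py4j is first imported in the process;
once a stale module is cached in sys.modules, reordering sys.path has no
effect.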

The reason I prefer the python interpreter is that I would like fine
control over Spark parameters per notebook (whereas the pyspark interpreter
uses the same set of Spark parameters for all notebooks).

Has anyone run into the same issue, or is there a workaround?

Thanks!
--
Lu Rui

