$ ls /usr/lib/spark/python/lib
py4j-0.10.6-src.zip  PY4J_LICENSE.txt  pyspark.zip
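The listing above only shows what my login shell sees. To check what a given process itself sees, something along these lines could be run from inside that process. This is only a minimal sketch: `find_pyspark_libs` is a hypothetical helper, not part of Zeppelin or Spark, and it assumes nothing beyond the `$SPARK_HOME/python/lib` layout shown in the listing.

```python
import glob
import os


def find_pyspark_libs(env=None):
    """Return the py4j source zip(s) and pyspark.zip under $SPARK_HOME/python/lib.

    `env` is any mapping of environment variables; it defaults to the
    environment of the current process. Raises RuntimeError if SPARK_HOME
    is unset or either archive is missing.
    """
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set in this process")

    lib_dir = os.path.join(spark_home, "python", "lib")
    # The py4j version differs between Spark releases, so match by pattern.
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")

    if not py4j_zips or not os.path.isfile(pyspark_zip):
        raise RuntimeError("py4j or pyspark.zip not found under " + lib_dir)
    return py4j_zips + [pyspark_zip]
```

Calling `find_pyspark_libs()` with no argument uses the environment of whichever process runs it, so the result may differ between my shell and the remote interpreter.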
So the folder exists and contains both necessary zips. Please note that in local or yarn-client mode the files are picked up properly from that very same location.

How does yarn-cluster work under the hood? Could it be that environment variables (like SPARK_HOME) are lost, because they are only available in my local shell and the Zeppelin daemon process? Do I need to tell YARN somehow about SPARK_HOME?

On Tue, 5 June 2018 at 14:48, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Could you check whether there's a folder /usr/lib/spark/python/lib ?
>
>
> Thomas Bünger <thom.bu...@googlemail.com> wrote on Tue, 5 June 2018 at 8:45 PM:
>
>>
>> sys.env
>> java.lang.NullPointerException
>>     at org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
>>     at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:90)
>>     at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
>>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
>>     at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>>     at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>>
>> On Tue, 5 June 2018 at 14:41, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Could you paste the full stack trace?
>>>
>>>
>>> Thomas Bünger <thom.bu...@googlemail.com> wrote on Tue, 5 June 2018 at 8:21 PM:
>>>
>>>> I've tried the 0.8.0-rc4 on my EMR cluster using the preinstalled
>>>> version of Spark under /usr/lib/spark.
>>>>
>>>> This works fine in local or yarn-client mode, but in yarn-cluster mode
>>>> I just get a
>>>>
>>>> java.lang.NullPointerException at
>>>> org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
>>>>
>>>> This seems to be caused by an unsuccessful search for the py4j libraries.
>>>> I've made sure that SPARK_HOME is actually set in .bash_rc, in
>>>> zeppelin-env.sh and via the new %spark.conf, but somehow something odd
>>>> is going on in the remote interpreter.
>>>>
>>>> Best regards,
>>>> Thomas
>>>>
>>>