Re: NewSparkInterpreter fails on yarn-cluster

Thomas Bünger Tue, 05 Jun 2018 05:56:59 -0700

$ ls /usr/lib/spark/python/lib
py4j-0.10.6-src.zip  PY4J_LICENSE.txt  pyspark.zip


So the folder exists and contains both necessary zips. Please note that in
local or yarn-client mode the files are properly picked up from that very
same location.

How does yarn-cluster work under the hood? Could it be that environment
variables (like SPARK_HOME) are lost, because they are only available in my
local shell + zeppelin daemon process? Do I need to tell YARN somehow about
SPARK_HOME?
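
To narrow this down, here is a minimal sketch of a check that could be run in
the environment where the interpreter process actually starts, to see whether
SPARK_HOME is visible there and whether the zips can be found. The helper name
find_pyspark_libs is hypothetical, and the assumption that the lookup happens
under $SPARK_HOME/python/lib is inferred from the listing above, not confirmed
from the Zeppelin source:

```python
import glob
import os


def find_pyspark_libs(spark_home):
    """Return the py4j and pyspark zips under <spark_home>/python/lib.

    Hypothetical helper: returns an empty list when spark_home is unset
    or the folder is missing, instead of raising an exception.
    """
    if not spark_home:
        return []
    lib_dir = os.path.join(spark_home, "python", "lib")
    if not os.path.isdir(lib_dir):
        return []
    py4j = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    pyspark = sorted(glob.glob(os.path.join(lib_dir, "pyspark.zip")))
    return py4j + pyspark


if __name__ == "__main__":
    # Report what this particular process sees, which may differ from the local shell.
    spark_home = os.environ.get("SPARK_HOME")
    print("SPARK_HOME =", spark_home)
    print("libs =", find_pyspark_libs(spark_home))
```

If this prints SPARK_HOME = None in one environment but not in my local shell,
that would at least support the theory that the variable is not propagated.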

On Tue, 5 June 2018 at 14:48, Jeff Zhang <[email protected]> wrote:

>
> Could you check whether the folder /usr/lib/spark/python/lib exists?
>
>
> On Tue, 5 June 2018 at 8:45 PM, Thomas Bünger <[email protected]> wrote:
>
>>
>> sys.env
>> java.lang.NullPointerException
>>   at org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
>>   at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:90)
>>   at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>   at java.lang.Thread.run(Thread.java:748)
>>
>>
>> On Tue, 5 June 2018 at 14:41, Jeff Zhang <[email protected]> wrote:
>>
>>> Could you paste the full stack trace?
>>>
>>>
>>> On Tue, 5 June 2018 at 8:21 PM, Thomas Bünger <[email protected]> wrote:
>>>
>>>> I've tried 0.8.0-rc4 on my EMR cluster using the preinstalled
>>>> version of Spark under /usr/lib/spark.
>>>>
>>>> This works fine in local or yarn-client mode, but in yarn-cluster mode
>>>> I just get a
>>>>
>>>> java.lang.NullPointerException
>>>>   at org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
>>>>
>>>> Seems to be caused by an unsuccessful search for the py4j libraries.
>>>> I've made sure that SPARK_HOME is actually set in .bash_rc, in
>>>> zeppelin-env.sh and via the new %spark.conf, but somehow in the remote
>>>> interpreter, something odd is going on.
>>>>
>>>> Best regards,
>>>>  Thomas
>>>>
>>>
