Hi,
I am using HDP 3.0 (Zeppelin 0.8.0), and my notebook, which uses the
livy2.pyspark interpreter, frequently crashes the Livy session (RPC channel
is stopped). The YARN log shows:
18/08/22 22:39:47 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
18/08/22 22:39:47 INFO SparkContext: Invoking stop() from shutdown hook
18/08/22 22:39:47 INFO AbstractConnector: Stopped Spark@50e3245
{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
18/08/22 22:39:47 INFO SparkUI: Stopped Spark web UI at
http://prod1-datanode5.com:41809
18/08/22 22:39:47 ERROR PythonInterpreter: Process has died with 143
18/08/22 22:39:47 ERROR PythonInterpreter:
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip/pyspark/context.py:237:
RuntimeWarning: Failed to add file
[file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip] speficied in
'spark.submit.pyFiles' to Python path:
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/tmp
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/spark-480aaf8e-809f-4fc2-a0b5-6f64e6c36984/userFiles-14e4846c-2a84-4eca-9879-00c6752ac7ab
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/py4j-0.10.7-src.zip
/mnt/data/usr/lib/anaconda3/lib/python36.zip
/mnt/data/usr/lib/anaconda3/lib/python3.6
/mnt/data/usr/lib/anaconda3/lib/python3.6/lib-dynload
/mnt/data/usr/lib/anaconda3/lib/python3.6/site-packages
RuntimeWarning)
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip/pyspark/context.py:237:
RuntimeWarning: Failed to add file
[file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip]
speficied in 'spark.submit.pyFiles' to Python path:
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/tmp
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/spark-480aaf8e-809f-4fc2-a0b5-6f64e6c36984/userFiles-14e4846c-2a84-4eca-9879-00c6752ac7ab
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/py4j-0.10.7-src.zip
/mnt/data/usr/lib/anaconda3/lib/python36.zip
/mnt/data/usr/lib/anaconda3/lib/python3.6
/mnt/data/usr/lib/anaconda3/lib/python3.6/lib-dynload
/mnt/data/usr/lib/anaconda3/lib/python3.6/site-packages
RuntimeWarning)
For example, after I restart the Livy interpreter, running all paragraphs the
first time succeeds, but running them a second time makes the application
throw this error in the YARN log. When this happens, I need to restart the
Livy interpreter and rerun the whole notebook, which is very annoying.
I already checked that /usr/hdp/current/spark2-client/python/lib/pyspark.zip
and /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip exist.
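One more observation on the log: the exit status 143 in "Process has died with 143" looks like the usual shell convention of 128 plus the signal number, and 128 + 15 is SIGTERM, which would match the "RECEIVED SIGNAL TERM" line. So the Python process seems to have been terminated from outside rather than failing on its own. Below is a minimal sketch for decoding such a status; describe_exit_status is just a helper name I made up for illustration and is not part of Spark or Livy.

```python
import signal


def describe_exit_status(status):
    """Describe a shell-style exit status (hypothetical helper, not a Spark/Livy API)."""
    # By shell convention, a status above 128 means "killed by signal (status - 128)".
    if status > 128:
        signum = status - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            return "killed by unknown signal %d" % signum
    return "exited with code %d" % status


print(describe_exit_status(143))  # prints "killed by SIGTERM"
```

This only says which signal ended the process, not who sent it.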
Any idea why this happens? I would appreciate any help!