The pyspark app stdout/err log shows this oddity.
Traceback (most recent call last):
File "/root/spark/notebooks/ingest/XXX.py", line 86, in
print pdfRDD.collect()[:5]
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 773,
in collect
File
"/root/spark/python/lib/py4j-0.8
Is this the stderr output from a worker? Are any files being written? Can
you run in debug and see how far it's getting?
On its own, this traceback doesn't give me a direction to look without the
actual logs from $SPARK_HOME or the stderr from the worker UI.
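
If it helps, here is a rough sketch of what I mean by running in debug. It is only an illustration, not your actual job: the app name and the input path argument are made up, `preview` is a hypothetical helper, and `setLogLevel` assumes Spark 1.4 or later. It also uses `take(5)` rather than `collect()[:5]`, so only a few rows come back to the driver while you are narrowing things down.

```python
def preview(rdd, n=5):
    """Return the first n elements of an RDD-like object.

    take(n) fetches only n elements, whereas collect()[:n] pulls the
    whole dataset back to the driver before slicing.
    """
    return rdd.take(n)


def run_with_debug(path):
    """Hypothetical driver: read `path` with verbose logging and show a preview."""
    # Imported here so preview() can be used without a Spark install.
    from pyspark import SparkContext

    sc = SparkContext(appName="ingest-debug")  # app name is an assumption
    sc.setLogLevel("DEBUG")  # assumes Spark 1.4+
    try:
        rdd = sc.textFile(path)
        print(preview(rdd))
    finally:
        # Always release the context, even if the job fails.
        sc.stop()
```

With the log level raised, the driver output should show how far the job gets before the py4j error.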
Just IMHO; maybe someone knows what this means, but it