The pyspark app's stdout/stderr log shows this oddity:
Traceback (most recent call last):
  File "/root/spark/notebooks/ingest/XXX.py", line 86, in <module>
    print pdfRDD.collect()[:5]
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 773, in collect
  File
Hi all,
Wondering if someone can provide some insight into why this pyspark app is
just hanging. Here is the output.
...
15/12/03 01:47:05 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21, 10.65.143.174, PROCESS_LOCAL, 1794787 bytes)
15/12/03 01:47:05 INFO TaskSetManager: Starting task