Not sure whether this is helpful, but in one executor's "stderr" log I found
this:
14/08/19 20:17:04 INFO CacheManager: Partition rdd_5_14 not found, computing it
14/08/19 20:17:04 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/08/1
Update: it hangs even when not writing to HDFS. I changed the code to avoid
saveAsTextFile() and instead call foreachPartition() and log the results.
This time it gets to 96/100 tasks, but it still hangs.
I changed the saveAsTextFile() call to:
stringIntegerJavaPairRDD.foreachPartition(p -> {