I was wondering if anyone could provide an explanation for the behavior I'm seeing.
I have an RDD, call it foo, that isn't very complex: its DAG is maybe 8 levels
deep, with 2 shuffles. It's not empty, and it's not terribly big either; it's
small enough that some partitions could be empty.
When I run foo.first, workers disconnect and the application dies.
When I run foo.mapPartitions.saveAsHadoopDataset, it works fine.
Does anyone have an explanation for why that might be?
-Thanks, Nathan
--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238