Is the first() being computed locally on the driver program? Maybe it's too hard
to compute with the memory, etc. available there. Take a look at the driver's
log and see whether it has the message "Computing the requested partition
locally".
Matei
On Jul 22, 2014, at 12:04 PM, Nathan Kronenfel
I was wondering if anyone could provide an explanation for the behavior I'm
seeing.
I have an RDD, call it foo, not too complex, with a DAG maybe 8 levels deep
and 2 shuffles. It is not empty and not even terribly big: small enough that some
partitions could be empty.
When I run foo.first, I get workers