Can anyone help me understand why using coalesce causes my executors to
crash with an OutOfMemoryError? What happens during coalesce that
increases memory usage so much?
If I do:
hadoopFile -> sample -> cache -> map -> saveAsNewAPIHadoopFile
everything works fine, but if I do:
hadoopFile -> sample -> coalesce -> cache -> map -> saveAsNewAPIHadoopFile
my executors crash with out of memory exceptions.
Is there any documentation that explains the increased memory
requirements of coalesce? The problem seems less severe when I coalesce
into a larger number of partitions, but I'm not sure why that helps. How
would I estimate how much additional memory coalesce requires?
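For context on what I think is happening (and please correct me if this is wrong): my understanding is that coalesce without a shuffle merges several parent partitions into each output partition, so when the result is then cached, each cached block is several parent partitions' worth of data. Here is a rough back-of-envelope sketch in plain Python (not Spark; the even grouping below only approximates Spark's actual partition coalescer, which also considers locality, and the 64 MB / 1000-partition numbers are made up for illustration):

```python
def coalesce_groups(num_parent_partitions, num_target_partitions):
    """Group parent partition indices into target partitions, roughly
    evenly, the way a no-shuffle coalesce merges them (approximation)."""
    groups = [[] for _ in range(num_target_partitions)]
    for i in range(num_parent_partitions):
        # Assign parent partition i to a target group proportionally.
        groups[i * num_target_partitions // num_parent_partitions].append(i)
    return groups

# Hypothetical job: 1000 parent partitions of ~64 MB each,
# coalesced down to 10 partitions.
parent_mb = 64
groups = coalesce_groups(1000, 10)
per_partition_mb = [len(g) * parent_mb for g in groups]
# Each output partition is built from ~100 parents, so each cached
# block would be ~6400 MB and must fit in one executor's memory,
# which would match the OOM I'm seeing with a small target count.
```

If this model is roughly right, it would also explain why coalescing into a larger number of partitions hurts less: each cached block merges fewer parents.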
Thanks.