Can anyone help me understand why using coalesce causes my executors to crash with out-of-memory errors? What happens during coalesce that increases memory usage so much?

If I do:
hadoopFile -> sample -> cache -> map -> saveAsNewAPIHadoopFile

everything works fine, but if I do:
hadoopFile -> sample -> coalesce -> cache -> map -> saveAsNewAPIHadoopFile

my executors crash with out-of-memory exceptions.

Is there any documentation that explains what causes the increased memory requirements with coalesce? It seems to be less of a problem if I coalesce into a larger number of partitions, but I'm not sure why that is. How would I estimate how much additional memory coalesce requires?
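For context, my current guess (unverified, and the partition sizes below are made-up numbers, not from my actual job) is that coalesce without a shuffle merges several parent partitions into each output partition, so each cached partition grows by roughly that merge factor. This is the back-of-envelope estimate I've been attempting:

```python
import math

def coalesced_partition_bytes(input_partitions, partition_bytes, target_partitions):
    """Estimate the cached size of one partition after coalesce(target_partitions),
    assuming coalesce merges ~ceil(P / n) parent partitions per output partition."""
    merge_factor = math.ceil(input_partitions / target_partitions)
    return merge_factor * partition_bytes

# Hypothetical numbers: 1000 input partitions of ~128 MB each, cached.
# Coalescing to 10 partitions would give ~100 * 128 MB = 12800 MB per
# cached partition, which would easily exceed an executor's memory,
# while coalescing to 100 partitions gives a much smaller 1280 MB.
print(coalesced_partition_bytes(1000, 128, 10))
print(coalesced_partition_bytes(1000, 128, 100))
```

If that model is right, it would at least explain why a larger target partition count hurts less, but I'd appreciate confirmation that this is actually how coalesce behaves.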

Thanks.
