I have a large Map that is assembled in the driver and broadcast to each node.
My question is how best to allocate memory for this. The Driver has to have
enough memory for the Maps, but only one copy is serialized to each node. What
type of memory should I size to match the Maps? Is the broadcast held in the executors' storage memory?
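To make the sizing concrete, the serialized footprint of such a map can be estimated on the driver before broadcasting. A minimal sketch using only the standard library (the map contents here are hypothetical stand-ins; PySpark uses pickle for broadcasts, while JVM Spark uses Java/Kryo serialization, so treat this as a rough lower bound):

```python
import pickle

# Hypothetical stand-in for the real lookup map built on the driver.
lookup = {i: f"value-{i}" for i in range(100_000)}

# Size of the serialized form -- roughly what one broadcast copy
# costs on the wire and in each executor's memory.
serialized_bytes = len(pickle.dumps(lookup, protocol=pickle.HIGHEST_PROTOCOL))
print(f"serialized: {serialized_bytes / 1024 / 1024:.1f} MiB")
```

The deserialized in-memory copy on each executor is typically larger than the serialized form, so leave headroom beyond this estimate.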
If you are creating a huge map on the driver, then spark.driver.memory
should be set to a value high enough to hold it. Since you are going to
broadcast this map, your executors must have enough memory to hold it as
well, which can be set using spark.executor.memory and
spark.storage.memoryFraction.
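As a concrete sketch, these settings can be passed at submit time. The numbers below are purely illustrative (assuming a map that serializes to around 2 GB) and `your_job.py` is a hypothetical entry point; tune the values to your actual sizes:

```shell
# Illustrative values only -- size to your own map.
spark-submit \
  --driver-memory 6g \
  --executor-memory 4g \
  --conf spark.storage.memoryFraction=0.5 \
  your_job.py
```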
Note that starting with Spark 1.6 (unified memory management), execution
and storage memory share a single region, and the Spark execution engine
allocates between them dynamically based on workload.
You can still set a low watermark for storage memory (the RDD cache) below
which cached blocks are protected from eviction, but the rest can be
dynamic.
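Under the unified model, the corresponding knobs are spark.memory.fraction and spark.memory.storageFraction (which replace spark.storage.memoryFraction). A spark-defaults.conf sketch showing the stock defaults:

```
# Fraction of (heap - 300 MB) shared by execution and storage
spark.memory.fraction          0.6
# Portion of that region protected from eviction (the low watermark)
spark.memory.storageFraction   0.5
```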
Here are some relevant slides from a recent presentation.