Hello, I noticed that the final reduce function happens in the driver node with a code that looks like the following.
val outputMap = mapPartition(domsomething).reduce(a: Map, b: Map) { a.merge(b) } although individual outputs from mappers are small. Over time the aggregated result outputMap could be huuuge (say with hundreds of millions of keys and values, reaching giga bytes). I noticed that, even if we have a lot of memory in the driver node, this process becomes reallllly slow eventually (say we have 100+ partitions. the first reduce is fast, but progressively, it becomes veeery slow as more and more partition outputs get aggregated). Is this because the intermediate reduce output gets serialized and then deserialized every time? What I'd like ideally is, since reduce is taking place in the same machine any way, there's no need for any serialization and deserialization, and just aggregate the incoming results into the final aggregation. Is this possible?