Hello,

I noticed that the final reduce function happens in the driver node with a
code that looks like the following.

val outputMap = mapPartition(domsomething).reduce(a: Map, b: Map) {
 a.merge(b)
}

although individual outputs from mappers are small. Over time the
aggregated result outputMap could be huuuge (say with hundreds of millions
of keys and values, reaching giga bytes).

I noticed that, even if we have a lot of memory in the driver node, this
process becomes reallllly slow eventually (say we have 100+ partitions. the
first reduce is fast, but progressively, it becomes veeery slow as more and
more partition outputs get aggregated). Is this because the intermediate
reduce output gets serialized and then deserialized every time?

What I'd like ideally is, since reduce is taking place in the same machine
any way, there's no need for any serialization and deserialization, and
just aggregate the incoming results into the final aggregation. Is this
possible?

Reply via email to