Hi all, So over the past few days I've been attempting to create a function that takes an RDD[U], and creates three MMaps. I've been attempting to aggregate these values but I'm running into a major issue.
when I initially tried to use separate aggregators for each map, I noticed a significant slowdown due to the fact that I was running three aggregates. To battle this issue I created one aggregator that takes in all three values as a tuple and acts someone val zeroValue: (A, B, C) = ??? // (accum1.zero, accum2.zero, accum3.zero) def seqOp(r: (A, B, C), t: T): (A, B, C) = r match { // (accum1.addAccumulator(a, t), ..., accum3..addAccumulator(c, t)) case (a, b, c) => ??? } def combOp(r1: (A, B, C), r2: (A, B, C)): (A, B, C) = (r1, r2) match { // (acc1.addInPlace(a1, a2), ..., acc3.addInPlace(c1, c2)) case ((a1, b1, c1), (a2, b2, c2)) => ???} val rdd: RDD[T] = ??? val accums: (A, B, C) = rdd.aggregate(zeroValue)(seqOp, combOp) However, upon building this joint aggregator I've noticed an obscene amount of garbage collection which is grinding my progress to a halt. My current theory is that because I'm using a tuple of maps rather than individual mutable maps that the system is creating way too many objects. Has anyone run into a problem like this before? Does anyone have any suggestions for aggregating multiple values without creating a new object eve