Hi All,

After some hair pulling, I've realised that an operation I am currently
doing via:

myRDD.groupByKey.mapValues(func)

could be done more efficiently using aggregateByKey or combineByKey. Either
method would work, and the two seem very similar to me in what they do.
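
To make the comparison concrete, here is a minimal sketch of the same
per-key computation written all three ways. It is only an illustration
under assumptions: the sample data, the object name ByKeySketch, and the
choice of a per-key average as "func" are hypothetical stand-ins, and it
assumes Spark 1.3 or later running in local mode. The visible difference is
that aggregateByKey starts each key from a fixed zero value, while
combineByKey builds the initial accumulator from the first value it sees.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ByKeySketch {
  type Averages = Map[String, Double]

  // Computes a per-key average three ways so the results can be compared.
  def averages(sc: SparkContext): (Averages, Averages, Averages) = {
    // Hypothetical sample data standing in for myRDD.
    val myRDD = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 10.0)))

    // 1. Current approach: collect all values per key, then reduce them.
    val viaGroup = myRDD.groupByKey().mapValues(vs => vs.sum / vs.size)

    // 2. aggregateByKey: a fixed zero value, a function that folds one value
    //    into an accumulator, and a function that merges two accumulators.
    val viaAggregate = myRDD
      .aggregateByKey((0.0, 0L))(
        (acc, v) => (acc._1 + v, acc._2 + 1L),
        (x, y) => (x._1 + y._1, x._2 + y._2))
      .mapValues { case (sum, n) => sum / n }

    // 3. combineByKey: the first argument creates the accumulator from the
    //    first value of a key; the other two play the same roles as above.
    val viaCombine = myRDD
      .combineByKey(
        (v: Double) => (v, 1L),
        (acc: (Double, Long), v: Double) => (acc._1 + v, acc._2 + 1L),
        (x: (Double, Long), y: (Double, Long)) => (x._1 + y._1, x._2 + y._2))
      .mapValues { case (sum, n) => sum / n }

    (viaGroup.collectAsMap().toMap,
     viaAggregate.collectAsMap().toMap,
     viaCombine.collectAsMap().toMap)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("by-key-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(averages(sc)) finally sc.stop()
  }
}
```

In this sketch the accumulator is a (sum, count) pair, so only one small
pair per key and partition needs to be shuffled rather than every value.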

My question is, what are the differences between these two methods (other
than the slight differences in their type signatures)? Under what
circumstances should I use one or the other?

Thanks

Dave
