> Francois,
>
> RDD.aggregate() does not support aggregation by key. But, indeed, that is the
> kind of implementation I am looking for, one that does not allocate
> intermediate space for storing (K,V) pairs. When working with large datasets
> this type of intermediate memory allocation wreaks havoc with
Oh, I’m sorry, I meant `aggregateByKey`.
https://spark.apache.org/docs/1.2.0/api/scala/#org.apache.spark.rdd.PairRDDFunctions
—
FG
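For anyone following along, here is a minimal pure-Python sketch of what `aggregateByKey` does conceptually (this is not Spark code, just an illustration of the semantics): each partition folds its (key, value) pairs into a small per-key accumulator map with a seqOp-style function, and then the per-key accumulators from different partitions are merged with a combOp-style function. No global list of (K,V) pairs is ever materialized.

```python
def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Sketch of aggregateByKey semantics over a list of 'partitions'."""
    merged = {}
    for partition in partitions:
        local = {}  # per-partition combiner map (the map-side combine)
        for key, value in partition:
            local[key] = seq_op(local.get(key, zero), value)
        for key, acc in local.items():  # merge this partition's accumulators
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return merged

# Example: per-key sums over two "partitions"
parts = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
totals = aggregate_by_key(parts, 0, lambda acc, v: acc + v, lambda x, y: x + y)
# totals == {"a": 4, "b": 6, "c": 5}
```

In real Spark the per-partition combine and the cross-partition merge happen on different executors, but the memory story is the same: only one accumulator per distinct key per partition is held at a time.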
On Thu, Jan 29, 2015 at 7:58 PM, Mohit Jaggi mohitja...@gmail.com wrote:

> Francois,
>
> RDD.aggregate() does not support aggregation by key. But, indeed, that is
Sorry, I answered too fast. Please disregard my last message: I did mean
`aggregate`.

You say: `RDD.aggregate()` does not support aggregation by key.

What would you need aggregation by key for if you do not start with an RDD of
key-value pairs and do not want to build one?
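To make the question concrete: the shape of `RDD.aggregate(zeroValue)(seqOp, combOp)` already lets you aggregate by key without first building a pair RDD, if the zero value is an empty map and seqOp derives the key from each raw record on the fly. Below is a hedged pure-Python sketch of that idea (not Spark itself); `key_of` via `record[0]` and the record layout are assumptions for illustration.

```python
def aggregate(partitions, zero_factory, seq_op, comb_op):
    """Sketch of RDD.aggregate semantics over a list of 'partitions'.

    zero_factory returns a fresh zero value per partition, since the
    accumulator here is a mutable dict.
    """
    result = zero_factory()
    for partition in partitions:
        acc = zero_factory()
        for record in partition:
            acc = seq_op(acc, record)
        result = comb_op(result, acc)
    return result

def seq_op(acc, record):
    key = record[0]  # hypothetical: the key is the record's first field
    acc[key] = acc.get(key, 0) + record[1]
    return acc

def comb_op(a, b):
    # Merge two per-partition accumulator maps, key by key.
    for key, value in b.items():
        a[key] = a.get(key, 0) + value
    return a

parts = [[("a", 1), ("b", 2)], [("a", 3)]]
totals = aggregate(parts, dict, seq_op, comb_op)
# totals == {"a": 4, "b": 2}
```

The intermediate allocation is then one map per partition rather than a materialized collection of (K,V) pairs, which seems to be the memory behavior the original question was after.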