Re: RDD.combineBy without intermediate (k,v) pair allocation

2015-01-29 Thread Mohit Jaggi
Francois, RDD.aggregate() does not support aggregation by key. But, indeed, that is the kind of implementation I am looking for, one that does not allocate intermediate space for storing (K,V) pairs. When working with large datasets, this type of intermediate memory allocation wreaks havoc with

Re: RDD.combineBy without intermediate (k,v) pair allocation

2015-01-29 Thread francois . garillot
Oh, I’m sorry, I meant `aggregateByKey`. https://spark.apache.org/docs/1.2.0/api/scala/#org.apache.spark.rdd.PairRDDFunctions — FG On Thu, Jan 29, 2015 at 7:58 PM, Mohit Jaggi mohitja...@gmail.com wrote: Francois, RDD.aggregate() does not support aggregation by key. But, indeed, that is

Re: RDD.combineBy without intermediate (k,v) pair allocation

2015-01-29 Thread francois . garillot
Sorry, I answered too fast. Please disregard my last message: I did mean aggregate. You say: "RDD.aggregate() does not support aggregation by key." What would you need aggregation by key for if you do not have an RDD of key-value pairs to begin with, and do not want to build one?
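
To make the aggregate suggestion concrete, here is a minimal sketch of what it could look like. It is not taken from the thread: the `Event` record type, the `KeyedAggregate` object and its method names are hypothetical, and it assumes the set of distinct keys is small enough for the final map to fit in driver memory. The key is read from each record and folded straight into a mutable per-partition map, so no (K,V) tuple is created per record.

```scala
import org.apache.spark.rdd.RDD
import scala.collection.mutable

// Hypothetical record type: the key (user) is a field of the record,
// not the first component of a (K, V) tuple.
case class Event(user: String, bytes: Long)

// Hypothetical helper object, for illustration only.
object KeyedAggregate {
  type Totals = mutable.HashMap[String, Long]

  // Fold one record into the partition-local map, mutating it in place.
  // RDD.aggregate allows seqOp to modify and return its first argument.
  def seqOp(acc: Totals, e: Event): Totals = {
    acc(e.user) = acc.getOrElse(e.user, 0L) + e.bytes
    acc
  }

  // Merge the map built on one partition into another.
  // This iterates once per distinct key per partition, not once per record.
  def combOp(left: Totals, right: Totals): Totals = {
    right.foreach { case (user, bytes) =>
      left(user) = left.getOrElse(user, 0L) + bytes
    }
    left
  }

  // Sum bytes per user without first mapping the RDD to (user, bytes) pairs.
  def totalsByUser(events: RDD[Event]): Totals =
    events.aggregate(mutable.HashMap.empty[String, Long])(seqOp, combOp)
}
```

Calling `KeyedAggregate.totalsByUser(events)` returns a single map on the driver rather than a distributed RDD, which is the main trade-off compared with `aggregateByKey` or `combineByKey`: those keep the result distributed but require an RDD of pairs as input.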