Hi all,

I have a use case where I have an RDD (not a k,v pair RDD) on which I want to do a combineByKey() operation. I can do that by creating an intermediate RDD of k,v pairs and using PairRDDFunctions.combineByKey(), but I believe it would be more efficient to avoid this intermediate RDD. Is there a way to do this by passing in a function that extracts the key, as in RDD.groupBy()? [Oops, RDD.groupBy seems to create the intermediate RDD anyway; maybe a better implementation is possible for that too?] If not, is it worth adding to the Spark API?
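To make the idea concrete, here is a rough plain-Python sketch (hypothetical, not actual Spark code) of what a key-extractor variant of combineByKey would do: it folds each record into a per-key combiner directly, instead of first materializing an intermediate collection of (k, v) pairs the way the keyBy + combineByKey workaround does.

```python
def combine_by_key(records, key_fn, create, merge_value):
    """Fold records into per-key combiners using an explicit key-extractor,
    without first building an intermediate list of (key, value) pairs.
    (Illustrative sketch only; names are made up, not Spark API.)"""
    combiners = {}
    for rec in records:
        k = key_fn(rec)
        if k in combiners:
            # Key seen before: merge this record into the existing combiner.
            combiners[k] = merge_value(combiners[k], rec)
        else:
            # First record for this key: create a fresh combiner from it.
            combiners[k] = create(rec)
    return combiners

words = ["apple", "avocado", "banana", "cherry", "coconut"]
# Count words per first letter, keying each record on the fly.
counts = combine_by_key(
    words,
    key_fn=lambda w: w[0],
    create=lambda w: 1,
    merge_value=lambda acc, w: acc + 1,
)
# counts == {"a": 2, "b": 1, "c": 2}
```

(In real Spark there would also be a mergeCombiners step for combining partial results across partitions; it is omitted here to keep the single-partition sketch short.)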
Mohit.
