Hi all,

I have a use case where I have an RDD (not a k,v pair RDD) on which I want to do a combineByKey() operation. I can do that by creating an intermediate RDD of k,v pairs and using PairRDDFunctions.combineByKey(), but I believe it would be more efficient to avoid this intermediate RDD. Is there a way to do this by passing in a function that extracts the key, as in RDD.groupBy()? [Oops, RDD.groupBy seems to create the intermediate RDD anyway; maybe a better implementation is possible for that too?] If not, is it worth adding to the Spark API?
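To make the idea concrete, here is a rough plain-Python sketch (hypothetical, not actual Spark code) of what a key-extractor variant of combineByKey would do: it folds each record into a per-key combiner directly, instead of first materializing an intermediate collection of (k, v) pairs the way the keyBy + combineByKey workaround does.

```python
def combine_by_key(records, key_fn, create, merge_value):
    """Fold records into per-key combiners using an explicit key-extractor,
    without first building an intermediate list of (key, value) pairs.
    (Illustrative sketch only; names are made up, not Spark API.)"""
    combiners = {}
    for rec in records:
        k = key_fn(rec)
        if k in combiners:
            # Key seen before: merge this record into the existing combiner.
            combiners[k] = merge_value(combiners[k], rec)
        else:
            # First record for this key: create a fresh combiner from it.
            combiners[k] = create(rec)
    return combiners

words = ["apple", "avocado", "banana", "cherry", "coconut"]
# Count words per first letter, keying each record on the fly.
counts = combine_by_key(
    words,
    key_fn=lambda w: w[0],
    create=lambda w: 1,
    merge_value=lambda acc, w: acc + 1,
)
# counts == {"a": 2, "b": 1, "c": 2}
```

(In real Spark there would also be a mergeCombiners step for combining partial results across partitions; it is omitted here to keep the single-partition sketch short.)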
Mohit.
