Hi,

I have an RDD of (Key, Value) pairs that I would like to save to HDFS.
However, rather than putting everything into one file, I would like to
split the RDD by key and save each part as a separate file. The key would
become the filename.

In short, I am trying to do something like this:
    myRDD.groupByKey().foreach { case (key, values) =>
      values.saveAsTextFile(key)
    }

This obviously doesn't work since values is of type Seq[V] instead of
RDD[V], but does anyone have any suggestions for doing this efficiently?
Currently, I filter the RDD once per key and save each filtered result,
but that means a full pass over the data for every key.
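
For concreteness, the filter-and-save workaround looks roughly like the sketch below. This is an illustration rather than my exact code: saveByKey and basePath are placeholder names, the keys and values are assumed to be Strings, and it assumes the set of distinct keys is small enough to collect on the driver.

```scala
import org.apache.spark.rdd.RDD

// Sketch only: `saveByKey` and `basePath` are placeholder names.
def saveByKey(myRDD: RDD[(String, String)], basePath: String): Unit = {
  // Collect the distinct keys on the driver (assumes there are few of them).
  val keys = myRDD.map(_._1).distinct().collect()

  // One filter + save per key, so the RDD is scanned once for every key.
  keys.foreach { key =>
    myRDD
      .filter { case (k, _) => k == key }
      .map(_._2)
      .saveAsTextFile(s"$basePath/$key")
  }
}
```

With N distinct keys this launches N separate jobs, which is why I am looking for a single-pass alternative.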

Thanks,
Nick
