PipedRDD is an RDD[String]. If you know how to parse each result line into (key, value) pairs, you can call reduceByKey afterwards:
piped.map(x => (key, value)).reduceByKey((v1, v2) => v)

-Xiangrui

On Wed, Apr 23, 2014 at 2:09 AM, zhxfl <[email protected]> wrote:
> Hello, we know Hadoop-streaming is used to run native programs on Hadoop.
> Hadoop-streaming supports Map and Reduce logic. Reduce logic means Hadoop
> collects all values with the same key and feeds that stream to the native
> application.
> Spark has PipedRDD too, but PipedRDD doesn't support Reduce logic, so it's
> difficult for us to port our application from Hadoop to Spark.
> Can anyone give me advice? Thanks!
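
For example, here is a minimal self-contained sketch in Scala, assuming the native program writes tab-separated "key<TAB>value" lines and that the values are numeric and should be summed; the program path, input/output paths, and the parse/reduce functions are placeholders you would replace with your own:

import org.apache.spark.{SparkConf, SparkContext}

object PipeThenReduce {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipe-then-reduce"))

    // Feed each partition's lines to the native program, one line per record.
    val piped = sc.textFile("hdfs:///path/to/input")   // placeholder input path
      .pipe("/usr/local/bin/my_native_mapper")         // placeholder command

    // Parse each output line as "key\tvalue" and combine values per key,
    // which replaces the grouping Hadoop streaming's reduce phase would do.
    val reduced = piped
      .map { line =>
        val Array(k, v) = line.split("\t", 2)
        (k, v.toLong)
      }
      .reduceByKey(_ + _)

    reduced.saveAsTextFile("hdfs:///path/to/output")   // placeholder output path
    sc.stop()
  }
}

If the native reducer itself has to see all values for a key (not just an associative combine), one option is to groupByKey, serialize each key with its grouped values back into lines, and pipe that through the reducer program as a second pipe stage; that costs a full shuffle but mirrors Hadoop streaming's reduce semantics more closely.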
