PipedRDD is an RDD[String]. If you know how to parse each result line into (key, value) pairs, you can call reduceByKey afterwards:
piped.map(x => (key, value)).reduceByKey((v1, v2) => v)

-Xiangrui

On Wed, Apr 23, 2014 at 2:09 AM, zhxfl <[email protected]> wrote:
> Hello, we know Hadoop-streaming is used to run native programs on Hadoop.
> Hadoop-streaming supports Map and Reduce logic. Reduce logic means Hadoop
> collects all values with the same key and feeds that stream to the native
> application.
> Spark has PipedRDD too, but PipedRDD doesn't support Reduce logic, so it's
> difficult for us to port our application from Hadoop to Spark.
> Can anyone give me advice? Thanks!
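
For example, here is a minimal self-contained sketch in Scala, assuming the native program writes tab-separated "key<TAB>value" lines and that the values are numeric and should be summed; the program path, input/output paths, and the parse/reduce functions are placeholders you would replace with your own:

import org.apache.spark.{SparkConf, SparkContext}

object PipeThenReduce {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipe-then-reduce"))

    // Feed each partition's lines to the native program, one line per record.
    val piped = sc.textFile("hdfs:///path/to/input")   // placeholder input path
      .pipe("/usr/local/bin/my_native_mapper")         // placeholder command

    // Parse each output line as "key\tvalue" and combine values per key,
    // which replaces the grouping Hadoop streaming's reduce phase would do.
    val reduced = piped
      .map { line =>
        val Array(k, v) = line.split("\t", 2)
        (k, v.toLong)
      }
      .reduceByKey(_ + _)

    reduced.saveAsTextFile("hdfs:///path/to/output")   // placeholder output path
    sc.stop()
  }
}

If the native reducer itself has to see all values for a key (not just an associative combine), one option is to groupByKey, serialize each key with its grouped values back into lines, and pipe that through the reducer program as a second pipe stage; that costs a full shuffle but mirrors Hadoop streaming's reduce semantics more closely.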
