On the front page <http://spark.incubator.apache.org/> of the Spark
website there is the following simple word count implementation:
file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
The same code can be found in the Quick Start
<http://spark.incubator.apache.org/docs/latest/quick-start.html> guide.
When I follow the steps in my spark-shell (version 0.8.0), it works
fine. The reduceByKey method is also shown in the list of
transformations
<http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#transformations>
in the Spark Programming Guide. The bottom of this list directs the
reader to the API docs for the class RDD (this link is broken, BTW).
The API docs for RDD
<http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD>
do not list a reduceByKey method for RDD. Also, when I try to compile
the above code inside a Scala class definition, I get the following
compile error:
value reduceByKey is not a member of
org.apache.spark.rdd.RDD[(java.lang.String, Int)]
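For reference, here is a minimal sketch of the kind of class I am trying
to compile (the class name, app name, and "local" master are just
placeholders I picked for illustration, not part of the original snippet):

import org.apache.spark.SparkContext

// Minimal class that reproduces the compile error; names are placeholders.
class WordCount {
  def run() {
    val spark = new SparkContext("local", "WordCount")
    val file = spark.textFile("hdfs://...")
    // The compiler reports the error on the reduceByKey call below.
    file.flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
  }
}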
I am compiling with Maven using the following dependency definition:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.9.3</artifactId>
  <version>0.8.0-incubating</version>
</dependency>
Can someone help me understand why this code works fine in the
spark-shell, but reduceByKey does not appear in the API docs and the
code won't compile in my project?
Thanks,
Philip