On the front page <http://spark.incubator.apache.org/> of the Spark website there is the following simple word count implementation:

file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

The same code can be found in the Quick Start <http://spark.incubator.apache.org/docs/latest/quick-start.html> guide. When I follow the steps in my spark-shell (version 0.8.0) it works fine. The reduceByKey method is also shown in the list of transformations <http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#transformations> in the Spark Programming Guide. The bottom of that list directs the reader to the API docs for the RDD class (this link is broken, BTW). The API docs for RDD <http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD> do not list a reduceByKey method for RDD. Also, when I try to compile the above code in a Scala class definition, I get the following compile error:

value reduceByKey is not a member of org.apache.spark.rdd.RDD[(java.lang.String, Int)]
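For reference, here is a minimal sketch of the kind of standalone program I am compiling (the object name and SparkContext arguments are just placeholders, not my real code):

import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]) {
    // Build a context and run the same word count as on the front page
    val spark = new SparkContext("local", "Word Count")
    val file = spark.textFile("hdfs://...")
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)   // this is the call the compiler rejects
    counts.saveAsTextFile("hdfs://...")
  }
}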

I am compiling with maven using the following dependency definition:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.9.3</artifactId>
            <version>0.8.0-incubating</version>
        </dependency>

Can someone help me understand why this code works fine from the spark-shell but doesn't seem to exist in the API docs and won't compile?

Thanks,
Philip
