On the front page <http://spark.incubator.apache.org/> of the Spark
website there is the following simple word count implementation:
file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
The same code can be found in the Quick Start
<http://spark.incubator.apache.org/docs/latest/quick-start.html> guide.
When I follow the steps in my spark-shell (version 0.8.0), it works
fine. The reduceByKey method is also shown in the list of
transformations
<http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#transformations>
in the Spark Programming Guide. The bottom of this list directs the
reader to the API docs for the class RDD (this link is broken, BTW).
The API docs for RDD
<http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD>
do not list a reduceByKey method for RDD. Also, when I try to compile
the above code inside a Scala class definition, I get the following
compile error:
value reduceByKey is not a member of
org.apache.spark.rdd.RDD[(java.lang.String, Int)]
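For reference, here is a minimal sketch of the kind of class I am trying
to compile (the class name, app name, and "local" master are just
placeholders I picked for illustration, not part of the original snippet):

import org.apache.spark.SparkContext

// Minimal class that reproduces the compile error; names are placeholders.
class WordCount {
  def run() {
    val spark = new SparkContext("local", "WordCount")
    val file = spark.textFile("hdfs://...")
    // The compiler reports the error on the reduceByKey call below.
    file.flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
  }
}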
I am compiling with Maven using the following dependency definition:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.9.3</artifactId>
  <version>0.8.0-incubating</version>
</dependency>
Can someone help me understand why this code works fine in the
spark-shell, but reduceByKey does not appear in the API docs and the
code won't compile in my project?
Thanks,
Philip