On the front page <http://spark.incubator.apache.org/> of the Spark
website there is the following simple word count implementation:
file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
The same code can be found in the Quick Start
<http://spark.incubator.apache.org/docs/latest/quick-start.html> guide.
When I follow the steps in my spark-shell (version 0.8.0) it works
fine. The reduceByKey method is also shown in the list of transformations
in the Spark Programming Guide. The bottom of this list directs the
reader to the API docs for the class RDD (this link is broken, BTW).
The API docs for RDD
do not list a reduceByKey method. Also, when I try to compile
the above code in a Scala class definition I get the following compile
error:
value reduceByKey is not a member of org.apache.spark.rdd.RDD[(java.lang.String, Int)]
I am compiling with maven using the following dependency definition:
Can someone help me understand why this code works fine from the
spark-shell but doesn't seem to exist in the API docs and won't compile?