Koert's answer is very likely correct. The implicit conversion that wraps an RDD[(K, V)] to provide the PairRDDFunctions requires that a ClassTag be available for K: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1124
To fully understand what's going on from a Scala beginner's point of view, you'll have to look up ClassTags, context bounds (the "K : ClassTag" syntax), and implicit conversions. Fortunately, you don't have to understand monads...

On Tue, Apr 1, 2014 at 2:06 PM, Koert Kuipers <ko...@tresata.com> wrote:

> import org.apache.spark.SparkContext._
> import org.apache.spark.rdd.RDD
> import scala.reflect.ClassTag
>
> def joinTest[K: ClassTag](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
>   rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
> }
>
> On Tue, Apr 1, 2014 at 4:55 PM, Daniel Siegmann <daniel.siegm...@velos.io> wrote:
>
>> When my tuple type includes a generic type parameter, the pair RDD functions aren't available. Take for example the following (a join on two RDDs, taking the sum of the values):
>>
>> def joinTest(rddA: RDD[(String, Int)], rddB: RDD[(String, Int)]): RDD[(String, Int)] = {
>>   rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>> }
>>
>> That works fine, but let's say I replace the type of the key with a generic type:
>>
>> def joinTest[K](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
>>   rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>> }
>>
>> This latter function gets the compiler error "value join is not a member of org.apache.spark.rdd.RDD[(K, Int)]".
>>
>> The reason is probably obvious, but I don't have much Scala experience. Can anyone explain what I'm doing wrong?
>>
>> --
>> Daniel Siegmann, Software Developer
>> Velos
>> Accelerating Machine Learning
>>
>> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
>> E: daniel.siegm...@velos.io W: www.velos.io
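For anyone following along who hasn't met context bounds yet: a context bound like "K : ClassTag" is just syntactic sugar for an extra implicit parameter list. Here is a minimal, Spark-free sketch (the `describe1`/`describe2` names are made up for illustration) showing that the two spellings are equivalent:

```scala
import scala.reflect.ClassTag

object ContextBoundDemo {
  // Context-bound form: "T : ClassTag" asks the compiler to
  // supply an implicit ClassTag[T] at the call site.
  def describe1[T: ClassTag](x: T): String =
    implicitly[ClassTag[T]].runtimeClass.getSimpleName

  // Desugared form: the same thing written as an explicit
  // implicit parameter. This is what joinTest[K: ClassTag]
  // expands to under the hood.
  def describe2[T](x: T)(implicit ct: ClassTag[T]): String =
    ct.runtimeClass.getSimpleName

  def main(args: Array[String]): Unit = {
    println(describe1(42))    // the runtime class of Int
    println(describe2("hi"))  // the runtime class of String
  }
}
```

So when the key type is the concrete `String`, the compiler can always conjure the ClassTag and the implicit conversion to PairRDDFunctions fires; with a bare `[K]` there is no ClassTag in scope, the conversion doesn't apply, and `join` appears to be missing. Adding the `K : ClassTag` bound simply forwards the obligation to whoever calls `joinTest`.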