Correct; and PairRDDFunctions#join does still exist in Spark versions that do have DataFrame, so you don't necessarily have to use DataFrame to do this even then (although there are advantages to the DataFrame approach).
Your basic problem is that you have an RDD of tuples, where each tuple is of type ((String, String), String, String), while what you need is an RDD[(K, V)] so that PairRDDFunctions#join can be invoked on your keys and values. One way to get there is to use RDD#keyBy.

On Mon, Jun 8, 2015 at 9:58 PM, amit tewari <amittewar...@gmail.com> wrote:

> Thanks, but Spark 1.2 doesn't yet have DataFrame, I guess?
>
> Regards
> Amit
>
> On Tue, Jun 9, 2015 at 10:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> join is an operation of DataFrame.
>>
>> You can call sc.createDataFrame(myRDD) to obtain a DataFrame, where sc is
>> sqlContext.
>>
>> Cheers
>>
>> On Mon, Jun 8, 2015 at 9:44 PM, amit tewari <amittewar...@gmail.com> wrote:
>>
>>> Hi Dear Spark Users
>>>
>>> I am very new to Spark/Scala.
>>>
>>> Am using Datastax (4.7/Spark 1.2.1) and struggling with the following
>>> error/issue.
>>>
>>> Already tried options like import org.apache.spark.SparkContext._ or the
>>> explicit import org.apache.spark.SparkContext.rddToPairRDDFunctions,
>>> but the error is not resolved.
>>>
>>> Help much appreciated.
>>>
>>> Thanks
>>> AT
>>>
>>> scala> val input1 = sc.textFile("/test7").map(line => line.split(",").map(_.trim))
>>> scala> val input2 = sc.textFile("/test8").map(line => line.split(",").map(_.trim))
>>> scala> val input11 = input1.map(x => ((x(0), x(1)), x(2), x(3)))
>>> scala> val input22 = input2.map(x => ((x(0), x(1)), x(2), x(3)))
>>>
>>> scala> input11.join(input22).take(10)
>>>
>>> <console>:66: error: value join is not a member of
>>> org.apache.spark.rdd.RDD[((String, String), String, String)]
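Concretely, the fix is to reshape each record into a two-element (key, value) tuple before joining. A minimal sketch (column layout and sample data invented for illustration; the `map` is shown on a plain Scala collection so it runs without a cluster, but the identical lambda works on the RDDs from the question):

```scala
// join needs RDD[(K, V)]: a two-element tuple of key and value.
// The original map produced a 3-tuple ((String, String), String, String),
// so the implicit rddToPairRDDFunctions never applies and join is missing.
// Group the trailing columns into a single value tuple instead.

// Hypothetical sample rows standing in for the parsed lines of /test7:
val rows = Seq(
  Array("k1", "k2", "v1", "v2"),
  Array("k1", "k3", "v3", "v4")
)

// Key = first two columns, value = remaining two columns,
// giving the (K, V) shape that join expects.
val pairs = rows.map(x => ((x(0), x(1)), (x(2), x(3))))

// On the RDDs from the question, the same one-character-class change works:
//   val input11 = input1.map(x => ((x(0), x(1)), (x(2), x(3))))
//   val input22 = input2.map(x => ((x(0), x(1)), (x(2), x(3))))
//   input11.join(input22).take(10)
//
// Alternatively, RDD#keyBy(f) pairs each element with f(element) as the key,
// e.g. input1.keyBy(x => (x(0), x(1))) keeps the whole row as the value.
```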