Correct; and PairRDDFunctions#join still exists in the versions of Spark
that do have DataFrame, so even there you don't have to use DataFrame to do
this (although there are advantages to the DataFrame approach).
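
For reference, on Spark 1.3+ the DataFrame route could look roughly like
this (an untested sketch, reusing your input1/input2 from the original post
below; the column names k1, k2, v1, v2 are just placeholders for your
fields):

import sqlContext.implicits._

// Turn each Array[String] into a tuple so the implicit conversion
// to a DataFrame applies:
val df1 = input1.map(x => (x(0), x(1), x(2), x(3))).toDF("k1", "k2", "v1", "v2")
val df2 = input2.map(x => (x(0), x(1), x(2), x(3))).toDF("k1", "k2", "v1", "v2")

// Qualify the key columns so the equi-join condition is unambiguous:
df1.join(df2, df1("k1") === df2("k1") && df1("k2") === df2("k2")).show(10)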

Your basic problem is that you have an RDD of tuples, where each tuple is
of type ((String, String), String, String), while what you need is an
RDD[(K, V)] so that PairRDDFunctions#join can be invoked using your keys
and values.  One way to get there is to use RDD#keyBy.
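
Concretely, either of these should give you an RDD[(K, V)] that join
accepts (untested, reusing your input1/input2 from below):

// Put both key fields in the key and both value fields in the value:
val input11 = input1.map(x => ((x(0), x(1)), (x(2), x(3))))
val input22 = input2.map(x => ((x(0), x(1)), (x(2), x(3))))
input11.join(input22).take(10)

// Or keep flat tuples and derive the key with RDD#keyBy:
val keyed1 = input1.map(x => (x(0), x(1), x(2), x(3))).keyBy(t => (t._1, t._2))
val keyed2 = input2.map(x => (x(0), x(1), x(2), x(3))).keyBy(t => (t._1, t._2))
keyed1.join(keyed2).take(10)

On 1.2, the import org.apache.spark.SparkContext._ you already have is
enough once the element type is a Tuple2.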

On Mon, Jun 8, 2015 at 9:58 PM, amit tewari <amittewar...@gmail.com> wrote:

> Thanks, but Spark 1.2 doesn't yet have DataFrame, I guess?
>
> Regards
> Amit
>
> On Tue, Jun 9, 2015 at 10:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> join is an operation on DataFrame.
>>
>> You can call sc.createDataFrame(myRDD) to obtain a DataFrame, where sc is
>> the sqlContext.
>>
>> Cheers
>>
>> On Mon, Jun 8, 2015 at 9:44 PM, amit tewari <amittewar...@gmail.com>
>> wrote:
>>
>>> Hi Dear Spark Users
>>>
>>> I am very new to Spark/Scala.
>>>
>>> I am using Datastax (4.7/Spark 1.2.1) and struggling with the following
>>> error/issue.
>>>
>>> I have already tried options such as import org.apache.spark.SparkContext._
>>> and the explicit import org.apache.spark.SparkContext.rddToPairRDDFunctions,
>>> but the error is not resolved.
>>>
>>> Help much appreciated.
>>>
>>> Thanks
>>> AT
>>>
>>> scala>val input1 = sc.textFile("/test7").map(line =>
>>> line.split(",").map(_.trim));
>>> scala>val input2 = sc.textFile("/test8").map(line =>
>>> line.split(",").map(_.trim));
>>> scala>val input11 = input1.map(x=>((x(0),x(1)),x(2),x(3)))
>>> scala>val input22 = input2.map(x=>((x(0),x(1)),x(2),x(3)))
>>>
>>> scala> input11.join(input22).take(10)
>>>
>>> <console>:66: error: value join is not a member of
>>> org.apache.spark.rdd.RDD[((String, String), String, String)]
>>>
>>>               input11.join(input22).take(10)
>>>
