Have you tried joins on regular RDDs instead of SchemaRDDs? We have found
that they are about 10 times faster than joins between SchemaRDDs.
val largeRDD = ...
val smallRDD = ...
largeRDD.join(smallRDD) // the equivalent SchemaRDD join would run for much longer
The only limitation I see with that implementation is that regular RDDs support …
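For concreteness, here is a minimal sketch of that plain pair-RDD join as it would run in a 1.x spark-shell (the sample data is invented for illustration; the shell already imports the SparkContext._ implicits that provide join):

// Hypothetical stand-ins for the large and small datasets, keyed for the join.
val largeRDD = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
val smallRDD = sc.parallelize(Seq(("a", 10), ("c", 30)))

// Inner join on the key: yields RDD[(String, (Int, Int))].
val joined = largeRDD.join(smallRDD)
joined.collect().foreach(println) // (a,(1,10)), (c,(3,30))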
Thanks a lot, that fixed the issue :)
On Thu, Sep 4, 2014 at 4:51 PM, Zhan Zhang wrote:
> Try this:
> import org.apache.spark.SparkContext._
>
> Thanks.
>
> Zhan Zhang
>
>
> On Sep 4, 2014, at 4:36 PM, Veeranagouda Mukkanagoudar
> wrote:
>
I am planning to use the RDD join operation. To test it out, I was trying to
compile some test code, but I am getting the following compilation error:

*value join is not a member of org.apache.spark.rdd.RDD[(String, Int)]*
*[error] rddA.join(rddB).map { case (k, (a, b)) => (k, a+b) }*

Code:
import org.apac…
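For reference, join lives on PairRDDFunctions, which Spark 1.x brings into scope through the implicit conversions in org.apache.spark.SparkContext._; without that import, an RDD[(String, Int)] has no join method, which is exactly the error above. A minimal self-contained version that compiles with the import (the rddA/rddB contents are hypothetical, since the original definitions were truncated):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicit RDD[(K, V)] -> PairRDDFunctions, enables join

object JoinCompileTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "join-compile-test")

    // Hypothetical sample data standing in for the truncated definitions.
    val rddA = sc.parallelize(Seq(("x", 1), ("y", 2)))
    val rddB = sc.parallelize(Seq(("x", 10), ("y", 20)))

    // Fails to compile without the SparkContext._ import above.
    val summed = rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
    summed.collect().foreach(println) // (x,11), (y,22)

    sc.stop()
  }
}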