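Your problem is more basic than that. You can't reference one RDD
(subsetids) from within an operation on another RDD (allobjects.filter).
RDD operations can only be invoked from the driver, not from inside a
closure that runs on the executors.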
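One common workaround assumes subsetids is small enough to collect to the driver. Pull the ids into a local Set, broadcast it, and reference only the broadcast value inside the filter closure. This is a sketch, not a definitive recipe. MyObject here is a hypothetical case class standing in for your class (only getId matters). BroadcastFilter and filterByIds are made-up names for the example.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the poster's MyObject; only getId is used.
case class MyObject(id: Long, payload: String) {
  def getId: Long = id
}

object BroadcastFilter {
  // Assumes subsetids fits in driver (and executor) memory.
  def filterByIds(sc: SparkContext,
                  allobjects: RDD[MyObject],
                  subsetids: RDD[Long]): RDD[MyObject] = {
    // Materialize the ids on the driver as a plain Scala Set.
    val idSet: Set[Long] = subsetids.collect().toSet
    // Ship one read-only copy of the set to each executor.
    val idsBc = sc.broadcast(idSet)
    // The closure captures only the broadcast handle, never an RDD.
    allobjects.filter(x => idsBc.value.contains(x.getId))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "broadcast-filter")
    try {
      val allobjects = sc.parallelize(Seq(MyObject(1L, "a"), MyObject(2L, "b"), MyObject(3L, "c")))
      val subsetids = sc.parallelize(Seq(1L, 3L))
      val ids = filterByIds(sc, allobjects, subsetids).collect().map(_.getId).sorted
      println(ids.mkString(","))  // prints 1,3
    } finally {
      sc.stop()
    }
  }
}
```

This avoids a shuffle. However, every executor holds a full copy of the id set, so it only makes sense when the ids fit comfortably in memory.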


On Wed, Feb 19, 2014 at 2:23 PM, Soumya Simanta wrote:

> I have an RDD that contains ids (Long).
>
> subsetids
>
> res22: org.apache.spark.rdd.RDD[Long]
>
>
> I have another RDD of objects (MyObject), where one of the fields is
> an id (Long).
>
> allobjects
>
> res23: org.apache.spark.rdd.RDD[MyObject] = MappedRDD[272]
>
> Now I want to filter allobjects so that I get the subset of objects
> whose ids are present in my first RDD (i.e., subsetids).
>
> Say something like -
>
> val subsetObjs = allobjects.filter( x => subsetids.contains(x.getId) )
>
> However, RDD has no "contains" method, so I'm looking for the most
> efficient way to achieve this in Spark.
>
> Thanks.
>

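If subsetids is too large to collect to the driver, a join keeps everything distributed. Key allobjects by id, turn the ids into (id, ()) pairs, and inner-join them. This is also a sketch. MyObject, JoinFilter and filterByIds are hypothetical names. The SparkContext._ import is only needed for the pair-RDD join on Spark versions before 1.3.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits; required before Spark 1.3
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the poster's MyObject; only getId is used.
case class MyObject(id: Long, payload: String) {
  def getId: Long = id
}

object JoinFilter {
  def filterByIds(allobjects: RDD[MyObject], subsetids: RDD[Long]): RDD[MyObject] = {
    // (id, object) pairs for the data to be filtered.
    val keyedObjs = allobjects.keyBy(_.getId)
    // Deduplicate ids so each matching object is emitted only once.
    val keyedIds = subsetids.distinct().map(id => (id, ()))
    // Inner join keeps only objects whose id appears in subsetids.
    keyedObjs.join(keyedIds).map { case (_, (obj, _)) => obj }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "join-filter")
    try {
      val allobjects = sc.parallelize(Seq(MyObject(1L, "a"), MyObject(2L, "b"), MyObject(3L, "c")))
      val subsetids = sc.parallelize(Seq(3L, 1L, 3L))
      // Join output order is not guaranteed, so sort before printing.
      val ids = filterByIds(allobjects, subsetids).collect().map(_.getId).sorted
      println(ids.mkString(","))  // prints 1,3
    } finally {
      sc.stop()
    }
  }
}
```

The join shuffles both RDDs. That cost is the price of never materializing the id set on the driver.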