Your problem is more basic than that. You can't reference one RDD (subsetids) from within an operation on another RDD (allobjects.filter).
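
Two common ways around this, sketched below under some assumptions: the MyObject case class is a hypothetical stand-in for your class (I only know it has getId), and the FilterByIds object and its method names are made up for illustration. If subsetids is small enough to fit in memory, collect it on the driver and broadcast it as a Set. If it is large, key both RDDs by id and join them.

```scala
import org.apache.spark.SparkContext
// Needed on older Spark releases for pair-RDD operations such as join;
// on newer releases the import is not required.
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the poster's MyObject; only getId is known from the thread.
case class MyObject(id: Long, payload: String) {
  def getId: Long = id
}

object FilterByIds {

  // Option 1: the id set is small. Collect it to the driver, broadcast it,
  // and filter against the local Set inside the closure.
  def filterWithBroadcast(sc: SparkContext,
                          allobjects: RDD[MyObject],
                          subsetids: RDD[Long]): RDD[MyObject] = {
    val ids = sc.broadcast(subsetids.collect().toSet)
    allobjects.filter(x => ids.value.contains(x.getId))
  }

  // Option 2: the id set is large. Key both sides by id and join.
  // distinct() keeps repeated ids from duplicating objects in the result.
  def filterWithJoin(allobjects: RDD[MyObject],
                     subsetids: RDD[Long]): RDD[MyObject] = {
    val keyedObjs = allobjects.map(x => (x.getId, x))
    val keyedIds  = subsetids.distinct().map(id => (id, ()))
    keyedObjs.join(keyedIds).map { case (_, (obj, _)) => obj }
  }
}
```

In the shell that would be something like FilterByIds.filterWithBroadcast(sc, allobjects, subsetids). The broadcast version avoids a shuffle, so it is usually the better choice when the id set is small; the join version shuffles both RDDs but does not require the ids to fit on one machine.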

On Wed, Feb 19, 2014 at 2:23 PM, Soumya Simanta <[email protected]> wrote:

> I've a RDD that contains ids (Long).
>
> subsetids
>
> res22: org.apache.spark.rdd.RDD[Long]
>
> I've another RDD that has an Object (MyObject) where one of the field is
> an id (Long).
>
> allobjects
>
> res23: org.apache.spark.rdd.RDD[MyObject] = MappedRDD[272]
>
> Now I want to run filter on allobjects so that I can get a subset that
> matches with the ids that are present in my first RDD (i.e., subsetids)
>
> Say something like -
>
> val subsetObjs = allobjects.filter( x => subsetids.contains(x.getId) )
>
> However, there is no method "contains" so I'm looking for the most
> efficient way to achieving this in Spark.
>
> Thanks.
