Right, if your RDD has a Partitioner, then lookup() will use it to determine which partition contains the key you want to look up, and will only run a task on that partition.
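The partition-pruning idea above, together with the Set[K] extension mentioned next, can be sketched as a plain-Python simulation. This is not Spark code. `ToyHashPartitioner`, `ToyPairDataset` and `lookup_many` are hypothetical names chosen for illustration. The toy partitioner only loosely mirrors a hash partitioner (key hash modulo the partition count). The point is that a single-key lookup scans one partition, and a multi-key lookup scans each relevant partition once instead of the whole dataset.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class ToyHashPartitioner:
    """Hypothetical stand-in for a hash partitioner: key -> partition index."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions

    def get_partition(self, key: Hashable) -> int:
        # Python's % returns a non-negative result for a positive modulus.
        return hash(key) % self.num_partitions


class ToyPairDataset:
    """Hypothetical in-memory stand-in for a hash-partitioned pair RDD."""

    def __init__(self, pairs: Iterable[Tuple[Hashable, object]],
                 partitioner: ToyHashPartitioner) -> None:
        self.partitioner = partitioner
        self.partitions: List[List[Tuple[Hashable, object]]] = [
            [] for _ in range(partitioner.num_partitions)
        ]
        for k, v in pairs:
            self.partitions[partitioner.get_partition(k)].append((k, v))
        # Counts how many partitions have been scanned, to make the savings visible.
        self.partitions_scanned = 0

    def _scan(self, index: int, wanted: Set[Hashable]) -> List[Tuple[Hashable, object]]:
        # Stands in for "run a task on one partition" and keep matching pairs.
        self.partitions_scanned += 1
        return [(k, v) for k, v in self.partitions[index] if k in wanted]

    def lookup(self, key: Hashable) -> List[object]:
        # Only the partition that can contain `key` is scanned.
        index = self.partitioner.get_partition(key)
        return [v for _, v in self._scan(index, {key})]

    def lookup_many(self, keys: Iterable[Hashable]) -> Dict[Hashable, List[object]]:
        # Group the requested keys by partition, then scan each such partition once.
        # Keys with no matching pairs are simply absent from the result.
        by_partition: Dict[int, Set[Hashable]] = defaultdict(set)
        for key in keys:
            by_partition[self.partitioner.get_partition(key)].add(key)
        result: Dict[Hashable, List[object]] = defaultdict(list)
        for index, wanted in by_partition.items():
            for k, v in self._scan(index, wanted):
                result[k].append(v)
        return dict(result)
```

With 100 integer keys spread over 8 partitions, `lookup(42)` scans one partition. `lookup_many({1, 9, 42})` scans two, because keys 1 and 9 land in the same partition. A `filter` over the whole dataset would touch all eight.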
That still doesn't efficiently solve the lookup-a-set-of-keys problem, but extending lookup() to efficiently handle a Set[K] is pretty straightforward, if someone wants to tackle it.

On Fri, Dec 13, 2013 at 11:11 AM, K. Shankari <shank...@eecs.berkeley.edu> wrote:
> I think that you want the lookup() method in PairRDDFunctions?
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
>
> It is supposed to be more efficient than filter...
>
> Shankari
>
> On Thu, Dec 12, 2013 at 7:30 PM, Yadid <ya...@media.mit.edu> wrote:
>
>> I have a pairRDD and I would like to access a specific key-value.
>> The first thing that comes to mind is filtering using the specified key,
>> but that seems very inefficient as that would iterate over the entire RDD.
>> And even more so if I need to access several keys.
>>
>> Is there any other way to perform this? This seems like a really useful
>> feature. I'm guessing that in order to implement this, I would need a
>> mapping of keys to partitions, and a method to access data from a specific
>> partition.
>>
>> Yadid