Right: if your RDD has a Partitioner, lookup() uses it to determine which
partition contains the key you want to look up, and runs a task on only
that partition.
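
To make the idea concrete, here is a rough plain-Python model of it. This is
not Spark code and not how lookup() is actually implemented; the names
NUM_PARTITIONS, get_partition, partition_by and the lookup function below are
all hypothetical stand-ins for illustration.

```python
# Plain-Python model (not Spark code) of a hash-partitioned pair dataset.
NUM_PARTITIONS = 4  # hypothetical partition count


def get_partition(key):
    # Stand-in for a hash partitioner: map a key to a partition index.
    return hash(key) % NUM_PARTITIONS


def partition_by(pairs):
    # Place every (key, value) pair in the partition chosen for its key.
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for k, v in pairs:
        partitions[get_partition(k)].append((k, v))
    return partitions


def lookup(partitions, key):
    # Scan only the one partition that can contain the key.
    target = partitions[get_partition(key)]
    return [v for k, v in target if k == key]


data = partition_by([(i, i * i) for i in range(20)] + [(3, -1)])
print(lookup(data, 3))  # -> [9, -1]
```

Because the same function that placed the pairs also locates the key, the
other three partitions are never touched.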

That still doesn't efficiently solve the lookup-a-set-of-keys problem, but
extending lookup() to efficiently handle a Set[K] is pretty
straightforward, if someone wants to tackle it.
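
One possible shape for that extension, again as a plain-Python sketch rather
than Spark code: group the requested keys by the partition each would live
in, then scan only those partitions. The names here (lookup_all and the
helpers) are hypothetical, and a real version would run one task per
selected partition instead of looping locally.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def get_partition(key):
    # Stand-in for a hash partitioner: map a key to a partition index.
    return hash(key) % NUM_PARTITIONS


def partition_by(pairs):
    # Place every (key, value) pair in the partition chosen for its key.
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for k, v in pairs:
        partitions[get_partition(k)].append((k, v))
    return partitions


def lookup_all(partitions, keys):
    # Group the requested keys by the partition that can contain them.
    wanted = defaultdict(set)
    for key in keys:
        wanted[get_partition(key)].add(key)

    # Visit only partitions holding at least one requested key.
    result = {key: [] for key in keys}
    for index, keys_here in wanted.items():
        for k, v in partitions[index]:
            if k in keys_here:
                result[k].append(v)
    return result


data = partition_by([(i, i * i) for i in range(20)])
print(lookup_all(data, {2, 6, 7}))
```

In this example keys 2 and 6 fall in the same partition, so three keys cost
two partition scans instead of a pass over all four.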


On Fri, Dec 13, 2013 at 11:11 AM, K. Shankari <shank...@eecs.berkeley.edu> wrote:

> I think that you want the lookup() method in PairRDDFunctions?
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
>
> It is supposed to be more efficient than filter...
>
> Shankari
>
>
>
> On Thu, Dec 12, 2013 at 7:30 PM, Yadid <ya...@media.mit.edu> wrote:
>
>> I have a pairRDD and I would like to access a specific key-value.
>> The first thing that comes to mind is filtering using the specified key,
>> but that seems very inefficient as that would iterate over the entire RDD.
>> And even more so if I need to access several keys.
>>
>> Is there any other way to perform this? This seems like a really useful
>> feature. I'm guessing that in order to implement this, I would need a
>> mapping of keys to partitions, and a method to access data from a specific
>> partition.
>>
>> Yadid
>>
>>
>
