Re: indexing an RDD [Python]

Sven Krasser Fri, 24 Apr 2015 14:57:19 -0700

The solution depends largely on your use case. I assume the index is in the
key. In that case, you can make a second RDD out of the list of indices and
then use cogroup() on both.
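
A minimal sketch of the cogroup() route, assuming the data is already an RDD of (index, value) pairs (zipWithIndex() plus a swap gets you there from an unkeyed RDD). The helper names keep_if_selected and select_by_cogroup are made up for illustration and are not part of Spark:

```python
def keep_if_selected(kv):
    """Turn one cogroup() record into zero or more (index, value) pairs.

    kv has the shape (index, (values, markers)), where `markers` is
    non-empty only when the index appears in the RDD of wanted indices.
    """
    index, (values, markers) = kv
    if not list(markers):      # index was not requested: drop the record
        return []
    return [(index, v) for v in values]


def select_by_cogroup(keyed_rdd, index_rdd):
    """keyed_rdd: RDD of (index, value); index_rdd: RDD of (index, None)."""
    return keyed_rdd.cogroup(index_rdd).flatMap(keep_if_selected)


# Usage, assuming an existing SparkContext `sc` and an unkeyed RDD `points`:
#   keyed = points.zipWithIndex().map(lambda vi: (vi[1], vi[0]))
#   wanted = sc.parallelize([0, 4, 5, 6, 8]).map(lambda i: (i, None))
#   subset = select_by_cogroup(keyed, wanted).values()
```

Indices that are requested but missing from the data simply produce no output, since their `values` side is empty.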

If the list of indices is small, just using filter() will work well.
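
For the small-list case, something along these lines should do, again assuming (index, value) pairs. make_index_filter and select_by_filter are hypothetical helper names, not Spark functions:

```python
def make_index_filter(indices):
    """Return a predicate accepting (index, value) pairs whose index is wanted."""
    wanted = frozenset(indices)          # set membership instead of scanning a list
    return lambda kv: kv[0] in wanted


def select_by_filter(keyed_rdd, indices):
    """keyed_rdd: RDD of (index, value).

    The set of indices travels inside the closure to every task,
    which is why this only suits a small list.
    """
    return keyed_rdd.filter(make_index_filter(indices))


# Usage, assuming an unkeyed RDD `points`:
#   keyed = points.zipWithIndex().map(lambda vi: (vi[1], vi[0]))
#   subset = select_by_filter(keyed, [0, 4, 5, 6, 8]).values()
```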

If you need to read back a few select values to the driver, take a look at
lookup().
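
As a rough illustration, assuming the same (index, value) layout, a small wrapper (lookup_many is a made-up name) could collect a handful of entries like this:

```python
def lookup_many(keyed_rdd, keys):
    """Fetch the values for a few keys back to the driver.

    RDD.lookup(key) returns a list of all values stored under that key,
    so each entry in the result is a (possibly empty) list. Every call
    is a separate Spark action, so keep `keys` short.
    """
    return {k: keyed_rdd.lookup(k) for k in keys}


# Usage, assuming `keyed` is an RDD of (index, value) pairs:
#   picked = lookup_many(keyed, [0, 4])
```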

On Fri, Apr 24, 2015 at 1:51 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
wrote:

> I have an RDD of LabeledPoints.
> Is it possible to select a subset of it based on a list of indices?
> For example with idx=[0,4,5,6,8], I'd like to be able to create a new RDD
> with elements 0,4,5,6 and 8.

-- 
www.skrasser.com <http://www.skrasser.com/?utm_source=sig>
