Re: Best practice for retrieving big data from RDD to local machine

Andrew Ash Wed, 12 Feb 2014 01:16:17 -0800

Hi Egor,

It sounds like you should vote for
https://spark-project.atlassian.net/browse/SPARK-914 which is to make an
RDD iterable from the driver.
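
For anyone finding this thread later: Spark releases after this message expose `RDD.toLocalIterator`, which pulls the RDD to the driver one partition at a time. Below is a minimal sketch of how that could look. The `local[2]` master, the app name, and the `parallelize` stand-in RDD are assumptions to keep the example self-contained, not part of Egor's setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalIterationSketch {
  // Sums the values on the driver without materialising the whole RDD there.
  def sumOnDriver(sc: SparkContext, count: Int, partitions: Int): Long = {
    // Stand-in for the large result RDD (assumption for illustration).
    val rdd = sc.parallelize(1 to count, numSlices = partitions)

    // toLocalIterator fetches one partition at a time, so the driver
    // needs roughly as much memory as the largest partition, not the full RDD.
    var sum = 0L
    rdd.toLocalIterator.foreach(x => sum += x)
    sum
  }

  def main(args: Array[String]): Unit = {
    // "local[2]" is assumed for a standalone demo; on YARN the master
    // would normally be supplied by the launcher instead.
    val conf = new SparkConf().setAppName("local-iteration-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(sumOnDriver(sc, count = 1000, partitions = 10))
    } finally {
      sc.stop()
    }
  }
}
```

If a single partition is still larger than the driver heap, splitting the data into more, smaller partitions first (for example with `rdd.repartition(n)`) should help, at the cost of a shuffle.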



On Wed, Feb 12, 2014 at 1:07 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> Hello. I've got a big RDD (1 GB) in a YARN cluster. On the local machine
> that uses this cluster I have only 512 MB. I'd like to iterate over the
> values in the result RDD on my local machine. I can't use collect(), because
> it would create an array locally that is larger than my heap. I need some
> iterative way. There is a method iterator(), but it requires some additional
> information that I can't provide. (
> http://stackoverflow.com/questions/21698443/best-practice-for-retrieving-big-data-from-rdd-to-local-machine
> )
>
> --
>
>
>
> *Sincerely yours, Egor Pakhomov, Scala Developer, Yandex*
>
