Hi Egor,

It sounds like you should vote for https://spark-project.atlassian.net/browse/SPARK-914, which proposes making an RDD iterable from the driver.
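
For what it's worth, here is a rough sketch of what iterating from the driver could look like. It assumes a Spark release where `RDD.toLocalIterator` is available, which may not be the case for the version you are running. The helper name `foreachOnDriver`, the `local[2]` master and the sample data are made up for illustration. As far as I know, the driver then only needs enough memory for the largest single partition, so with a 512 MB heap you would want the RDD split into many small partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LocalIterationSketch {
  // Hypothetical helper: applies `handle` to every element on the driver
  // and returns how many elements were seen.
  def foreachOnDriver[T](rdd: RDD[T])(handle: T => Unit): Long = {
    var seen = 0L
    // Assumption: toLocalIterator exists in the Spark version in use.
    // It pulls partitions to the driver one at a time instead of
    // materializing the whole RDD the way collect() does.
    for (value <- rdd.toLocalIterator) {
      handle(value)
      seen += 1
    }
    seen
  }

  def main(args: Array[String]): Unit = {
    // A local master is used only so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("local-iteration-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Many small partitions keep each fetch to the driver small.
      val rdd = sc.parallelize(1 to 1000, numSlices = 50)
      var sum = 0L
      val seen = foreachOnDriver(rdd)(v => sum += v)
      println(s"seen=$seen sum=$sum")
    } finally {
      sc.stop()
    }
  }
}
```

Note that each partition is fetched by a separate job, so this trades speed for a bounded driver heap.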

On Wed, Feb 12, 2014 at 1:07 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> Hello. I've got a big RDD (1 GB) in a YARN cluster. On the local machine
> that uses this cluster I have only 512 MB. I'd like to iterate over the
> values in the result RDD on my local machine. I can't use collect(),
> because it would create an array locally that is bigger than my heap.
> I need some iterative way. There is the method iterator(), but it
> requires some additional information that I can't provide.
> (http://stackoverflow.com/questions/21698443/best-practice-for-retrieving-big-data-from-rdd-to-local-machine)
>
> --
> *Sincerely yours,
> Egor Pakhomov
> Scala Developer, Yandex*