Hi Ted

I am aware that RDDs are immutable, but in my use case I need to update the
same data set after each iteration.

The following are the options I have been exploring:

1. Generating a new RDD in each iteration (this might use a lot of memory; see the sketch below).

2. Using Hive tables and updating the same table after each iteration.

Please suggest which of the two methods listed above would be better to use,
or whether there is a better way to accomplish this.
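
For what it's worth, here is a rough sketch of option 1 in Scala. The
SparkContext setup, checkpoint directory, sample data, and the update rule
(adding 1.0 to every value) are all placeholders for illustration; the main
idea is that each iteration derives a new RDD, and persisting plus
periodically checkpointing keeps the lineage and cached data from piling up:

    // Minimal sketch of option 1: derive a new RDD each iteration.
    // All names, paths, and the update rule below are illustrative only.
    import org.apache.spark.rdd.RDD
    import org.apache.spark.{SparkConf, SparkContext}

    object IterativeUpdate {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("iterative-update").setMaster("local[*]"))
        sc.setCheckpointDir("/tmp/spark-checkpoints")

        // Starting data set; since RDDs are immutable, each iteration
        // produces a new RDD rather than modifying this one in place.
        var current: RDD[(String, Double)] =
          sc.parallelize(Seq(("a", 1.0), ("b", 2.0), ("c", 3.0)))

        val numIterations = 10
        for (i <- 1 to numIterations) {
          // mapValues returns a new RDD; the previous one is released
          // once it is unpersisted and no longer referenced.
          val updated = current.mapValues(_ + 1.0).persist()

          // Truncate the lineage every few iterations so the DAG (and the
          // memory held by old cached partitions) does not grow unboundedly.
          if (i % 5 == 0) updated.checkpoint()
          updated.count() // materialize before releasing the parent

          current.unpersist()
          current = updated
        }

        println(current.collect().mkString(", "))
        sc.stop()
      }
    }

For option 2, the equivalent pattern would presumably be an INSERT OVERWRITE
of the same Hive table each iteration through HiveContext, at the cost of
writing to storage on every pass.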

On Fri, 6 May 2016, 7:09 p.m. Ted Yu, <yuzhih...@gmail.com> wrote:

> Please see the doc at the beginning of RDD class:
>
>  * A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
> Represents an immutable,
>  * partitioned collection of elements that can be operated on in parallel.
> This class contains the
>  * basic operations available on all RDDs, such as `map`, `filter`, and
> `persist`. In addition,
>
> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com>
> wrote:
>
>> Hi
>>
>> Is there a way I can modify an RDD in a for-each loop?
>>
>> Basically, I have a use case in which I need to perform multiple
>> iterations over the data and modify a few values in each iteration.
>>
>>
>> Please help.
>>
>
>
