Hi Ted, I am aware that RDDs are immutable, but in my use case I need to update the same data set after each iteration.
Following are the options I was exploring:

1. Generating a new RDD in each iteration (it might use a lot of memory).
2. Using Hive tables and updating the same table after each iteration.

Please suggest which of the methods listed above would be good to use, or whether there is a better way to accomplish this.

On Fri, 6 May 2016, 7:09 p.m. Ted Yu, <yuzhih...@gmail.com> wrote:

> Please see the doc at the beginning of the RDD class:
>
>     * A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable,
>     * partitioned collection of elements that can be operated on in parallel. This class contains the
>     * basic operations available on all RDDs, such as `map`, `filter`, and `persist`. In addition,
>
> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com> wrote:
>
>> Hi
>>
>> Is there a way I can modify an RDD in a for-each loop?
>>
>> Basically, I have a use case in which I need to perform multiple
>> iterations over the data and modify a few values in each iteration.
>>
>> Please help.
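Since RDDs are immutable, option 1 is the usual approach: each iteration derives a new RDD from the previous one, and memory is kept in check by persisting the new RDD and unpersisting (or periodically checkpointing) the old one rather than keeping every intermediate alive. Below is a minimal sketch of that pattern. Plain Python lists stand in for RDDs so it runs without a Spark cluster; the `step` function and the input values are illustrative assumptions, and the comments note the PySpark equivalents (`map`, `persist`, `unpersist`).

```python
# Sketch of the "derive a new dataset each iteration" pattern that the
# immutable-RDD model implies. Plain Python lists stand in for RDDs here;
# in PySpark the comprehension below would be rdd.map(step), and you
# would persist() each new RDD and unpersist() the previous one to
# bound memory use across iterations.

def step(record):
    # Hypothetical per-iteration update rule (an assumption for
    # illustration): increment each value by one.
    return record + 1

data = [1, 2, 3]  # stands in for sc.parallelize([1, 2, 3])

for _ in range(3):
    # Each iteration produces a *new* dataset; the old one is simply
    # dropped (in Spark: new_rdd.persist(); old_rdd.unpersist()).
    data = [step(x) for x in data]

print(data)  # → [4, 5, 6]
```

For long loops in real Spark code, periodically checkpointing the current RDD also truncates its lineage, which keeps recomputation and driver-side DAG growth from becoming a problem.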