Yes, I do that. But if I go to my worker node and check the list it has printed:

MyRdd.flatMap(func(_))
MyRdd.saveAsTextFile(..)

func(Tuple2[Key, Value]): List[Tuple2[MyCustomKey, MyCustomValue]] = {
//
*println(list)*
list
}

the records differ (only the count matches).


On Thu, Jan 30, 2014 at 11:48 PM, Evan R. Sparks <evan.spa...@gmail.com> wrote:

> Actually - looking at your use case, you may simply be saving the original
> RDD
> Doing something like:
> val newRdd = MyRdd.flatMap(func)
> newRdd.saveAsTextFile(...)
>
> May solve your issue.
>
>
> On Thu, Jan 30, 2014 at 10:17 AM, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>
>> Could it be that you have the same records that you get back from
>> flatMap, just in a different order?
>>
>>
>> On Thu, Jan 30, 2014 at 1:05 AM, Archit Thakur <archit279tha...@gmail.com
>> > wrote:
>>
>>> Needless to say, it works fine with int/string(primitive) type.
>>>
>>>
>>> On Wed, Jan 29, 2014 at 2:04 PM, Archit Thakur <
>>> archit279tha...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am facing a general problem with flatmap operation on rdd.
>>>>
>>>> I am doing
>>>>
>>>> MyRdd.flatmap(func(_))
>>>> MyRdd.saveAsTextFile(..)
>>>>
>>>> func(Tuple2[Key, Value]): List[Tuple2[MyCustomKey, MyCustomValue]] = {
>>>>
>>>> //
>>>>
>>>> println(list)
>>>> list
>>>> }
>>>>
>>>> now if I check the list from the logs at worker and check the textfile
>>>> it has created, it differs.
>>>>
>>>> Only the no. of records are same, but the actual records in the file
>>>> differs from one in the logs.
>>>>
>>>> Does Spark modifies keys/values in between? What other operations does
>>>> it perform with Key or Value?
>>>>
>>>> Thanks and Regards,
>>>> Archit Thakur.
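
For reference, the suggestion quoted in this thread (save the RDD returned by flatMap rather than the original one) can be sketched roughly as below. This is only an illustration under assumptions, not the code from the job discussed here: the case classes standing in for MyCustomKey and MyCustomValue, the body of func, the local[2] master, and the output path taken from the first argument are all made up for the example. It is not meant as a diagnosis of why the records differ.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FlatMapSaveSketch {
  // Hypothetical stand-ins for the custom key/value types in the thread.
  case class MyCustomKey(id: Int)
  case class MyCustomValue(text: String)

  // Hypothetical func: turns one input pair into a list of output pairs.
  def func(kv: (Int, String)): List[(MyCustomKey, MyCustomValue)] =
    List(
      (MyCustomKey(kv._1), MyCustomValue(kv._2)),
      (MyCustomKey(kv._1 + 100), MyCustomValue(kv._2.toUpperCase))
    )

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads, purely for illustration.
    val conf = new SparkConf().setAppName("flatMap-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val myRdd = sc.parallelize(Seq((1, "a"), (2, "b")))

      // flatMap leaves myRdd untouched and returns a new RDD,
      // so the result has to be kept in its own value.
      val newRdd = myRdd.flatMap(func)

      // Print what will be written, to compare against the saved files.
      newRdd.collect().foreach(println)

      // Save the transformed RDD, not myRdd. The output directory
      // (hypothetical, passed as the first argument) must not exist yet.
      if (args.nonEmpty) newRdd.saveAsTextFile(args(0))
    } finally {
      sc.stop()
    }
  }
}
```

Calling myRdd.saveAsTextFile(...) at the same point would write the untransformed input pairs instead, since the RDD that flatMap was called on is never modified.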