Yes, I do that. But when I go to my worker node and check the list it has
printed:



MyRdd.flatMap(func(_))
MyRdd.saveAsTextFile(...)

def func(kv: Tuple2[Key, Value]): List[Tuple2[MyCustomKey, MyCustomValue]] = {

  // ...

  println(list)
  list
}



The records differ (only the counts match).
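
For reference, here is a minimal, self-contained sketch of the pattern Evan
suggests below (saving the result of the flatMap rather than the original
RDD). The case classes, sample data, app name, and output path are
hypothetical stand-ins, not taken from the actual job:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-ins for the custom key/value classes in the snippet above.
case class MyCustomKey(k: String)
case class MyCustomValue(v: Int)

object FlatMapSaveSketch {
  // Same shape as func above: one input pair expands to a list of custom pairs.
  def func(kv: (String, Int)): List[(MyCustomKey, MyCustomValue)] = {
    val list = List((MyCustomKey(kv._1), MyCustomValue(kv._2)))
    println(list) // on a cluster this shows up in the worker/executor logs
    list
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("flatmap-save").setMaster("local[*]"))
    val myRdd = sc.parallelize(Seq(("a", 1), ("b", 2)))

    // Save the result of flatMap, not the original RDD.
    val newRdd = myRdd.flatMap(func)
    newRdd.saveAsTextFile("/tmp/flatmap-output") // each output line is an element's toString
    sc.stop()
  }
}

Note that saveAsTextFile writes one part-file per partition and records each
element via its toString, so the ordering in the output files need not match
the order printed in any single worker's log.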


On Thu, Jan 30, 2014 at 11:48 PM, Evan R. Sparks <evan.spa...@gmail.com> wrote:

> Actually - looking at your use case, you may simply be saving the original
> RDD. Doing something like:
>
> val newRdd = MyRdd.flatMap(func)
> newRdd.saveAsTextFile(...)
>
> may solve your issue.
>
>
> On Thu, Jan 30, 2014 at 10:17 AM, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>
>> Could it be that you have the same records that you get back from
>> flatMap, just in a different order?
>>
>>
>>> On Thu, Jan 30, 2014 at 1:05 AM, Archit Thakur <archit279tha...@gmail.com> wrote:
>>
>>> Needless to say, it works fine with int/string (primitive) types.
>>>
>>>
>>> On Wed, Jan 29, 2014 at 2:04 PM, Archit Thakur <
>>> archit279tha...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am facing a general problem with the flatMap operation on an RDD.
>>>>
>>>> I am doing
>>>>
>>>> MyRdd.flatMap(func(_))
>>>> MyRdd.saveAsTextFile(...)
>>>>
>>>> def func(kv: Tuple2[Key, Value]): List[Tuple2[MyCustomKey, MyCustomValue]] = {
>>>>
>>>>   // ...
>>>>
>>>>   println(list)
>>>>   list
>>>> }
>>>>
>>>> Now, if I check the list from the logs at the worker and compare it with
>>>> the text file it has created, they differ.
>>>>
>>>> Only the number of records is the same; the actual records in the file
>>>> differ from the ones in the logs.
>>>>
>>>> Does Spark modify keys/values in between? What other operations does
>>>> it perform on the key or value?
>>>>
>>>> Thanks and Regards,
>>>> Archit Thakur.
>>>>
>>>>
>>>
>>
>
