OK, is there a way I can use hash partitioning to improve the
performance?
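
For reference, a minimal sketch of what hash partitioning does, written in
plain Python rather than against the Spark API. As far as I know, Spark's
HashPartitioner picks the partition as the key's hash code modulo the number
of partitions (kept non-negative), and calling partitionBy with it on a pair
RDD costs one shuffle. The function names and the toy data below are
hypothetical, for illustration only.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Return the partition index for a key: hash modulo partition count."""
    # Python's % with a positive modulus never returns a negative value.
    return hash(key) % num_partitions


def partition_by(records, num_partitions):
    """Group (key, value) records by partition index.

    This stands in for the single shuffle that moves every record with the
    same key to the same place.
    """
    partitions = defaultdict(list)
    for key, value in records:
        partitions[hash_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


# Hypothetical toy data: 6 integer keys, 3 values per key.
records = [(k, f"v{k}-{i}") for k in range(6) for i in range(3)]
parts = partition_by(records, num_partitions=4)

# Show which keys ended up in which partition.
for pid in sorted(parts):
    print(pid, sorted({k for k, _ in parts[pid]}))
```

Each key lands in exactly one partition, so later per-key work can stay
local. In Spark the gain comes from paying that shuffle once and then using
operations that keep the partitioner (mapValues and flatMapValues do, plain
map does not), so that later by-key steps do not shuffle again.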


On 17 April 2015 at 19:33, Archit Thakur <archit279tha...@gmail.com> wrote:

> By custom installation, I meant changing the code and building it. I have
> not done a complete impact analysis, I just had a look at the code.
>
> When you say the same key goes to the same node, that would need shuffling
> unless the raw data you are reading is already laid out that way.
> On Apr 17, 2015 6:30 PM, "Jeetendra Gangele" <gangele...@gmail.com> wrote:
>
>> Hi Archit, thanks for the reply.
>> How can I do the custom compilation to reduce it to 4 bytes? I want to
>> make it 4 bytes in any case. Can you please guide me?
>>
>> I am applying flatMapValues in each step after zipWithIndex, so it should
>> stay on the same node, right? Why is it shuffling?
>> Also, I am currently running with very few records and it is still
>> shuffling.
>>
>> regards
>> jeetendra
>>
>>
>>
>> On 17 April 2015 at 15:58, Archit Thakur <archit279tha...@gmail.com>
>> wrote:
>>
>>> I don't think you can change it to 4 bytes without a custom compilation.
>>> To make the same key go to the same node, you'll have to repartition the
>>> data, which is shuffling anyway. Unless your raw data is such that the
>>> same key is already on the same node, you'll have to shuffle at least
>>> once to put each key on a single node.
>>>
>>> On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele <
>>> gangele...@gmail.com> wrote:
>>>
>>>> Hi All
>>>>
>>>> I have an RDD with 1 million keys, and each key is repeated for
>>>> around 7000 values, so in total there will be around 1M*7K records in
>>>> the RDD.
>>>>
>>>> Each key is created by zipWithIndex, so the keys run from 0 to M-1.
>>>> The problem with zipWithIndex is that it uses a Long for the key, which
>>>> is 8 bytes. Can I reduce it to 4 bytes?
>>>>
>>>> Now, how can I make sure that records with the same key go to the same
>>>> node, so that I can avoid shuffling? Also, how will the default
>>>> partitioner work here?
>>>>
>>>> Regards
>>>> jeetendra
>>>>
>>>>
>>>
>>
>>
>>
