Hi
Can someone shed light on this? The issue does not happen frequently.
Sometimes the job halts with the messages quoted below.

Regards,
Padma Ch

On Fri, May 27, 2016 at 8:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Priya:
> Have you checked the executor logs on hostname1 and hostname2 ?
>
> Cheers
>
> On Thu, May 26, 2016 at 8:00 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> If your job gets stuck or fails, one best practice is to increase the
>> number of partitions.
>> Also, you'd be better off using DataFrames instead of RDDs in terms of
>> join optimization.
>>
>> // maropu
>>
>>
>> On Thu, May 26, 2016 at 11:40 PM, Priya Ch <learnings.chitt...@gmail.com>
>> wrote:
>>
>>> Hello Team,
>>>
>>>
>>>  I am trying to join 2 RDDs, where one is 800 MB in size and the
>>> other is 190 MB. During the join step, my job halts and I don't see
>>> any progress in the execution.
>>>
>>> These are the messages I see on the console:
>>>
>>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
>>> locations for shuffle 0 to <hostname1>:40000
>>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
>>> locations for shuffle 1 to <hostname2>:40000
>>>
>>> After these messages, I don't see any progress. I am using Spark 1.6.0
>>> with the YARN scheduler (running in YARN client mode). My cluster
>>> configuration is a 3-node cluster (1 master and 2 slaves). Each slave has
>>> 1 TB of hard disk space, 300 GB of memory, and 32 cores.
>>>
>>> HDFS block size is 128 MB.
>>>
>>> Thanks,
>>> Padma Ch
>>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>
