Hi,

Can someone shed some light on this? The issue does not happen frequently, but sometimes the job halts with the messages quoted below.
Regards,
Padma Ch

On Fri, May 27, 2016 at 8:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Priya:
> Have you checked the executor logs on hostname1 and hostname2?
>
> Cheers
>
> On Thu, May 26, 2016 at 8:00 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>
>> Hi,
>>
>> If your job gets stuck or fails, one best practice is to increase the number of partitions.
>> Also, you would be better off using DataFrame instead of RDD in terms of join optimization.
>>
>> // maropu
>>
>> On Thu, May 26, 2016 at 11:40 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>
>>> Hello Team,
>>>
>>> I am trying to join two RDDs, one of size 800 MB and the other 190 MB. During the join step, my job halts and I don't see any progress in the execution.
>>>
>>> This is the message I see on the console:
>>>
>>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output locations for shuffle 0 to <hostname1>:40000
>>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output locations for shuffle 1 to <hostname2>:40000
>>>
>>> After these messages, I don't see any progress. I am using Spark 1.6.0 with the YARN scheduler (running in YARN client mode). My cluster configuration is a 3-node cluster (1 master and 2 slaves). Each slave has 1 TB of hard disk space, 300 GB of memory, and 32 cores.
>>>
>>> The HDFS block size is 128 MB.
>>>
>>> Thanks,
>>> Padma Ch
>>
>> --
>> ---
>> Takeshi Yamamuro
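
For anyone sizing the "increase the number of partitions" suggestion in this thread, here is a rough back-of-the-envelope sketch in plain Python (no Spark required). It assumes the two RDDs were read straight from HDFS with one partition per 128 MB block, which the thread does not actually state, and it uses the Spark tuning guide's rule of thumb of 2 to 3 tasks per CPU core. The function names are made up for illustration.

```python
import math

def input_blocks(size_mb, block_mb=128):
    """Number of HDFS blocks (and, by assumption, input partitions) for a file."""
    return math.ceil(size_mb / block_mb)

def suggested_partitions(worker_nodes, cores_per_node, tasks_per_core=2):
    """Rule-of-thumb partition count: a few tasks per available core."""
    return worker_nodes * cores_per_node * tasks_per_core

# Figures taken from the thread: 800 MB and 190 MB inputs, 128 MB blocks,
# 2 worker nodes with 32 cores each.
large_parts = input_blocks(800)   # partitions of the larger RDD (assumed)
small_parts = input_blocks(190)   # partitions of the smaller RDD (assumed)
target = suggested_partitions(worker_nodes=2, cores_per_node=32)

print(large_parts, small_parts, target)

# The target could then be passed explicitly to the join, e.g. in PySpark:
#   joined = large_rdd.join(small_rdd, numPartitions=target)
# (large_rdd and small_rdd are hypothetical names, not from the thread).
```

Under these assumptions the join would start from only a handful of partitions on a cluster with 64 cores, so passing an explicit partition count is a cheap thing to try. It does not by itself explain why the job hangs; the executor logs Ted mentions are still the place to look.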