PS - I don't get this behaviour in all cases. Across many runs of the same job, I see it in around 40% of them.
Task 4 is the bottom row in the metrics table.

Thank you,
Abhishek
e: abshkm...@gmail.com
p: 91-8233540996

On Tue, Feb 16, 2016 at 11:19 PM, Abhishek Modi <abshkm...@gmail.com> wrote:

> Darren: this is not the last task of the stage.
>
> On Tue, Feb 16, 2016 at 6:52 PM, Darren Govoni <dar...@ontrenet.com> wrote:
>
>> There were some posts in this group about it. Another person also saw the
>> deadlock on the next-to-last or last task of a stage.
>>
>> I've attached some images I collected showing this problem.
>>
>> ------- Original Message -------
>> On 2/16/2016 07:29 AM Ted Yu wrote:
>>
>> Darren:
>> Can you post a link to the deadlock issue you mentioned?
>>
>> Thanks
>>
>>> On Feb 16, 2016, at 6:55 AM, Darren Govoni <dar...@ontrenet.com> wrote:
>>>
>>> I think this is part of the bigger issue of serious deadlock conditions
>>> occurring in Spark that many of us have posted on.
>>>
>>> Would the task in question be the last task of a stage, by chance?
>>>
>>> -------- Original message --------
>>> From: Abhishek Modi <abshkm...@gmail.com>
>>> Date: 02/16/2016 4:12 AM (GMT-05:00)
>>> To: user@spark.apache.org
>>> Subject: Unusually large deserialisation time
>>>
>>> I'm doing a mapPartitions on an RDD cached in memory, followed by a
>>> reduce. Here is my code snippet:
>>>
>>> // myRdd is an RDD of Tuple2[Int,Long]
>>> myRdd.mapPartitions(rangify).reduce( (x, y) => (x._1 + y._1, x._2 ++ y._2) )
>>>
>>> // The rangify function: sums the Int counts in a partition and
>>> // compresses runs of consecutive Long values into (begin, end) ranges
>>> def rangify(l: Iterator[Tuple2[Int,Long]]): Iterator[Tuple2[Long, List[ArrayBuffer[Tuple2[Long,Long]]]]] = {
>>>   var sum = 0L
>>>   val mylist = ArrayBuffer[Tuple2[Long,Long]]()
>>>
>>>   if (l.isEmpty)
>>>     return List((0L, List[ArrayBuffer[Tuple2[Long,Long]]]())).toIterator
>>>
>>>   var prev = -1000L
>>>   var begin = -1000L
>>>
>>>   for (x <- l) {
>>>     sum += x._1
>>>
>>>     if (prev < 0) {
>>>       prev = x._2
>>>       begin = x._2
>>>     }
>>>     else if (x._2 == prev + 1)
>>>       prev = x._2
>>>     else {
>>>       mylist += ((begin, prev))
>>>       prev = x._2
>>>       begin = x._2
>>>     }
>>>   }
>>>
>>>   mylist += ((begin, prev))
>>>
>>>   List((sum, List(mylist))).toIterator
>>> }
>>>
>>> The RDD is cached in memory. I'm using 20 executors with 1 core per
>>> executor. The cached RDD has 60 blocks. The problem is that every 2-3
>>> runs of the job, there is a task with an abnormally large
>>> deserialisation time. Screenshot attached.
>>>
>>> Thank you,
>>> Abhishek
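For readers skimming the thread: the `rangify` function above walks each partition's (count, value) pairs, summing the counts and collapsing runs of consecutive values into (begin, end) ranges. A minimal re-expression of that per-partition logic in plain Python (a sketch for illustration only; the function name and the non-Spark setting are my own, not from the thread):

```python
def rangify(pairs):
    """Sum the counts and compress consecutive values into (begin, end) ranges.

    `pairs` is an iterable of (count, value) tuples; as in the thread's Scala
    version, values are assumed to arrive in ascending order within a partition.
    """
    total = 0
    ranges = []
    prev = None
    begin = None
    for count, value in pairs:
        total += count
        if prev is None:
            # First element: open a new range.
            prev = begin = value
        elif value == prev + 1:
            # Value extends the current consecutive run.
            prev = value
        else:
            # Gap found: close the current range and open a new one.
            ranges.append((begin, prev))
            prev = begin = value
    if prev is not None:
        ranges.append((begin, prev))
    return total, ranges
```

For example, `rangify([(1, 10), (2, 11), (3, 13)])` sums the counts to 6 and yields the ranges `[(10, 11), (13, 13)]`, since 10 and 11 are consecutive but 13 is not.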