PS - I don't get this behaviour in all cases. I did many runs of the
same job and I see this behaviour in around 40% of them.

Task 4 is the bottom row in the metrics table

Thank you,
Abhishek

e: abshkm...@gmail.com
p: 91-8233540996


On Tue, Feb 16, 2016 at 11:19 PM, Abhishek Modi <abshkm...@gmail.com> wrote:

> Darren: this is not the last task of the stage.
>
> Thank you,
> Abhishek
>
> e: abshkm...@gmail.com
> p: 91-8233540996
>
>
> On Tue, Feb 16, 2016 at 6:52 PM, Darren Govoni <dar...@ontrenet.com>
> wrote:
>
>> There were some posts in this group about it. Another person also saw the
>> deadlock on next to last or last stage task.
>>
>> I've attached some images I collected showing this problem.
>>
>>
>>
>> ------- Original Message -------
>> On 2/16/2016 07:29 AM Ted Yu wrote:
>>
>> Darren:
>>
>> Can you post a link to the deadlock issue you mentioned?
>>
>> Thanks
>>
>> > On Feb 16, 2016, at 6:55 AM, Darren Govoni <dar...@ontrenet.com> wrote:
>> >
>> > I think this is part of the bigger issue of serious deadlock
>> > conditions occurring in Spark that many of us have posted on.
>> >
>> > Would the task in question be the last task of a stage by chance?
>> >
>> > Sent from my Verizon Wireless 4G LTE smartphone
>> >
>> > -------- Original message --------
>> > From: Abhishek Modi <abshkm...@gmail.com>
>> > Date: 02/16/2016 4:12 AM (GMT-05:00)
>> > To: user@spark.apache.org
>> > Subject: Unusually large deserialisation time
>> >
>> > I'm doing a mapPartitions on an RDD cached in memory, followed by a
>> > reduce. Here is my code snippet:
>> >
>> > // needs: import scala.collection.mutable.ArrayBuffer
>> >
>> > // myRdd is an RDD of Tuple2[Int, Long]
>> > myRdd.mapPartitions(rangify).reduce( (x, y) => (x._1 + y._1, x._2 ++ y._2) )
>> >
>> > // The rangify function
>> > def rangify(l: Iterator[Tuple2[Int, Long]]): Iterator[Tuple2[Long, List[ArrayBuffer[Tuple2[Long, Long]]]]] = {
>> >   var sum = 0L
>> >   val mylist = ArrayBuffer[Tuple2[Long, Long]]()
>> >
>> >   if (l.isEmpty)
>> >     return List((0L, List[ArrayBuffer[Tuple2[Long, Long]]]())).toIterator
>> >
>> >   var prev = -1000L
>> >   var begin = -1000L
>> >
>> >   for (x <- l) {
>> >     sum += x._1
>> >
>> >     if (prev < 0) {
>> >       prev = x._2
>> >       begin = x._2
>> >     }
>> >     else if (x._2 == prev + 1)
>> >       prev = x._2
>> >     else {
>> >       mylist += ((begin, prev))
>> >       prev = x._2
>> >       begin = x._2
>> >     }
>> >   }
>> >
>> >   mylist += ((begin, prev))
>> >
>> >   List((sum, List(mylist))).toIterator
>> > }
>> >
>> > The RDD is cached in memory. I'm using 20 executors with 1 core per
>> > executor. The cached RDD has 60 blocks. The problem is that in every
>> > 2-3 runs of the job there is a task with an abnormally large
>> > deserialisation time. Screenshot attached.
>> >
>> > Thank you,
>> > Abhishek
>
>
>
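
In case it helps anyone else chasing the same thing: the task deserialisation
time shown in the UI can also be captured programmatically with a SparkListener
that reads executorDeserializeTime from the task metrics. The snippet below is
only a sketch: the app name, the 1-second threshold and the println logging are
illustrative, and the commented-out job line stands in for the mapPartitions /
reduce job from the quoted mail.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

object DeserTimeCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("deser-time-check"))

    // Print the executor-side deserialisation time of every finished task.
    // The 1-second threshold is arbitrary; it just flags the outliers.
    sc.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {
          val deserMs = m.executorDeserializeTime
          val flag = if (deserMs > 1000) "  <-- unusually large" else ""
          println(s"stage ${taskEnd.stageId} task ${taskEnd.taskInfo.index}: " +
                  s"deserialization = $deserMs ms$flag")
        }
      }
    })

    // ... build and cache myRdd, then run the job from the quoted mail, e.g.
    // myRdd.mapPartitions(rangify).reduce( (x, y) => (x._1 + y._1, x._2 ++ y._2) )

    sc.stop()
  }
}

Logging the metric for every run this way makes it easier to compare the
roughly 40% of runs that show the outlier against the ones that don't.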
