Re: 10hrs of Scheduler Delay

Ted Yu Mon, 25 Jan 2016 05:51:36 -0800

Opening a JIRA is fine. 

See if you can capture a stack trace during the hung stage and attach it to 
the JIRA so that we have more clues.
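
In case it helps, here is a rough sketch of one way to capture them. It 
assumes the JDK's jstack is on the PATH of the node running the stuck JVM 
and that you can find the pid with jps. The capture_dumps name and the 
output file naming are made up for illustration. Taking a few dumps some 
seconds apart makes it easier to tell a thread that is stuck from one that 
is merely slow.

```shell
# Sketch only: take several thread dumps of one JVM, a few seconds apart,
# so that a stuck thread shows the same stack in every dump.
# capture_dumps is a hypothetical helper name; jstack ships with the JDK.
capture_dumps() {
  pid="$1"            # JVM process id, e.g. as listed by `jps -lm`
  count="${2:-3}"     # number of dumps to take (default 3)
  interval="${3:-10}" # seconds to wait between dumps (default 10)
  outdir="${4:-.}"    # directory for the dump files (default: current dir)

  i=1
  while [ "$i" -le "$count" ]; do
    # -l adds lock/synchronizer details, useful when a deadlock is suspected
    jstack -l "$pid" > "$outdir/threaddump_${pid}_${i}.txt" || return 1
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example (run on the node hosting the hung JVM; 12345 is a placeholder pid):
#   jps -lm                  # list running JVMs with their main class and args
#   capture_dumps 12345 3 10
```

The resulting threaddump_*.txt files can be attached to the JIRA as they are.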


Thanks

> On Jan 25, 2016, at 4:25 AM, Darren Govoni <dar...@ontrenet.com> wrote:
> 
> Probably we should open a ticket for this.
> There's definitely a deadlock situation occurring in Spark under certain 
> conditions.
> 
> The only clue I have is that it always happens on the last stage. And it does 
> seem sensitive to scale. If my job has 300 MB of data I'll see the deadlock. 
> But if I only run 10 MB of it, it will succeed. This suggests a serious 
> fundamental scaling problem.
> 
> Workers have plenty of resources.
> 
> -------- Original message --------
> From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
> Date: 01/24/2016 2:54 PM (GMT-05:00) 
> To: Renu Yadav <yren...@gmail.com> 
> Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar 
> <bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org 
> Subject: Re: 10hrs of Scheduler Delay 
> 
> I am not getting anywhere with any of the suggestions so far. :(
> 
> Trying some more outlets, I will share any solution I find.
> 
> - Isaac
> 
>> On Jan 23, 2016, at 1:48 AM, Renu Yadav <yren...@gmail.com> wrote:
>> 
>> If you turn spark.speculation on, that might help. It worked for me.
>> 
>>> On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni <dar...@ontrenet.com> wrote:
>>> Thanks for the tip. I will try it. But this is the kind of thing Spark is 
>>> supposed to figure out and handle. Or at least not get stuck forever.
>>> 
>>> -------- Original message --------
>>> From: Muthu Jayakumar <bablo...@gmail.com> 
>>> Date: 01/22/2016 3:50 PM (GMT-05:00) 
>>> To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" 
>>> <sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com> 
>>> Cc: user@spark.apache.org 
>>> Subject: Re: 10hrs of Scheduler Delay 
>>> 
>>> Does increasing the number of partitions help? You could try something 
>>> like 3 times what you currently have. 
>>> Another trick I used was to split the problem into multiple DataFrames, 
>>> run them sequentially, persist each result, and then run a union on 
>>> the results. 
>>> 
>>> Hope this helps. 
>>> 
>>>> On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:
>>>> Me too. I had to shrink my dataset to get it to work. For us at least 
>>>> Spark seems to have scaling issues.
>>>> 
>>>> -------- Original message --------
>>>> From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
>>>> Date: 01/21/2016 11:18 PM (GMT-05:00) 
>>>> To: Ted Yu <yuzhih...@gmail.com> 
>>>> Cc: user@spark.apache.org 
>>>> Subject: Re: 10hrs of Scheduler Delay 
>>>> 
>>>> I have run the driver on a smaller dataset (k=2, n=5000) and it worked 
>>>> quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m, 
>>>> but I am using more resources on this one.
>>>> 
>>>> - Isaac
>>>> 
>>>>> On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> 
>>>>> You may have seen the following on the GitHub page:
>>>>> 
>>>>> Latest commit 50fdf0e on Feb 22, 2015
>>>>> 
>>>>> That was 11 months ago.
>>>>> 
>>>>> Can you search for a similar algorithm that runs on Spark and is newer?
>>>>> 
>>>>> If nothing is found, consider running the project's own tests to 
>>>>> determine whether the delay is intrinsic.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>>> On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
>>>>>> <sande...@rose-hulman.edu> wrote:
>>>>>> That thread seems to be moving; it oscillates between a few different 
>>>>>> traces… Maybe it is working. It seems odd that it would take that long.
>>>>>> 
>>>>>> This is third-party code, and after looking at some of it, I think it 
>>>>>> might not be as Spark-y as it could be.
>>>>>> 
>>>>>> I linked it below. I don’t know a lot about Spark, so it might be fine, 
>>>>>> but I have my suspicions.
>>>>>> 
>>>>>> https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala
>>>>>> 
>>>>>> - Isaac
>>>>>> 
>>>>>>> On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>> 
>>>>>>> You may have noticed the following - does this indicate prolonged 
>>>>>>> computation in your code?
