If you turn spark.speculation on, that might help. It worked for me.
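[A minimal sketch of how speculation is typically enabled at submit time, for readers finding this thread later. These are standard Spark configuration keys; the class and jar names are placeholders, not from this thread, and the threshold values are illustrative only:]

```shell
# Enable speculative execution so straggler tasks are re-launched on other
# executors. quantile/multiplier are standard knobs; tune them for your job.
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.quantile=0.75 \
  --conf spark.speculation.multiplier=1.5 \
  --class com.example.YourDriver \
  your-app.jar
```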
On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni <dar...@ontrenet.com> wrote:

> Thanks for the tip. I will try it. But this is the kind of thing Spark is
> supposed to figure out and handle. Or at least not get stuck forever.
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
> -------- Original message --------
> From: Muthu Jayakumar <bablo...@gmail.com>
> Date: 01/22/2016 3:50 PM (GMT-05:00)
> To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B"
> <sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com>
> Cc: user@spark.apache.org
> Subject: Re: 10hrs of Scheduler Delay
>
> Does increasing the number of partitions help? You could try something
> like 3 times what you currently have.
> Another trick I used was to partition the problem into multiple
> dataframes, run them sequentially, persist the results, and then run a
> union on the results.
>
> Hope this helps.
>
> On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:
>
>> Me too. I had to shrink my dataset to get it to work. For us, at least,
>> Spark seems to have scaling issues.
>>
>> Sent from my Verizon Wireless 4G LTE smartphone
>>
>> -------- Original message --------
>> From: "Sanders, Isaac B" <sande...@rose-hulman.edu>
>> Date: 01/21/2016 11:18 PM (GMT-05:00)
>> To: Ted Yu <yuzhih...@gmail.com>
>> Cc: user@spark.apache.org
>> Subject: Re: 10hrs of Scheduler Delay
>>
>> I have run the driver on a smaller dataset (k=2, n=5000) and it worked
>> quickly and didn’t hang like this. This dataset is closer to k=10,
>> n=4.4m, but I am using more resources on this one.
>>
>> - Isaac
>>
>> On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> You may have seen the following on the github page:
>>
>> Latest commit 50fdf0e on Feb 22, 2015
>>
>> That was 11 months ago.
>>
>> Can you search for a similar algorithm that runs on Spark and is newer?
>>
>> If nothing is found, consider running the tests that come with the
>> project to determine whether the delay is intrinsic.
>>
>> Cheers
>>
>> On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B
>> <sande...@rose-hulman.edu> wrote:
>>
>>> That thread seems to be moving; it oscillates between a few different
>>> traces… Maybe it is working. It seems odd that it would take that long.
>>>
>>> This is 3rd-party code, and after looking at some of it, I think it
>>> might not be as Spark-y as it could be.
>>>
>>> I linked it below. I don’t know a lot about Spark, so it might be fine,
>>> but I have my suspicions.
>>>
>>> https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala
>>>
>>> - Isaac
>>>
>>> On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> You may have noticed the following - did this indicate prolonged
>>> computation in your code?
>>>
>>> org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
>>> org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
>>> org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
>>> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)
>>>
>>> On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B
>>> <sande...@rose-hulman.edu> wrote:
>>>
>>>> Hadoop is: HDP 2.3.2.0-2950
>>>>
>>>> Here is a gist (pastebin) of my versions en masse and a stacktrace:
>>>> https://gist.github.com/isaacsanders/2e59131758469097651b
>>>>
>>>> Thanks
>>>>
>>>> On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>> Looks like you were running on YARN.
>>>>
>>>> What Hadoop version are you using?
>>>>
>>>> Can you capture a few stack traces of the AppMaster during the delay
>>>> and pastebin them?
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B
>>>> <sande...@rose-hulman.edu> wrote:
>>>>
>>>>> The Spark version is 1.4.1.
>>>>>
>>>>> The logs are full of standard fare, nothing like an exception or even
>>>>> interesting [INFO] lines.
>>>>>
>>>>> Here is the script I am using:
>>>>> https://gist.github.com/isaacsanders/660f480810fbc07d4df2
>>>>>
>>>>> Thanks
>>>>> Isaac
>>>>>
>>>>> On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>> Can you provide a bit more information?
>>>>>
>>>>> command line for submitting the Spark job
>>>>> version of Spark
>>>>> anything interesting from driver / executor logs?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B
>>>>> <sande...@rose-hulman.edu> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I am a CS student in the United States working on my senior thesis.
>>>>>>
>>>>>> My thesis uses Spark, and I am encountering some trouble.
>>>>>>
>>>>>> I am using https://github.com/alitouka/spark_dbscan, and to determine
>>>>>> parameters, I am using the utility class they supply,
>>>>>> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.
>>>>>>
>>>>>> I am on a 10-node cluster with one machine with 8 cores and 32G of
>>>>>> memory and nine machines with 6 cores and 16G of memory.
>>>>>>
>>>>>> I have 442M of data, which seems like it would be a joke, but the job
>>>>>> stalls at the last stage.
>>>>>>
>>>>>> It was stuck in Scheduler Delay for 10 hours overnight, and I have
>>>>>> tried a number of things for the last couple of days, but nothing
>>>>>> seems to be helping.
>>>>>>
>>>>>> I have tried:
>>>>>> - Increasing heap sizes and numbers of cores
>>>>>> - More/fewer executors with different amounts of resources
>>>>>> - Kryo serialization
>>>>>> - FAIR scheduling
>>>>>>
>>>>>> It doesn’t seem like it should require this much. Any ideas?
>>>>>>
>>>>>> - Isaac
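[A rough sketch of the split-persist-union trick suggested earlier in the thread, as it might look on the Spark 1.x DataFrame API. `df`, `process`, and `runInPieces` are hypothetical placeholders, not code from this thread; an existing SparkContext/SQLContext is assumed:]

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Split the input into several pieces, process each piece on its own,
// persist and materialize each result so the pieces run one after another,
// then union the results back together at the end.
def runInPieces(df: DataFrame, pieces: Int)(process: DataFrame => DataFrame): DataFrame = {
  val parts = df.randomSplit(Array.fill(pieces)(1.0)) // equal-weight random split
  val results = parts.map { part =>
    val r = process(part).persist(StorageLevel.MEMORY_AND_DISK)
    r.count() // forces computation of this piece before the next one starts
    r
  }
  results.reduce(_ unionAll _) // unionAll on Spark 1.x; union on 2.x+
}
```

The partition-count suggestion from the same message can be tried independently, e.g. `df.repartition(3 * df.rdd.partitions.length)` before the expensive stage.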