Me too. I had to shrink my dataset to get it to work. For us, at least, Spark seems to have scaling issues.
Sent from my Verizon Wireless 4G LTE smartphone

-------- Original message --------
From: "Sanders, Isaac B" <sande...@rose-hulman.edu>
Date: 01/21/2016 11:18 PM (GMT-05:00)
To: Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay

I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly and didn't hang like this. This dataset is closer to k=10, n=4.4m, but I am using more resources on this one.

- Isaac

On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:

You may have seen the following on the GitHub page:

Latest commit 50fdf0e on Feb 22, 2015

That was 11 months ago. Can you search for a similar algorithm that runs on Spark and is newer? If nothing turns up, consider running the tests that come with the project to determine whether the delay is intrinsic.

Cheers

On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

That thread seems to be moving; it oscillates between a few different traces... Maybe it is working, but it seems odd that it would take this long.

This is third-party code, and after looking at some of it, I think it might not be as Spark-y as it could be. I linked it below. I don't know a lot about Spark, so it might be fine, but I have my suspicions.

https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala

- Isaac

On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:

You may have noticed the following - did this indicate prolonged computation in your code?

org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)

On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

Hadoop is: HDP 2.3.2.0-2950

Here is a gist (pastebin) of my versions en masse and a stacktrace:
https://gist.github.com/isaacsanders/2e59131758469097651b

Thanks

On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:

It looks like you were running on YARN. Which Hadoop version are you using?

Can you capture a few stack traces of the AppMaster during the delay and pastebin them?

Thanks

On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

The Spark version is 1.4.1.

The logs are full of standard fare, nothing like an exception or even an interesting [INFO] line.

Here is the script I am using:
https://gist.github.com/isaacsanders/660f480810fbc07d4df2

Thanks
Isaac

On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:

Can you provide a bit more information?

- command line for submitting the Spark job
- version of Spark
- anything interesting from driver / executor logs?

Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

Hey all,

I am a CS student in the United States working on my senior thesis. My thesis uses Spark, and I am encountering some trouble.

I am using https://github.com/alitouka/spark_dbscan, and to determine parameters, I am using the utility class they supply: org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.

I am on a 10-node cluster: one machine with 8 cores and 32G of memory, and nine machines with 6 cores and 16G of memory.
I have 442M of data, which seems like it would be a joke, but the job stalls at the last stage. It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a number of things over the last couple of days, but nothing seems to be helping.

I have tried:
- Increasing heap sizes and numbers of cores
- More/fewer executors with different amounts of resources
- Kryo serialization
- FAIR scheduling

It doesn't seem like it should require this much. Any ideas?

- Isaac
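
For reference, the tuning knobs listed above (serializer, scheduler mode, executor sizing) are usually expressed as SparkConf entries or the equivalent --conf flags to spark-submit. Below is a minimal sketch; the app name, executor counts, core counts, and memory sizes are invented for illustration and are not the values from the gist above.

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative settings only: the numbers below are hypothetical, not taken
// from the actual submission script.
val conf = new SparkConf()
  .setAppName("dbscan-parameter-exploration")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // Kryo instead of Java serialization
  .set("spark.scheduler.mode", "FAIR")      // FAIR scheduling across jobs within the application
  .set("spark.executor.instances", "9")     // one executor per worker node (hypothetical)
  .set("spark.executor.cores", "5")
  .set("spark.executor.memory", "12g")

val sc = new SparkContext(conf)

Note that none of these settings reduce the number of distance computations the driver performs; they only change how the same work is packed onto the cluster, which matches the observation that only shrinking the dataset made the job finish.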
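
On the question of whether the stage is simply doing an enormous amount of work: a brute-force distance-to-nearest-neighbour pass compares points pairwise, so its cost grows roughly with n², and going from n = 5,000 to n = 4.4M multiplies the pair count by almost a million. The sketch below is not the library's code; it is a made-up brute-force version (input path, parsing, and sampling fraction are all hypothetical) that shows where that quadratic blow-up comes from.

import org.apache.spark.{SparkConf, SparkContext}

object NearestNeighborSketch {

  // Squared Euclidean distance between two points.
  def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("nn-sketch"))

    // Hypothetical input: one point per line, comma-separated coordinates.
    val points = sc.textFile("hdfs:///path/to/points.csv")
      .map(_.split(",").map(_.toDouble))

    // cartesian() materialises every pair of points, which is what makes a
    // brute-force pass cheap at n = 5,000 and enormous at n = 4.4M.
    // Sampling one side (the fraction is made up) limits the pair count.
    val sample = points.sample(withReplacement = false, fraction = 0.001)

    val nearest = sample.cartesian(points)
      .filter { case (a, b) => !(a sameElements b) }       // skip self-pairs
      .map { case (a, b) => (a.toSeq, sqDist(a, b)) }      // key by the sampled point
      .reduceByKey((d1, d2) => math.min(d1, d2))           // keep the smallest distance
      .mapValues(math.sqrt)

    nearest.take(10).foreach(println)
    sc.stop()
  }
}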