You may have noticed the following frames in the stack trace. Do they indicate prolonged computation in your code?

org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)

On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

> Hadoop is: HDP 2.3.2.0-2950
>
> Here is a gist (pastebin) of my versions en masse and a stacktrace:
> https://gist.github.com/isaacsanders/2e59131758469097651b
>
> Thanks
>
> On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Looks like you were running on YARN.
>
> What Hadoop version are you using?
>
> Can you capture a few stack traces of the AppMaster during the delay and
> pastebin them?
>
> Thanks
>
> On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:
>
>> The Spark version is 1.4.1
>>
>> The logs are full of standard fare, nothing like an exception or even
>> interesting [INFO] lines.
>>
>> Here is the script I am using:
>> https://gist.github.com/isaacsanders/660f480810fbc07d4df2
>>
>> Thanks
>> Isaac
>>
>> On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Can you provide a bit more information?
>>
>> command line for submitting Spark job
>> version of Spark
>> anything interesting from driver / executor logs?
>>
>> Thanks
>>
>> On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:
>>
>>> Hey all,
>>>
>>> I am a CS student in the United States working on my senior thesis.
>>>
>>> My thesis uses Spark, and I am encountering some trouble.
>>>
>>> I am using https://github.com/alitouka/spark_dbscan, and to determine
>>> parameters, I am using the utility class they supply,
>>> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.
>>>
>>> I am on a 10 node cluster with one machine with 8 cores and 32G of
>>> memory and nine machines with 6 cores and 16G of memory.
>>>
>>> I have 442M of data, which seems like it would be a joke, but the job
>>> stalls at the last stage.
>>>
>>> It was stuck in Scheduler Delay for 10 hours overnight, and I have tried
>>> a number of things for the last couple days, but nothing seems to be
>>> helping.
>>>
>>> I have tried:
>>> - Increasing heap sizes and numbers of cores
>>> - More/fewer executors with different amounts of resources
>>> - Kryo Serialization
>>> - FAIR Scheduling
>>>
>>> It doesn’t seem like it should require this much. Any ideas?
>>>
>>> - Isaac