I've experienced this same problem. The last stage always hangs. Indeterminate. No errors in the logs. I'm running Spark 1.5.2. I can't find an explanation, but it's definitely a showstopper.
-------- Original message --------
From: Ted Yu <yuzhih...@gmail.com>
Date: 01/21/2016 7:44 PM (GMT-05:00)
To: "Sanders, Isaac B" <sande...@rose-hulman.edu>
Cc: user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay

Looks like you were running on YARN.

What Hadoop version are you using?

Can you capture a few stack traces of the AppMaster during the delay and pastebin them?

Thanks

On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

The Spark version is 1.4.1.

The logs are full of standard fare, nothing like an exception or even interesting [INFO] lines.

Here is the script I am using: https://gist.github.com/isaacsanders/660f480810fbc07d4df2

Thanks
Isaac

On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:

Can you provide a bit more information?

- command line for submitting the Spark job
- version of Spark
- anything interesting from driver / executor logs

Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

Hey all,

I am a CS student in the United States working on my senior thesis.

My thesis uses Spark, and I am encountering some trouble.

I am using https://github.com/alitouka/spark_dbscan, and to determine parameters, I am using the utility class they supply, org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.

I am on a 10-node cluster: one machine with 8 cores and 32G of memory, and nine machines with 6 cores and 16G of memory.

I have 442M of data, which seems like it should be trivial, but the job stalls at the last stage.

It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a number of things over the last couple of days, but nothing seems to be helping.

I have tried:
- Increasing heap sizes and numbers of cores
- More/fewer executors with different amounts of resources
- Kryo serialization
- FAIR scheduling

It doesn’t seem like it should require this much.

Any ideas?

- Isaac
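
For anyone trying to reproduce this, the settings listed above (executor count, memory, cores, Kryo serialization, FAIR scheduling) could be expressed as spark-submit options roughly as in the sketch below. This is only an illustration, not the command Isaac actually ran: the helper name `build_submit_args`, the jar name `spark_dbscan.jar`, and the resource values are hypothetical, and the driver's own arguments are left out. The `--class`, `--num-executors`, `--executor-memory`, `--executor-cores`, and `--conf` options and the two configuration keys are standard Spark ones.

```python
def build_submit_args(main_class, app_jar, num_executors,
                      executor_memory, executor_cores, extra_conf=None):
    """Build a spark-submit argument list (hypothetical helper, for illustration)."""
    args = [
        "spark-submit",
        "--class", main_class,
        "--num-executors", str(num_executors),   # YARN-only option
        "--executor-memory", executor_memory,    # e.g. "8g"
        "--executor-cores", str(executor_cores),
    ]
    # Settings mentioned in the thread: Kryo serialization and FAIR scheduling.
    conf = {
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.scheduler.mode": "FAIR",
    }
    conf.update(extra_conf or {})
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    # The application jar comes last; driver arguments would follow it.
    args.append(app_jar)
    return args


if __name__ == "__main__":
    # Assumed values; the jar name and resource sizes are placeholders.
    cmd = build_submit_args(
        "org.alitouka.spark.dbscan.exploratoryAnalysis."
        "DistanceToNearestNeighborDriver",
        "spark_dbscan.jar", 9, "8g", 4,
    )
    print(" ".join(cmd))
```

Printing the command instead of running it makes it easy to paste the exact invocation into a reply, which is what Ted asked for.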