Can you provide a bit more information?
- the command line used to submit the Spark job
- the version of Spark
- anything interesting from the driver / executor logs
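For reference, the requested details could be gathered roughly like this (a sketch assuming a YARN deployment; the application id is a placeholder, and the exact commands depend on the cluster setup):

```shell
# Print the Spark version of the installation used to submit the job
spark-submit --version

# On YARN, pull the aggregated driver/executor logs for inspection;
# <application_id> is a placeholder for the actual YARN application id
yarn logs -applicationId <application_id> > app_logs.txt
```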
Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <sande...@rose-hulman.edu> wrote:

> Hey all,
>
> I am a CS student in the United States working on my senior thesis.
>
> My thesis uses Spark, and I am encountering some trouble.
>
> I am using https://github.com/alitouka/spark_dbscan, and to determine
> parameters I am using the utility class they supply,
> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.
>
> I am on a 10-node cluster: one machine with 8 cores and 32G of memory,
> and nine machines with 6 cores and 16G of memory.
>
> I have 442M of data, which seems like it would be a joke, but the job
> stalls at the last stage.
>
> It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a
> number of things over the last couple of days, but nothing seems to be helping.
>
> I have tried:
> - Increasing heap sizes and numbers of cores
> - More/fewer executors with different amounts of resources
> - Kryo serialization
> - FAIR scheduling
>
> It doesn't seem like it should require this much. Any ideas?
>
> - Isaac
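For context, the tuning options listed above are typically set on the spark-submit command line (or in spark-defaults.conf). A minimal sketch of what such a submission might look like; the resource values, jar path, and application arguments are illustrative placeholders, not recommendations:

```shell
# Illustrative submission: one executor per worker node, with
# Kryo serialization and FAIR scheduling enabled via --conf.
spark-submit \
  --class org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver \
  --master yarn \
  --num-executors 9 \
  --executor-cores 5 \
  --executor-memory 12g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.scheduler.mode=FAIR \
  path/to/spark_dbscan-assembly.jar [application arguments]
```

Posting the actual command used, along with the Spark version and any warnings from the executor logs, would make it much easier to diagnose the stall.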