Pete Robbins created SPARK-15606:
------------------------------------

             Summary: Driver hang in o.a.s.DistributedSuite on 2 core machine
                 Key: SPARK-15606
                 URL: https://issues.apache.org/jira/browse/SPARK-15606
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
         Environment: AMD64 box with only 2 cores
            Reporter: Pete Robbins
repeatedly failing task that crashes JVM *** FAILED ***
  The code passed to failAfter did not complete within 100000 milliseconds. (DistributedSuite.scala:128)

This test started failing, and DistributedSuite started hanging, following https://github.com/apache/spark/pull/13055

It looks like the extra message to remove the BlockManager deadlocks, as there are only 2 message-processing loop threads.

Related to https://issues.apache.org/jira/browse/SPARK-13906

{code}
/** Thread pool used for dispatching messages. */
private val threadpool: ThreadPoolExecutor = {
  val numThreads = nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
    math.max(2, Runtime.getRuntime.availableProcessors()))
  val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "dispatcher-event-loop")
  for (i <- 0 until numThreads) {
    pool.execute(new MessageLoop)
  }
  pool
}
{code}

Setting a minimum of 3 threads alleviates this issue, but I'm not sure there isn't another underlying problem.
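For illustration, a minimal, hypothetical sketch (in Java rather than Scala, using plain java.util.concurrent) of the suspected starvation pattern: if both loop threads of a 2-thread fixed pool block waiting on something that only a third queued task can produce, the third task never runs. The {{DispatcherSketch}} class and {{runWithThreads}} helper are invented for this example and are not Spark code; the latch stands in for the blocked BlockManager-removal reply.

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatcherSketch {
    // Returns true if all three "messages" complete, false if the pool starves.
    public static boolean runWithThreads(int numThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        CountDownLatch reply = new CountDownLatch(1);
        // Two "messages" that block until a reply arrives, each occupying a loop thread.
        for (int i = 0; i < 2; i++) {
            pool.execute(() -> {
                try {
                    reply.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The "reply" message: it can only run if a loop thread is still free.
        pool.execute(reply::countDown);
        pool.shutdown();
        boolean completed = pool.awaitTermination(2, TimeUnit.SECONDS);
        pool.shutdownNow(); // unblock the starved case so the JVM can exit
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("3 threads completes: " + runWithThreads(3)); // true
        System.out.println("2 threads completes: " + runWithThreads(2)); // false: both threads stuck
    }
}
{code}

With 3 threads the reply runs, the latch opens, and everything finishes; with 2 threads both loop threads sit in {{await()}} and the pool never drains, matching the behaviour seen on the 2-core box.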