Thomas Huang created SPARK-16190:
------------------------------------

             Summary: Worker registration failed: Duplicate worker ID
                 Key: SPARK-16190
                 URL: https://issues.apache.org/jira/browse/SPARK-16190
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.6.1
            Reporter: Thomas Huang
            Priority: Critical


Several workers crashed simultaneously with this error:
Worker registration failed: Duplicate worker ID

This is the log from one of the crashed workers:
16/06/24 16:28:53 INFO ExecutorRunner: Killing process!
16/06/24 16:28:53 INFO ExecutorRunner: Runner thread for executor 
app-20160624003013-0442/26 interrupted
16/06/24 16:28:53 INFO ExecutorRunner: Killing process!
16/06/24 16:29:03 WARN ExecutorRunner: Failed to terminate process: 
java.lang.UNIXProcess@31340137. This process will likely be orphaned.
16/06/24 16:29:03 WARN ExecutorRunner: Failed to terminate process: 
java.lang.UNIXProcess@4d3bdb1d. This process will likely be orphaned.
16/06/24 16:29:03 INFO Worker: Executor app-20160624003013-0442/8 finished with 
state KILLED
16/06/24 16:29:03 INFO Worker: Executor app-20160624003013-0442/26 finished 
with state KILLED
16/06/24 16:29:03 INFO Worker: Cleaning up local directories for application 
app-20160624003013-0442
16/06/24 16:31:18 INFO ExternalShuffleBlockResolver: Application 
app-20160624003013-0442 removed, cleanupLocalDirs = true
16/06/24 16:31:18 INFO Worker: Asked to launch executor 
app-20160624162905-0469/14 for SparkStreamingLRScala
16/06/24 16:31:18 INFO SecurityManager: Changing view acls to: mqq
16/06/24 16:31:18 INFO SecurityManager: Changing modify acls to: mqq
16/06/24 16:31:18 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(mqq); users with 
modify permissions: Set(mqq)
16/06/24 16:31:18 INFO ExecutorRunner: Launch command: 
"/data/jdk1.7.0_60/bin/java" "-cp" 
"/data/spark-1.6.1-bin-cdh4/conf/:/data/spark-1.6.1-bin-cdh4/lib/spark-assembly-1.6.1-hadoop2.3.0.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar"
 "-Xms10240M" "-Xmx10240M" "-Dspark.driver.port=34792" "-XX:MaxPermSize=256m" 
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
"spark://CoarseGrainedScheduler@100.65.21.199:34792" "--executor-id" "14" 
"--hostname" "100.65.21.223" "--cores" "5" "--app-id" "app-20160624162905-0469" 
"--worker-url" "spark://Worker@100.65.21.223:46581"
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Connecting to master 100.65.21.199:7077...
16/06/24 16:31:18 INFO Worker: Successfully registered with master 
spark://100.65.21.199:7077
16/06/24 16:31:18 INFO Worker: Worker cleanup enabled; old application 
directories will be deleted in: /data/spark-1.6.1-bin-cdh4/work
16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with 
the master, since there is an attempt scheduled already.
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Connecting to master 100.65.21.199:7077...
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with 
the master, since there is an attempt scheduled already.
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with 
the master, since there is an attempt scheduled already.
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with 
the master, since there is an attempt scheduled already.
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with 
the master, since there is an attempt scheduled already.
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with 
the master, since there is an attempt scheduled already.
16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077 
requested this worker to reconnect.
16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with 
the master, since there is an attempt scheduled already.
16/06/24 16:31:18 INFO Worker: Asked to launch executor 
app-20160624153031-0467/27 for SparkRealtimeRecommender
16/06/24 16:31:18 INFO SecurityManager: Changing view acls to: mqq
16/06/24 16:31:18 INFO SecurityManager: Changing modify acls to: mqq
16/06/24 16:31:18 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(mqq); users with 
modify permissions: Set(mqq)
16/06/24 16:31:18 INFO ExecutorRunner: Launch command: 
"/data/jdk1.7.0_60/bin/java" "-cp" 
"/data/spark-1.6.1-bin-cdh4/conf/:/data/spark-1.6.1-bin-cdh4/lib/spark-assembly-1.6.1-hadoop2.3.0.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar"
 "-Xms61440M" "-Xmx61440M" "-Dspark.driver.port=50193" "-XX:MaxPermSize=256m" 
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
"spark://CoarseGrainedScheduler@100.65.21.199:50193" "--executor-id" "27" 
"--hostname" "100.65.21.223" "--cores" "46" "--app-id" 
"app-20160624153031-0467" "--worker-url" "spark://Worker@100.65.21.223:46581"
16/06/24 16:31:18 ERROR Worker: Worker registration failed: Duplicate worker ID
16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
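
The log above shows a burst of "requested this worker to reconnect" messages within the same second, right before the registration error. A minimal sketch for counting them in a saved worker log could look like the following. It is a diagnostic aid only, not part of Spark; the function names (parse_entries, reconnects_before_failure) are hypothetical, and the entry pattern is an assumption based on the log lines quoted in this report.

```python
import re

# Assumed entry format, taken from the log quoted in this report, e.g.
# "16/06/24 16:31:18 INFO Worker: Connecting to master ..."
ENTRY = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (INFO|WARN|ERROR) (\w+): (.*)$"
)


def parse_entries(text):
    """Return (timestamp, level, source, message) tuples.

    Lines that do not start a new entry are treated as wrapped
    continuations of the previous entry and joined with a space.
    """
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            ts, level, source, msg = match.groups()
            entries.append([ts, level, source, [msg.strip()]])
        elif entries and line.strip():
            entries[-1][3].append(line.strip())
    return [(ts, lvl, src, " ".join(parts)) for ts, lvl, src, parts in entries]


def reconnects_before_failure(text):
    """Count reconnect requests logged before the first
    "Duplicate worker ID" error; return None if no such error appears."""
    count = 0
    for _ts, level, _source, msg in parse_entries(text):
        if level == "ERROR" and "Duplicate worker ID" in msg:
            return count
        if "requested this worker to reconnect" in msg:
            count += 1
    return None
```

To use it, read the worker log file into a string and pass it to reconnects_before_failure. The count only summarizes what the log already shows; it does not by itself identify the cause of the duplicate registration.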



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
