Hi,

There are some defunct workers in my Storm cluster (version 0.9.5):

deploy    1634     1  0  2015 ?        07:11:45 [java] <defunct>
deploy    5607     1  2 Mar25 ?        23:59:26 [java] <defunct>
deploy    9154     1  2 Jan13 ?      3-05:31:28 [java] <defunct>
deploy   14292     1  4 Mar11 ?      2-20:59:31 [java] <defunct>

These dead Java processes still hold the worker ports. Take process 5607 as an example:

$ lsof -i TCP:6704
COMMAND  PID   USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
java    5607 deploy  71u  IPv4 659563503      0t0  TCP *:6704 (LISTEN)

A thread of the defunct process is still alive:

$ ps -efL | grep 5607
deploy    1630 20886  1630  0    1 10:26 pts/1    00:00:00 grep 5607
deploy    5607     1  5607  0    2 Mar25 ?        00:00:00 [java] <defunct>
deploy    5607     1  5974  0    2 Mar25 ?        01:37:32 [java]

So when a new assignment arrives, creating the new worker fails:

2016-05-06T11:27:04.143+0800 b.s.d.worker [INFO] Reading Assignments.
2016-05-06T11:27:04.202+0800 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
2016-05-06T11:27:04.394+0800 b.s.d.worker [INFO] Launching receive-thread for 3278773a-4bca-4a53-a845-3668dfe089ee:6704
2016-05-06T11:27:04.409+0800 b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-6704, buffer_size: 5242880, maxWorkers: 1
2016-05-06T11:27:04.449+0800 b.s.d.worker [ERROR] Error on initialization of server mk-worker
org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6704
    at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.netty.Context.bind(Context.java:75) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:378) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.worker$fn__6959$exec_fn__1103__auto____6960.invoke(worker.clj:413) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$fn__6959$mk_worker__7015.doInvoke(worker.clj:391) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$_main.invoke(worker.clj:502) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.5.jar:0.9.5]
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method) ~[na:1.6.0_35]
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) ~[na:1.6.0_35]
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) ~[na:1.6.0_35]
    at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.5.jar:0.9.5]
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_35]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_35]
    at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_35]
2016-05-06T11:27:04.471+0800 b.s.util [ERROR] Halting process: ("Error on initialization")

My questions are:

1) Why do these defunct workers still hold the ports?
2) How can I release the ports held by the defunct workers?

Thank you.
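
P.S. In case it helps anyone check their own supervisors for the same state, here is a rough sketch of how the per-thread check above could be automated. It assumes a Linux host with procps `ps`; the function name `zombie_leaders_with_live_threads` is made up for illustration and is not part of Storm. It only reports processes whose main thread is defunct while another thread is still present, and it does not release anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Storm): reads "PID LWP STAT" rows on stdin
# and prints each PID whose main thread (LWP == PID) is a zombie while at
# least one other thread of the same process is not.
zombie_leaders_with_live_threads() {
  awk '
    $1 == $2 && $3 ~ /^Z/  { zombie[$1] = 1 }  # main thread is defunct
    $1 != $2 && $3 !~ /^Z/ { live[$1] = 1 }    # another thread is still present
    END { for (pid in zombie) if (pid in live) print pid }
  ' | sort -n
}

# Usage on a live host (assumes procps ps; "=" suppresses the header row):
#   ps -eLo pid=,lwp=,stat= | zombie_leaders_with_live_threads
```

Fed the two rows that `ps -efL` shows for 5607 above (main thread defunct, thread 5974 not), it would print 5607; a fully reaped or fully dead process would not be listed.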