I've got the same error. In a running cluster, I kill the supervisor running on one of the machines, wait until storm reassigns the topology that was on that machine (called Sync), and then bring the supervisor up again. It immediately dies, with the following in the log:
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Synchronizing supervisor 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Worker 21f86017-fed3-4e94-93f4-7ea65ca983e3 is :timed-out: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id "Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23 32] [1 10] [33 42]}, :port 6703} at supervisor time-secs 1395917412 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Storm code map: {"Sync-1-1395916991" "/home/storm/storm/nimbus/stormdist/Sync-1-1395916991", "Async-2-1395916991" "/home/storm/storm/nimbus/stormdist/Async-2-1395916991"} 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Downloaded storm ids: #{"Sync-1-1395916991"} 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] All assignment: {} 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] New assignment: {} 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Writing new assignment {} 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Syncing processes 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Assigned executors: {6703 #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11 20] [23 32])}} 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Allocated: {"21f86017-fed3-4e94-93f4-7ea65ca983e3" [:timed-out #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id "Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23 32] [1 10] [33 42]}, :port 6703}]} 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down and clearing state for id 21f86017-fed3-4e94-93f4-7ea65ca983e3. Current supervisor time: 1395917412. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id "Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23 32] [1 10] [33 42]}, :port 6703} 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down 01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Removing code for storm id Sync-1-1395916991 2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path /home/storm/storm/supervisor/stormdist/Sync-1-1395916991 2014-03-27 10:50:12 b.s.util [INFO] Error when trying to kill 2160. Process is probably already dead. 2014-03-27 10:50:12 b.s.util [DEBUG] Removing path /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids/2160 2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/heartbeats 2014-03-27 10:50:12 b.s.util [DEBUG] Removing path /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids 2014-03-27 10:50:12 b.s.util [DEBUG] Removing path /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shut down 01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3 2014-03-27 10:50:12 b.s.util [DEBUG] Making dirs at /home/storm/storm/workers/9e27425e-6d2b-48cf-a592-8dfc0204f332/pids 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11 20] [23 32])} for this supervisor 01e5ad81-57a4-4933-87e5-c487e969b4b3 on port 6703 with id 9e27425e-6d2b-48cf-a592-8dfc0204f332 2014-03-27 10:50:12 b.s.event [ERROR] Error when processing event java.io.FileNotFoundException: File '/home/storm/storm/supervisor/stormdist/Sync-1-1395916991/stormconf.ser' does not exist at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137) ~[commons-io-1.4.jar:1.4] at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135) ~[commons-io-1.4.jar:1.4] at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:177) ~[storm-core-0.9.0.1.jar:na] at backtype.storm.daemon.supervisor$fn__6328.invoke(supervisor.clj:410) ~[storm-core-0.9.0.1.jar:na] at clojure.lang.MultiFn.invoke(MultiFn.java:177) ~[clojure-1.4.0.jar:na] at backtype.storm.daemon.supervisor$sync_processes$iter__6219__6223$fn__6224.invoke(supervisor.clj:244) ~[storm-core-0.9.0.1.jar:na] at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.4.0.jar:na] at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.4.0.jar:na] at clojure.lang.RT.seq(RT.java:473) ~[clojure-1.4.0.jar:na] at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.4.0.jar:na] at clojure.core$dorun.invoke(core.clj:2725) ~[clojure-1.4.0.jar:na] at clojure.core$doall.invoke(core.clj:2741) ~[clojure-1.4.0.jar:na] at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:232) ~[storm-core-0.9.0.1.jar:na] at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na] at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na] at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na] at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na] at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na] at backtype.storm.event$event_manager$fn__3072.invoke(event.clj:24) ~[storm-core-0.9.0.1.jar:na] at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] 2014-03-27 10:50:12 b.s.util [INFO] Halting process: ("Error when processing an event") Bringing the supervisor up again after this works, and the supervisor starts correctly. Any ideas about what may be causing this? From: Thuy Nguyen [mailto:thngu...@gmail.com] Sent: 29 January 2014 02:36 To: user@storm.incubator.apache.org Subject: failed to start supervisor with missing stormconf.ser Hi all, After supervisor was killed, it failed to be restarted because of missing file stormconf.ser. The log shows supervisor correctly "Removing code" for the topology, but then it tried to "Launching worker with assignment" after /storm-local/supervisor/stormdist/ is already cleaned up, thus, it got exception about file does not exist. Is it a known bug? Could you please provide some insights how this happened? After deleting /storm-local directory, supervisor started successfully. 2014-01-28 16:51:02 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting 2014-01-28 16:51:02 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=xxx:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@252a78ee<mailto:watcher=com.netflix.curator.ConnectionState@252a78ee> 2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Opening socket connection to server xxx:2181 2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Socket connection established to xxx:2181, initiating session 2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Session establishment complete on server xxx:2181, sessionid = 0x143b20bc2587b86, negotiated timeout = 20000 2014-01-28 16:51:02 b.s.d.supervisor [INFO] Starting supervisor with id 0aa5860a-248b-4814-b201-7b2f40ce701f at host xxx 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Removing code for storm id performance_topology-1-1390953444 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down and clearing state for id 03ccf938-7276-44f7-ab0e-f2cdfb76d394. Current supervisor time: 1390956663. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390956441, :storm-id "performance_topology-1-1390953444", :executors # {[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [-1 -1] [1 1]} , :port 6797} 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down 0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394 kill 5817: No such process 2014-01-28 16:51:03 b.s.util [INFO] Error when trying to kill 5817. Process is probably already dead. 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shut down 0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "performance_topology-1-1390953444", :executors ([6 6] [5 5] [7 7] [8 8] [3 3] [4 4] [2 2] [1 1])} for this supervisor 0aa5860a-248b-4814-b201-7b2f40ce701f on port 6797 with id 7a34f6bf-5494-4ee7-ba54-bbdfbbcbb861 2014-01-28 16:51:03 b.s.event [ERROR] Error when processing event java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/performance_topology-1-1390953444/stormconf.ser' does not exist Thanks, Thuy