I've got the same error. In a running cluster, I kill the supervisor on one of 
the machines, wait until Storm reassigns the topology that was running on that 
machine (called Sync), and then bring the supervisor up again. It immediately 
dies, with the following in the log:

2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Synchronizing supervisor
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Worker 
21f86017-fed3-4e94-93f4-7ea65ca983e3 is :timed-out: 
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id 
"Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23 
32] [1 10] [33 42]}, :port 6703} at supervisor time-secs 1395917412
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Storm code map: 
{"Sync-1-1395916991" "/home/storm/storm/nimbus/stormdist/Sync-1-1395916991", 
"Async-2-1395916991" "/home/storm/storm/nimbus/stormdist/Async-2-1395916991"}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Downloaded storm ids: 
#{"Sync-1-1395916991"}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] All assignment: {}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] New assignment: {}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Writing new assignment {}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Syncing processes
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Assigned executors: {6703 
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11 20] 
[23 32])}}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Allocated: 
{"21f86017-fed3-4e94-93f4-7ea65ca983e3" [:timed-out 
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id 
"Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23 
32] [1 10] [33 42]}, :port 6703}]}
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down and clearing state 
for id 21f86017-fed3-4e94-93f4-7ea65ca983e3. Current supervisor time: 
1395917412. State: :timed-out, Heartbeat: 
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id 
"Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23 
32] [1 10] [33 42]}, :port 6703}
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down 
01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Removing code for storm id 
Sync-1-1395916991
2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path 
/home/storm/storm/supervisor/stormdist/Sync-1-1395916991
2014-03-27 10:50:12 b.s.util [INFO] Error when trying to kill 2160. Process is 
probably already dead.
2014-03-27 10:50:12 b.s.util [DEBUG] Removing path 
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids/2160
2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path 
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/heartbeats
2014-03-27 10:50:12 b.s.util [DEBUG] Removing path 
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids
2014-03-27 10:50:12 b.s.util [DEBUG] Removing path 
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shut down 
01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3
2014-03-27 10:50:12 b.s.util [DEBUG] Making dirs at 
/home/storm/storm/workers/9e27425e-6d2b-48cf-a592-8dfc0204f332/pids
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Launching worker with assignment 
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11 20] 
[23 32])} for this supervisor 01e5ad81-57a4-4933-87e5-c487e969b4b3 on port 6703 
with id 9e27425e-6d2b-48cf-a592-8dfc0204f332
2014-03-27 10:50:12 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File 
'/home/storm/storm/supervisor/stormdist/Sync-1-1395916991/stormconf.ser' does 
not exist
                at 
org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137) 
~[commons-io-1.4.jar:1.4]
                at 
org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135) 
~[commons-io-1.4.jar:1.4]
                at 
backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:177) 
~[storm-core-0.9.0.1.jar:na]
                at 
backtype.storm.daemon.supervisor$fn__6328.invoke(supervisor.clj:410) 
~[storm-core-0.9.0.1.jar:na]
                at clojure.lang.MultiFn.invoke(MultiFn.java:177) 
~[clojure-1.4.0.jar:na]
                at 
backtype.storm.daemon.supervisor$sync_processes$iter__6219__6223$fn__6224.invoke(supervisor.clj:244)
 ~[storm-core-0.9.0.1.jar:na]
                at clojure.lang.LazySeq.sval(LazySeq.java:42) 
~[clojure-1.4.0.jar:na]
                at clojure.lang.LazySeq.seq(LazySeq.java:60) 
~[clojure-1.4.0.jar:na]
                at clojure.lang.RT.seq(RT.java:473) ~[clojure-1.4.0.jar:na]
                at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.4.0.jar:na]
                at clojure.core$dorun.invoke(core.clj:2725) 
~[clojure-1.4.0.jar:na]
                at clojure.core$doall.invoke(core.clj:2741) 
~[clojure-1.4.0.jar:na]
                at 
backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:232) 
~[storm-core-0.9.0.1.jar:na]
                at clojure.lang.AFn.applyToHelper(AFn.java:161) 
[clojure-1.4.0.jar:na]
                at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
                at clojure.core$apply.invoke(core.clj:603) 
~[clojure-1.4.0.jar:na]
                at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) 
~[clojure-1.4.0.jar:na]
                at clojure.lang.RestFn.invoke(RestFn.java:397) 
~[clojure-1.4.0.jar:na]
                at 
backtype.storm.event$event_manager$fn__3072.invoke(event.clj:24) 
~[storm-core-0.9.0.1.jar:na]
                at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
                at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
2014-03-27 10:50:12 b.s.util [INFO] Halting process: ("Error when processing an 
event")

Bringing the supervisor up a second time after this crash works, and it starts 
correctly. Any ideas about what may be causing this?
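
To make the two facts in the log easier to check, here is a rough Python 
sketch. It is not part of Storm; the function names are made up for 
illustration, and the directory layout 
(<local dir>/supervisor/stormdist/<storm id>/stormconf.ser) is assumed from 
the paths in the log above. It computes how stale the worker heartbeat was 
when the supervisor came back, and tests whether the topology's stormconf.ser 
is present on disk.

```python
import os


def heartbeat_age(supervisor_time_secs, heartbeat_time_secs):
    """Seconds between the worker's last heartbeat and the supervisor's clock."""
    return supervisor_time_secs - heartbeat_time_secs


def has_topology_conf(local_dir, storm_id):
    """True if stormconf.ser exists for storm_id under the supervisor's local dir.

    The layout below is an assumption taken from the log paths, not from any
    Storm API.
    """
    conf_path = os.path.join(
        local_dir, "supervisor", "stormdist", storm_id, "stormconf.ser"
    )
    return os.path.isfile(conf_path)


if __name__ == "__main__":
    # Timestamps copied from the ":timed-out" line in the log above.
    print(heartbeat_age(1395917412, 1395917231))  # 181
    # Local dir and storm id copied from the FileNotFoundException path.
    print(has_topology_conf("/home/storm/storm", "Sync-1-1395916991"))
```

The heartbeat is 181 seconds old, which would explain the :timed-out state. 
Running the second check just before restarting the supervisor would show 
whether the code directory is already gone at that point.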

From: Thuy Nguyen [mailto:thngu...@gmail.com]
Sent: 29 January 2014 02:36
To: user@storm.incubator.apache.org
Subject: failed to start supervisor with missing stormconf.ser

Hi all,

After the supervisor was killed, it failed to restart because the file 
stormconf.ser was missing. The log shows the supervisor correctly "Removing 
code" for the topology, but it then went on to "Launching worker with 
assignment" after /storm-local/supervisor/stormdist/ had already been cleaned 
up, so it threw an exception saying the file does not exist. Is this a known 
bug? Could you please provide some insight into how this happened? After I 
deleted the /storm-local directory, the supervisor started successfully.


2014-01-28 16:51:02 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-01-28 16:51:02 o.a.z.ZooKeeper [INFO] Initiating client connection, 
connectString=xxx:2181/storm sessionTimeout=20000 
watcher=com.netflix.curator.ConnectionState@252a78ee
2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Opening socket connection to server 
xxx:2181
2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Socket connection established to 
xxx:2181, initiating session
2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Session establishment complete on 
server xxx:2181, sessionid = 0x143b20bc2587b86, negotiated timeout = 20000
2014-01-28 16:51:02 b.s.d.supervisor [INFO] Starting supervisor with id 
0aa5860a-248b-4814-b201-7b2f40ce701f at host xxx
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Removing code for storm id 
performance_topology-1-1390953444
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down and clearing state 
for id 03ccf938-7276-44f7-ab0e-f2cdfb76d394. Current supervisor time: 
1390956663. State: :timed-out, Heartbeat: 
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390956441, :storm-id 
"performance_topology-1-1390953444", :executors #{[2 2] [3 3] [4 4] [5 5] [6 6] 
[7 7] [8 8] [-1 -1] [1 1]}, :port 6797}
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down 
0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394
kill 5817: No such process
2014-01-28 16:51:03 b.s.util [INFO] Error when trying to kill 5817. Process is 
probably already dead.
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shut down 
0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Launching worker with assignment 
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"performance_topology-1-1390953444", :executors ([6 6] [5 5] [7 7] [8 8] [3 3] 
[4 4] [2 2] [1 1])} for this supervisor 0aa5860a-248b-4814-b201-7b2f40ce701f on 
port 6797 with id 7a34f6bf-5494-4ee7-ba54-bbdfbbcbb861
2014-01-28 16:51:03 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File 
'storm-local/supervisor/stormdist/performance_topology-1-1390953444/stormconf.ser'
 does not exist



Thanks,

Thuy
