Was that machine also running Storm Nimbus? Nimbus is currently a SPOF of sorts, though not a fatal one: if Nimbus goes down, the supervisors continue running the existing topologies as before, but the UI and the storm command-line client, which both talk to Nimbus, stop working. There is an open issue to implement a high-availability Nimbus at https://issues.apache.org/jira/browse/STORM-166
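If it helps to narrow this down, here is a rough sketch for checking which side is actually failing: whether the ZooKeeper ensemble still has a majority, and whether the Nimbus port is reachable at all. It is only an illustration under stated assumptions. The host names are the placeholders from the storm.yaml quoted in this thread, 2181 and 6627 are the usual default ports for the ZooKeeper client connection and for nimbus.thrift.port and may differ in your setup, and on newer ZooKeeper versions the "ruok" command may need to be whitelisted first.

```python
import socket

ZK_PORT = 2181       # assumption: conventional ZooKeeper client port
NIMBUS_PORT = 6627   # assumption: default value of nimbus.thrift.port


def zk_is_ok(host, port=ZK_PORT, timeout=2.0):
    """Send ZooKeeper's 'ruok' four-letter command; a running server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:  # server closed the connection
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def has_quorum(alive, ensemble_size):
    """A ZooKeeper ensemble needs a strict majority of its servers to be up."""
    return alive > ensemble_size // 2


def check_cluster(zk_hosts, nimbus_host):
    """Probe every ZooKeeper host and the Nimbus port, and summarise the result."""
    zk_status = {host: zk_is_ok(host) for host in zk_hosts}
    alive = sum(zk_status.values())
    return {
        "zookeeper": zk_status,
        "quorum": has_quorum(alive, len(zk_hosts)),
        "nimbus_reachable": port_is_open(nimbus_host, NIMBUS_PORT),
    }
```

For example, check_cluster(["host1", "host2", "host3"], "host1") would cover the case where Nimbus ran on the machine that died. With three ZooKeeper servers, losing one should still leave a quorum, so if "quorum" is True but "nimbus_reachable" is False, Nimbus is the more likely culprit.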
From: 이승진 [mailto:sweetest...@navercorp.com]
Sent: 04 July 2014 07:17
To: user
Subject: Re: storm crashes when one of the zookeeper server dies

Actually, it was not just the ZooKeeper process that went down; the whole machine died due to a kernel panic, so pings to that server failed as well.

All servers were connected to each other and in replication mode, of course.

-----Original Message-----
From: "Irek Khasyanov" <qua...@gmail.com>
To: "user" <user@storm.incubator.apache.org>; "이승진" <sweetest...@navercorp.com>
Cc:
Sent: 2014-07-04 (Fri) 14:51:00
Subject: Re: storm crashes when one of the zookeeper server dies

Are you sure all ZooKeeper servers are in replication mode and can connect to each other?

Yesterday we added a ZooKeeper cluster and checked what happens to Storm when one ZooKeeper server fails. Everything was fine; nothing happened to the topology.

On 4 July 2014 08:26, 이승진 <sweetest...@navercorp.com> wrote:

In storm.yaml, we listed 3 ZooKeeper servers:

    storm.zookeeper.servers:
      - "host1"
      - "host2"
      - "host3"

host1 died unexpectedly this morning. Since then, not only can I not connect to the Storm UI (TTransport exception), but I also cannot execute any storm command.

I was quite worried when this happened, because if this is how Storm is supposed to behave, any one of the ZooKeeper servers can be a SPOF.

It seems like this should be fixed ASAP.

Sincerely,

--
With best regards, Irek Khasyanov.