So we had a Zookeeper outage the other day that somehow ended up causing
Storm to delete all its topologies.  I'm looking to see whether this is
something anyone else has experienced, and whether a Storm upgrade
might address some of my concerns.

Here is what I've figured out so far:

Storm 0.10 - two worker nodes, one runs Nimbus
Kafka 0.8.2.1 - 3 nodes
Zookeeper 3.4.5 - 3 nodes

The Zookeeper and Kafka clusters crashed, and the Storm jobs went into a
whirlwind of failures, leaving turds in /tmp that filled up the disk.
Woke up in the morning to find all the topology jars missing, nowhere to be found.
Looked at the Storm data in Zookeeper - everything appears to be missing there
too (there's a small sketch of how I walked the znodes below this list).
Tried to republish a job - Nimbus picks it up and starts it, then decides the
job shouldn't be there and kills it.
Cleaned out the Zookeeper data - no change.
Cleaned out the localstate data - no change.
Shut down Storm node2, cleaned out the localstate on node1 and the Zookeeper data,
restarted Storm node1 -
success!
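
For reference, here's roughly how I went about looking at the Storm data in
Zookeeper. It's just a minimal sketch using the plain Zookeeper Java client,
assuming the default storm.zookeeper.root of /storm and made-up hostnames for
the ensemble:

import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class StormZkInspect {

    public static void main(String[] args) throws Exception {
        // Made-up ensemble addresses -- substitute your own zk hosts.
        String connect = "zk1:2181,zk2:2181,zk3:2181";

        // The constructor returns immediately, so wait for the session
        // to actually reach SyncConnected before issuing reads.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connect, 15000, event -> {
            if (event.getState() == KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // /storm is the default storm.zookeeper.root; adjust if you override it.
        // This throws NoNodeException if the root is gone entirely.
        dump(zk, "/storm", "");

        zk.close();
    }

    // Recursively print every znode under the given path.
    static void dump(ZooKeeper zk, String path, String indent)
            throws KeeperException, InterruptedException {
        System.out.println(indent + path);
        for (String child : zk.getChildren(path, false)) {
            dump(zk, path + "/" + child, indent + "  ");
        }
    }
}

Walking that tree is how I confirmed the topology znodes (assignments, storms,
workerbeats, etc.) were gone. For anyone curious, "cleaning out the Zookeeper
data" amounts to a recursive delete of /storm (rmr /storm from zkCli.sh does
it), and the localstate lives under storm.local.dir on each node.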

So I think the localstate also got corrupted.  I'm not sure which got corrupted
first, but it appears Storm started trusting the wrong source of truth and
decided none of the jobs should be there.

So, has anyone else ever run into this?  Any thoughts?
