Fault Tolerance details

Petr Janeček Tue, 25 Jul 2017 07:23:39 -0700

Hello gentlemen,
I find the Fault Tolerance documentation slightly … incomplete.


I read both
http://storm.apache.org/releases/1.1.0/Daemon-Fault-Tolerance.html and
http://storm.apache.org/releases/1.1.0/Guaranteeing-message-processing.html,
and these questions don't seem to be mentioned much:

1. A surprising thing is that if a bolt throws, the exception tears down the
whole worker. Are there any plans for changing this so that only the bolt in
question is restarted? (As an explicit configuration option, simply logging
and ignoring the exception and rolling on might also be a viable strategy
for some...)
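
To make the "log and ignore" strategy concrete, here is a minimal sketch in
plain Java with no Storm dependency. GuardedExecute and guard are hypothetical
names, not part of Storm, and the Consumer only stands in for the body of a
bolt's execute method. It covers only unchecked exceptions thrown on the
calling thread.

```java
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Storm: wraps a processing step so that
// a failure on one input is logged and skipped instead of propagating.
final class GuardedExecute {
    private static final Logger LOG =
            Logger.getLogger(GuardedExecute.class.getName());

    // Returns a Consumer that runs `body` and swallows RuntimeExceptions
    // after logging them, so the caller keeps processing later inputs.
    static <T> Consumer<T> guard(Consumer<T> body) {
        return input -> {
            try {
                body.accept(input);
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "Dropping input after failure: " + input, e);
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for a bolt's per-tuple logic; fails on empty input.
        Consumer<String> process = guard(s -> {
            if (s.isEmpty()) {
                throw new IllegalArgumentException("empty input");
            }
            System.out.println("processed " + s);
        });
        process.accept("a");
        process.accept("");   // logged and skipped, does not propagate
        process.accept("b");
    }
}
```

In a real bolt the same try/catch would sit inside execute, which is what an
explicit configuration option could do on the user's behalf.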


2. If a node dies, its tasks are reassigned to other nodes; this works fine.
But what if the node comes back up? As far as I can tell, Storm does not
currently rebalance the topology automatically, so a rebalance must be
triggered manually.

We use neither stateful bolts nor message acking, as it is okay for us to
drop messages occasionally. Does Storm move tasks back to a revived worker
when the topology is stateful and message acking is enabled? In that case
the rebalance seems to be a safe operation… If not, is this somewhere on
the roadmap?




Thank you for any clarifications. Once all this is crystal clear to me, I
intend to enhance the documentation.


Petr Janeček
