Ganesh,

In Storm, Nimbus is responsible for your topology's resource allocation, and workers send heartbeats to Nimbus. If a heartbeat is missed and the worker is dead, Nimbus tries to allocate other resources to your topology. So when Nimbus is alive and a worker dies, the topology is rebalanced to run on the other live workers.
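Roughly, the liveness check works like this (a simplified Python sketch of the heartbeat-timeout idea, not Storm's actual implementation; the class, names, and timeout value here are all made up for illustration):

```python
import time

# Hypothetical illustration of heartbeat-based liveness detection,
# loosely modeled on how Nimbus decides a worker is dead.
HEARTBEAT_TIMEOUT = 30  # seconds without a heartbeat before a worker is declared dead

class HeartbeatTracker:
    def __init__(self):
        self.last_heartbeat = {}  # worker id -> timestamp of last heartbeat

    def record_heartbeat(self, worker_id, now=None):
        self.last_heartbeat[worker_id] = now if now is not None else time.time()

    def dead_workers(self, now=None):
        now = now if now is not None else time.time()
        return [w for w, ts in self.last_heartbeat.items()
                if now - ts > HEARTBEAT_TIMEOUT]

# A worker that stops heartbeating past the timeout is reported as dead;
# its tasks would then be reassigned to the remaining live workers.
tracker = HeartbeatTracker()
tracker.record_heartbeat("worker-1", now=0)
tracker.record_heartbeat("worker-2", now=0)
tracker.record_heartbeat("worker-1", now=40)  # worker-1 keeps heartbeating
print(tracker.dead_workers(now=45))           # → ['worker-2']
```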
With one spout instance, if the worker where it is running dies, Nimbus will relocate it to another live worker. But since your Nimbus daemon was down, this rebalance didn't happen. We do have Nimbus HA, where you can run multiple Nimbus nodes to avoid this issue; that feature will be part of the upcoming 1.0 release.
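For reference, a minimal sketch of that setup (assuming the 1.0-style `nimbus.seeds` setting; the host names are placeholders):

```yaml
# storm.yaml (sketch): list every Nimbus host so supervisors and clients
# can fail over to a standby Nimbus when the leader dies.
nimbus.seeds: ["nimbus1.example.com", "nimbus2.example.com"]
```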

Thanks, Harsha


On Fri, Jan 8, 2016, at 02:20 PM, Ganesh Chandrasekaran wrote:
> When Nimbus went down, other topologies were still processing messages
> correctly. It's only when one half of my topology went down that it
> stopped processing messages for that particular topology.
>
> Actually now that I said that - I am using 1 spout for 2 workers.
> Maybe the worker which went down had the spout and that’s why Storm
> wasn’t processing messages.
>
> I am going to try setting the number of spouts equal to the number of
> workers in the topology. Maybe that will fix this issue.
>
> Thanks,
> Ganesh

> *From:* Annabel Melongo [mailto:melongo_anna...@yahoo.com]
> *Sent:* Friday, January 08, 2016 5:16 PM
> *To:* user@storm.apache.org
> *Subject:* Re: Storm worker crash scenario
>
> Ganesh,
>
> Nimbus is a sort of JobTracker. It makes sense that the job resumed
> only once Nimbus was working correctly again; otherwise, the state of
> the running job would have been lost.
>
> Thanks
>
> On Friday, January 8, 2016 1:07 PM, Ganesh Chandrasekaran
> <gchandraseka...@wayfair.com> wrote:
>
> I wanted to understand how Storm works when one of its workers crashes.
>
> So here is the situation I ran into recently. My topology is
> distributed across 2 workers with a total of 6 threads. Somehow 3
> threads died because one worker went down. At the same time, the
> Nimbus service was also down, so it could not spin up replacement
> threads on the other available worker.
>
> I noticed Storm wasn't processing messages for the topology till
> Nimbus was restored and it spun up the remaining threads that were
> down. Is this the expected behavior?
>
> I was expecting Storm to continue processing messages with one half
> of the threads still up on the other worker.
>
> Thanks,
> Ganesh