Hello!
  I have encountered the same issue when a worker process ran out of memory.
Try increasing the workers' memory by setting the worker.childopts property.
Also, if you are creating short-lived objects at a high rate, use
-XX:+UseG1GC. Since you say that you hold data in memory, I suspect (as I
said) an OutOfMemoryError. Don't cover this just by increasing the heap
size; I also recommend profiling your worker to see whether you have a
memory leak.
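  For example (just a sketch, and the -Xmx value below is only a placeholder;
size it to what your in-memory state actually needs), in storm.yaml on the
supervisor nodes:

    # worker JVM options: larger heap plus the G1 collector
    worker.childopts: "-Xmx2048m -XX:+UseG1GC"

  If I remember correctly you can also set this per topology through
topology.worker.childopts. For the leak check, a heap histogram of the
running worker is a quick first look, e.g. jmap -histo:live <worker-pid>.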
  Hope these help.
  Regards,
 Florin


On Sat, Aug 23, 2014 at 2:24 AM, Andrey Yegorov <andrey.yego...@gmail.com>
wrote:

>
> Have you figured out the root cause/fix for this issue?
> I just hit it and would really appreciate some time-saving advice.
>
> ----------
> Andrey Yegorov
>
>
> On Wed, Mar 12, 2014 at 10:31 AM, Josh Walton <jwalton...@gmail.com>
> wrote:
>
>> Overnight, it appears my Storm Trident topology restarted itself. When I
>> checked the Storm UI, it said the topology had been running for 24 hours,
>> and it showed no errors or exceptions in any of the bolts.
>>
>> I checked the nimbus log and saw the following:
>>
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[34
>> 34] not alive
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[4
>> 4] not alive
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[40
>> 40] not alive
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[10
>> 10] not alive
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[16
>> 16] not alive
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[22
>> 22] not alive
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[28
>> 28] not alive
>> 2014-03-12 10:55:06 b.s.s.EvenScheduler [INFO] Available slots:
>> (["5d105f66-1add-421b-8265-e7340a95928c" 6700]
>> ["32ab1745-c260-4491-ae4d-92dcc5d14a62" 6700])
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Reassigning MITAS3-74-1394565794
>> to 6 slots
>> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Reassign executors: [[34 34] [4
>> 4] [40 40] [10 10] [16 16] [22 22] [28 28]]
>>
>> It appears the executors were still alive but must have timed out (missed
>> heartbeats?) somehow, since I didn't see any exceptions or stack traces in
>> the logs.
>>
>> Is there a way to change the timeout? I see several timeout settings, but
>> I'm not sure whether any of them would help prevent this type of restart. I
>> am using a custom TridentState that holds data in memory, so we lost data as
>> a result of this restart, and I would like to prevent it from happening again.
>>
>> Thanks
>>
>> Josh
>>
>
>
