Thanks Steve, Enno, Martin. The only thing common to all the workers was the GC
logging that I configured. I don't find anything else. After I made the changes
there, what I also see is that the spout stops consuming and none of the
workers crash either. It just stops and nothing happens.

I think it has to do with the number of messages being sent into the
system. If I keep the message volume low (by adjusting max spout pending), the
topology has been up for 90 mins and counting. Otherwise, the system crashes
within 15 mins. What I expected was that the topology would crash and then
restart, but that is exactly what did not happen.
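For context on why lowering max spout pending helps: `topology.max.spout.pending` caps how many tuples a spout may have emitted (with a message ID, i.e. with acking enabled) but not yet seen acked or failed. Once the cap is reached, Storm stops calling the spout's `nextTuple()` until acks come back, so less data is in flight across the Netty connections. In the Java API it is set with `Config.setMaxSpoutPending(int)`. The Python below is only a model of that throttle; `PendingWindow` and its methods are hypothetical and not part of Storm.

```python
from typing import Optional, Set


class PendingWindow:
    """Hypothetical model of a spout throttled by max spout pending.

    Tuples emitted with a message ID stay "pending" until acked or failed;
    no new tuple is emitted while the pending count is at the cap.
    """

    def __init__(self, max_pending: int) -> None:
        self.max_pending = max_pending
        self.pending: Set[int] = set()
        self.next_id = 0

    def try_emit(self) -> Optional[int]:
        # Models the check made before calling nextTuple(): skip when full.
        if len(self.pending) >= self.max_pending:
            return None
        msg_id = self.next_id
        self.next_id += 1
        self.pending.add(msg_id)
        return msg_id

    def ack(self, msg_id: int) -> None:
        # Acking a tuple frees a slot in the window.
        self.pending.discard(msg_id)

    # Failing a tuple also frees its slot (replay is not modelled here).
    fail = ack


window = PendingWindow(max_pending=2)
first, second = window.try_emit(), window.try_emit()
print(window.try_emit())  # None: two tuples already in flight
window.ack(first)
print(window.try_emit())  # 2: the ack freed a slot
```

A smaller cap trades throughput for less buffered data per connection, which is consistent with the topology staying up longer at a low setting, though it does not explain the underlying failure.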

I tried it in 0.10.0-beta1 too and found the same behavior. The last production
version I had was 0.9.0-wip16, which used 0mq, and I did not see these
issues there.

Thanks
Kashyap

On Sep 13, 2015 15:39, "Stephen Powis" <[email protected]> wrote:

> Kashyap -  I see this same issue on 0.9.5
>
> On Sun, Sep 13, 2015 at 9:58 AM, Enno Shioji <[email protected]> wrote:
>
>> There was a change in that area in 0.9.6 (
>> https://issues.apache.org/jira/browse/STORM-763), although I'm not sure
>> if it will help your issue.
>>
>>
>> On Sun, Sep 13, 2015 at 2:35 PM, Kashyap Mhaisekar <[email protected]>
>> wrote:
>>
>>> Hmm. Thanks for the lead. On the Storm UI, the uptime for every executor
>>> except the spout shows fairly consistent values. The spout has definitely
>>> crashed, but then it never comes back up. I will check this again.
>>>
>>> But the other question is: is the Netty reconnect issue solved in
>>> 0.9.5? What is your Storm version?
>>>
>>> Thanks
>>> Kashyap
>>> On Sep 13, 2015 08:04, "Martin Burian" <[email protected]>
>>> wrote:
>>>
>>>> They do restart after a while, yes. But if you don't see any errors in
>>>> the log, that's odd. I once hit a case of workers not starting because I
>>>> configured the worker JVM to expose a JMX interface for remote monitoring on
>>>> a fixed port. Other workers on the same machine then could not start because
>>>> they failed to bind to the port that was already in use. No error messages whatsoever.
>>>> Could something like that be happening in your case?
>>>>
>>>> Otherwise the cause should be logged somewhere. A worker is definitely
>>>> not running, or at least not talking to the supervisor. You could try using
>>>> fewer workers to find out when/where the error occurs.
>>>>
>>>> Martin
>>>>
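Martin's JMX case is easy to hit when every worker on a box shares one `worker.childopts` string with a fixed port. Storm's supervisor substitutes `%ID%` in `worker.childopts` with the worker's port (worth verifying against the documentation for your Storm version), so a per-worker JMX port can be derived from it. The Python below is only a model: `expand_childopts`, `jmx_port` and `second_bind_fails` are hypothetical helpers, the `1%ID%` port scheme is an assumed convention, and the other JMX flags a real setup needs are omitted.

```python
import socket


def expand_childopts(template: str, worker_port: int) -> str:
    # Simplified model of the supervisor substituting %ID% with the worker port.
    return template.replace("%ID%", str(worker_port))


def jmx_port(childopts: str) -> int:
    # Pull the value of -Dcom.sun.management.jmxremote.port=... out of the options.
    key = "-Dcom.sun.management.jmxremote.port="
    for opt in childopts.split():
        if opt.startswith(key):
            return int(opt[len(key):])
    raise ValueError("no JMX port in childopts")


def second_bind_fails() -> bool:
    # Two listeners on one port: the second bind raises OSError (address in use),
    # which is what a second worker JVM hits when the JMX port is fixed.
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # let the OS pick a free port
        first.listen()
        try:
            second.bind(("127.0.0.1", first.getsockname()[1]))
        except OSError:
            return True
        return False
    finally:
        first.close()
        second.close()


worker_ports = [6700, 6701]
fixed = "-Xmx768m -Dcom.sun.management.jmxremote.port=9999"
per_worker = "-Xmx768m -Dcom.sun.management.jmxremote.port=1%ID%"
print([jmx_port(expand_childopts(fixed, p)) for p in worker_ports])       # [9999, 9999]
print([jmx_port(expand_childopts(per_worker, p)) for p in worker_ports])  # [16700, 16701]
print(second_bind_fails())  # True
```

With the fixed port, both workers try to open 9999 and the second one cannot bind; with the derived port, each worker gets its own.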
>>>> On Sun, Sep 13, 2015 at 13:43, Kashyap Mhaisekar <
>>>> [email protected]> wrote:
>>>>
>>>>> All worker logs show the same messages. The workers are up. I am using only one
>>>>> box with multiple workers to test.
>>>>> Workers should be restarted if they fail, right? So ideally this error
>>>>> should go away after a while.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Kashyap
>>>>> On Sep 13, 2015 05:10, "Martin Burian" <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> When this appears in a worker log, it means that the worker is trying
>>>>>> to connect to another worker, but the other one is not running. What do you
>>>>>> see in worker-6707.log? Is the other worker running?
>>>>>> Martin
>>>>>>
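One quick way to test Martin's explanation is to check from the worker's host whether anything is accepting TCP connections on the peer's port (10.2.70.18:6707 in the log in this thread). The helper below, `is_listening`, is a hypothetical sketch; it only tells you whether the socket is open, not whether the Storm worker behind it is healthy.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, `is_listening("10.2.70.18", 6707)` returning `False` while the supervisor keeps logging "still hasn't started" would suggest the worker for port 6707 never came up.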
>>>>>> On Sun, Sep 13, 2015 at 6:06, Kashyap Mhaisekar <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Also, is there a way to switch back to 0mq from Netty? If so, what needs
>>>>>>> to be done?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Kashyap
>>>>>>>
>>>>>>> On Sat, Sep 12, 2015 at 10:49 PM, Kashyap Mhaisekar <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I am having a Netty-related issue in my Storm cluster where the
>>>>>>>> spout stops consuming after a while. The corresponding worker logs show:
>>>>>>>>
>>>>>>>> 2015-09-12T23:28:23.391-0400 b.s.m.n.Client [ERROR] connection attempt 26 to Netty-Client-trsttel2pascapp01.vm.itg.corp.us.shldcorp.com/10.2.70.18:6707 failed: java.lang.RuntimeException: Returned channel was actually not established
>>>>>>>> 2015-09-12T23:28:23.391-0400 b.s.m.n.Client [INFO] connection attempt 27 to Netty-Client-serverstorm1.myorg.com/10.2.70.18:6707 scheduled to run in 392 ms
>>>>>>>> 2015-09-12T23:28:23.784-0400 b.s.m.n.Client [ERROR] connection attempt 27 to Netty-Client-serverstorm1.myorg.com/10.2.70.18:6707 failed: java.lang.RuntimeException: Returned channel was actually not established
>>>>>>>>
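The client log shows a retry loop: an attempt counter that keeps increasing, with each retry scheduled after a delay (392 ms here). In 0.9.x the Netty client's reconnects are governed by `storm.messaging.netty.max_retries`, `storm.messaging.netty.min_wait_ms` and `storm.messaging.netty.max_wait_ms` (check `defaults.yaml` for your version's values). The sketch below is a simplified, deterministic bounded exponential backoff to show how those three settings interact; `backoff_delays` is hypothetical, and Storm's own retry policy differs in detail (for example it randomizes the delay).

```python
from typing import List


def backoff_delays(min_wait_ms: int, max_wait_ms: int, max_retries: int) -> List[int]:
    """Simplified bounded exponential backoff: double each wait, cap at max_wait_ms.

    Hypothetical model only; Storm's Netty client uses its own retry policy.
    """
    delays = []
    wait = min_wait_ms
    for _ in range(max_retries):
        delays.append(min(wait, max_wait_ms))
        wait *= 2
    return delays


print(backoff_delays(min_wait_ms=100, max_wait_ms=1000, max_retries=6))
# [100, 200, 400, 800, 1000, 1000]
print(sum(backoff_delays(100, 1000, 30)))
# 27500: total ms spent waiting across 30 retries
```

Once the wait hits the cap, the total time a client keeps retrying grows roughly linearly with `max_retries`, which is why a dead peer can produce long runs of these messages.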
>>>>>>>> The corresponding supervisor logs showed:
>>>>>>>>
>>>>>>>> 2015-09-12T23:28:23.018-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>>>>> 2015-09-12T23:28:23.518-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>>>>> 2015-09-12T23:28:24.019-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>>>>>
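The supervisor lines come from a wait loop: after launching a worker, the supervisor polls for that worker's heartbeat roughly every 500 ms (note the timestamps above) and, if none arrives within `supervisor.worker.start.timeout.secs`, gives up waiting. A minimal Python model of that loop follows; `wait_for_worker_launch` and its parameters are hypothetical, and what the real supervisor does afterwards (typically shutting the worker down and relaunching it) is left out.

```python
import time
from typing import Callable


def wait_for_worker_launch(has_heartbeat: Callable[[], bool],
                           timeout_s: float,
                           poll_s: float = 0.5) -> bool:
    """Poll until the worker reports a heartbeat or timeout_s elapses.

    Simplified model of the supervisor loop behind "still hasn't started".
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if has_heartbeat():
            return True
        print("worker still hasn't started")
        time.sleep(poll_s)
    return False


# A fake worker that reports its first heartbeat on the third check.
checks = iter([False, False, True])
print(wait_for_worker_launch(lambda: next(checks), timeout_s=5, poll_s=0.01))
```

If the worker JVM dies before writing a heartbeat (for instance on a port bind failure), this loop is all you see in the supervisor log until the timeout expires.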
>>>>>>>> I was on Storm 0.9.3 when this issue occurred and upgraded
>>>>>>>> to 0.9.4 and then 0.9.5 hoping for a fix, but the issue persists. I am not
>>>>>>>> sure what else to do, or even why this issue occurs and what
>>>>>>>> triggers it. Any help would be greatly appreciated.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Kashyap
>>>>>>>>
>>>>>>>>
>>>>>>>
>>
>
