Hmm. Thanks for the lead. On the Storm UI, the uptime for every executor except the spout shows fairly consistent values. The spout has crashed for sure, but it never comes back up. Will check this again.
But the other question is: is the Netty reconnect issue solved in 0.9.5? What is your Storm version?

Thanks
Kashyap

On Sep 13, 2015 08:04, "Martin Burian" <[email protected]> wrote:

> They do restart after a while, yes. But if you don't see any error in the
> log, it's weird. I encountered a case of workers not starting because I
> configured the worker JVM to expose a JMX interface for remote monitoring on
> a given port. Other workers on the same machine, however, could not start as
> they failed to bind to the already used port. No error messages whatsoever.
> Might any such thing be your case?
>
> Otherwise the cause should be logged somewhere. A worker is definitely not
> running, or at least not talking to the supervisor. You could try using fewer
> workers to find out when/where the error occurs.
>
> Martin
>
> On Sun, 13 Sep 2015 at 13:43, Kashyap Mhaisekar <[email protected]> wrote:
>
>> All worker logs have the same log. Workers are up. I am using only one
>> box with multiple workers to test.
>> Workers should be restarted if they fail, right? So ideally, this error
>> should be gone in a while.
>>
>> Thanks
>> Kashyap
>>
>> On Sep 13, 2015 05:10, "Martin Burian" <[email protected]> wrote:
>>
>>> When this appears in the worker log, it means that the worker is trying to
>>> connect to another worker, but the other is not running. What do you see in
>>> worker-6707.log? Is the other worker running?
>>> Martin
>>>
>>> On Sun, 13 Sep 2015 at 6:06, Kashyap Mhaisekar <[email protected]> wrote:
>>>
>>>> Also,
>>>> Is there a way to switch back to 0mq from Netty? If so, what needs to
>>>> be done?
>>>>
>>>> Thanks
>>>> Kashyap
>>>>
>>>> On Sat, Sep 12, 2015 at 10:49 PM, Kashyap Mhaisekar <[email protected]> wrote:
>>>>
>>>>> Am having a Netty-related issue in my Storm cluster where the spout
>>>>> stops consuming after a while.
>>>>> The corresponding worker logs show:
>>>>>
>>>>> 2015-09-12T23:28:23.391-0400 b.s.m.n.Client [ERROR] connection attempt 26 to Netty-Client-trsttel2pascapp01.vm.itg.corp.us.shldcorp.com/10.2.70.18:6707 failed: java.lang.RuntimeException: Returned channel was actually not established
>>>>> 2015-09-12T23:28:23.391-0400 b.s.m.n.Client [INFO] connection attempt 27 to Netty-Client-serverstorm1.myorg.com/10.2.70.18:6707 scheduled to run in 392 ms
>>>>> 2015-09-12T23:28:23.784-0400 b.s.m.n.Client [ERROR] connection attempt 27 to Netty-Client-serverstorm1.myorg.com/10.2.70.18:6707 failed: java.lang.RuntimeException: Returned channel was actually not established
>>>>>
>>>>> The corresponding supervisor logs had:
>>>>>
>>>>> 2015-09-12T23:28:23.018-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>> 2015-09-12T23:28:23.518-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>> 2015-09-12T23:28:24.019-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>>
>>>>> I had Storm version 0.9.3 when this issue occurred and upgraded to
>>>>> 0.9.4 and then 0.9.5 to seek relief, but the issue still persists. Am not
>>>>> sure what else to do. Am not even sure why this issue occurs or what
>>>>> triggers it. Any help would be great and appreciated.
>>>>>
>>>>> Thanks
>>>>> Kashyap
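
Two conditions come up in this thread: a worker port (6707 in the logs) that nothing is listening on, and a fixed port (such as a JMX port) that another process on the same box already holds. Both can be checked from the affected machine with a small probe. The following is only a minimal sketch in Python using the standard `socket` module; it is not part of Storm, and the helper names `can_connect` and `can_bind` are hypothetical.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    False suggests nothing is listening there (or it is unreachable),
    which matches the 'worker not running' reading of the Netty errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_bind(port, host="0.0.0.0"):
    """Return True if this machine can bind host:port, i.e. the port is free.

    False suggests another process already holds the port, which is the
    fixed-JMX-port situation described by Martin.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```

For example, `can_connect("10.2.70.18", 6707)` run from another worker's host would indicate whether the worker on port 6707 is accepting connections, and `can_bind(port)` run on the worker box for whichever JMX port was configured would indicate whether that port is already taken.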
