Ganesh, All
Do you know whether the answer to this is an upgrade to *0.9.4*, *0.9.5*, or
*0.10.0-beta1*? My topology runs fine for 15 minutes and then gives up with
this:
2015-09-11 15:19:51 b.s.m.n.Client [INFO] failed to send requests to myserver1.personal.com/10.2.72.176:6701:
java.nio.channels.ClosedChannelException: null
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:405) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [storm-core-0.9.3-rc1.jar:0.9.3-rc1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
and then with ...
2015-09-11 15:20:12 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-myserver5.personal.com/10.2.72.176:6701... [1]
2015-09-11 15:20:12 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-myserver7.personal.com/10.2.72.72:6704... [1]
2015-09-11 15:20:12 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-myserver3.personal.com/10.2.72.77:6702... [1]
It restarts again and the whole thing repeats.
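In case it helps anyone compare runs, here is a minimal sketch (Python,
standard library only) that tallies the "Reconnect started" entries per
target. It assumes the worker log entries have the shape pasted above; the
name count_reconnects and the sample text are only illustrative and are not
part of Storm.

```python
import re
from collections import Counter

# Matches e.g. "Reconnect started for Netty-Client-host/10.0.0.1:6701... [1]".
# \s+ between the words tolerates entries that a mail client wrapped onto two lines.
RECONNECT_RE = re.compile(
    r"Reconnect started for\s+Netty-Client-(?P<target>\S+?)\.\.\.\s+\[(?P<attempt>\d+)\]"
)


def count_reconnects(log_text):
    """Return a Counter mapping each Netty target to its number of reconnect entries."""
    return Counter(m.group("target") for m in RECONNECT_RE.finditer(log_text))


# Sample text copied from the log excerpt in this message.
SAMPLE = (
    "2015-09-11 15:20:12 b.s.m.n.Client [INFO] Reconnect started for "
    "Netty-Client-myserver5.personal.com/10.2.72.176:6701... [1]\n"
    "2015-09-11 15:20:12 b.s.m.n.Client [INFO] Reconnect started for "
    "Netty-Client-myserver7.personal.com/10.2.72.72:6704... [1]\n"
)

# Print the targets with the most reconnect entries first.
for target, n in count_reconnects(SAMPLE).most_common():
    print(target, n)
```

Reading a real worker log into a string and passing it to count_reconnects
should show whether the reconnects cluster on one host and port or are spread
across all workers.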
Thanks
kashyap
On Fri, Sep 4, 2015 at 11:33 AM, Ganesh Chandrasekaran <
[email protected]> wrote:
> Kashyap,
>
>
>
> Yes, you will need to upgrade the Storm version on the cluster as well.
> Personally, I would run tests to see whether it fixes the existing issue
> before upgrading.
>
>
>
> Thanks,
>
> Ganesh
>
>
>
> *From:* Joseph Beard [mailto:[email protected]]
> *Sent:* Friday, September 04, 2015 12:07 PM
>
> *To:* [email protected]
> *Subject:* Re: Netty reconnect
>
>
>
> We also ran into the same issue with Storm 0.9.4. We chose to upgrade to
> 0.10.0-beta1, which solved the problem and has otherwise been stable for
> our needs.
>
>
>
>
>
> Joe
>
> —
>
> Joseph Beard
>
> [email protected]
>
>
>
>
>
>
>
> On Sep 3, 2015, at 10:03 AM, Kashyap Mhaisekar <[email protected]>
> wrote:
>
>
>
> Thanks for the advice. Will upgrade from 0.9.3 to 0.9.4. A lame question:
> does it mean that the existing clusters need to be rebuilt with 0.9.4?
>
> Thanks
> Kashyap
>
> On Sep 3, 2015 08:32, "Nick R. Katsipoulakis" <[email protected]>
> wrote:
>
> Ganesh,
>
>
>
> No I am not.
>
>
>
> Cheers,
>
> Nick
>
>
>
> 2015-09-03 9:25 GMT-04:00 Ganesh Chandrasekaran <
> [email protected]>:
>
> Are you using the multilang protocol? After upgrading to 0.9.4, it seemed
> like I was being affected by this bug,
> https://issues.apache.org/jira/browse/STORM-738, so I rolled back to the
> previous stable version, 0.8.2.
>
> I did not verify this thoroughly on my cluster though.
>
>
>
>
>
> *From:* Nick R. Katsipoulakis [mailto:[email protected]]
> *Sent:* Thursday, September 03, 2015 9:08 AM
>
>
> *To:* [email protected]
> *Subject:* Re: Netty reconnect
>
>
>
>
>
> Hello again,
>
>
>
> I read STORM-404 and saw that it is resolved in version 0.9.4. However, I
> have version 0.9.4 installed on my cluster, and I have seen similar
> behavior in my workers.
>
>
>
> In fact, at random times I would see that some workers were considered
> dead (Netty was dropping messages) and they would be restarted by
> Nimbus.
>
>
>
> Currently, I only see dropped messages but not restarted workers.
>
>
>
> FYI, my cluster has the following setup:
>
>
>
> - 3X AWS m4.xlarge instances for ZooKeeper and Nimbus
> - 4X AWS m4.xlarge instances for Supervisors (each one with 2 workers)
>
> Thanks,
>
> Nick
>
>
>
> 2015-09-03 8:38 GMT-04:00 Ganesh Chandrasekaran <
> [email protected]>:
>
> Agreed with Jitendra. We were using version 0.9.3 and facing the same
> Netty reconnect issue, which was STORM-404. Upgrading to 0.9.4 fixed
> the issue.
>
>
>
> Thanks,
>
> Ganesh
>
>
>
> *From:* Jitendra Yadav [mailto:[email protected]]
> *Sent:* Thursday, September 03, 2015 8:20 AM
> *To:* [email protected]
> *Subject:* Re: Netty reconnect
>
>
>
> I don't know your Storm version, but it's worth checking these JIRA issues
> to see whether a similar scenario is occurring.
>
>
>
> https://issues.apache.org/jira/browse/STORM-404
> https://issues.apache.org/jira/browse/STORM-450
>
>
>
> Thanks
>
> Jitendra
>
>
>
> On Thu, Sep 3, 2015 at 5:22 PM, John Yost <[email protected]>
> wrote:
>
> Hi Everyone,
>
> When I see this, it is evidence that one or more of the workers are not
> starting up, which results in connections either not being established or
> reconnects occurring when supervisors kill workers that don't start up
> properly. I recommend checking the supervisor and nimbus logs to see if
> there are any root causes other than network issues causing the
> connect/reconnect.
>
> --John
>
>
>
> On Thu, Sep 3, 2015 at 7:32 AM, Nick R. Katsipoulakis <
> [email protected]> wrote:
>
> Hello Kashyap,
>
> I have been having the same issue for some time now on my AWS cluster. To
> be honest, I do not know how to resolve it.
>
> Regards,
>
> Nick
>
>
>
> 2015-09-03 0:07 GMT-04:00 Kashyap Mhaisekar <[email protected]>:
>
> Hi,
> Has anyone experienced repeated Netty reconnects? My workers seem to be
> eternally in a reconnect state and the topology doesn't serve messages at
> all. It gets connected once in a while and then goes back to reconnecting.
>
> Any fixes for this?
> "Reconnect started for Netty-Client"
>
> Thanks
> Kashyap
>
>
>
> --
>
> Nikolaos Romanos Katsipoulakis,
>
> University of Pittsburgh, PhD candidate