We found the following exception stacktrace during the startup of the worker:
2014-09-09 01:55:29 STDIO [ERROR] Sep 09, 2014 1:55:29 AM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x4cb6a9a5] EXCEPTION: java.net.ConnectException: Connection refused)
java.lang.IllegalArgumentException: timeout value is negative
        at java.lang.Thread.sleep(Native Method)
        at backtype.storm.messaging.netty.Client.reconnect(Client.java:94)
        at backtype.storm.messaging.netty.StormClientHandler.exceptionCaught(StormClientHandler.java:118)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
        at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2014-09-09 01:55:30 b.s.m.n.Client [INFO] Reconnect ... [24]
2014-09-09 01:55:31 b.s.m.n.Client [INFO] Reconnect ... [25]
2014-09-09 01:55:31 STDIO [ERROR] Sep 09, 2014 1:55:31 AM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0xa80acab1] EXCEPTION: java.net.ConnectException: Connection refused)
java.lang.IllegalArgumentException: timeout value is negative
        at java.lang.Thread.sleep(Native Method)
        at backtype.storm.messaging.netty.Client.reconnect(Client.java:94)
        at backtype.storm.messaging.netty.StormClientHandler.exceptionCaught(StormClientHandler.java:118)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
        at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

On Tue, Sep 9, 2014 at 4:22 PM, Snehal <snehal.ku...@gmail.com> wrote:

> Hello,
>
> We have a multi-node Storm cluster and we are using *Guaranteed
> processing* of Storm to make sure that events are processed exactly
> once.
>
> Our topology consists of a spout and multiple bolts (50). We have set the
> number of ackers equal to the number of workers. From our logs we can
> see that the bolts are acking, but the spout may not receive the acks. We do
> notice that the spout receives acks from bolts that are on the same
> node. We believe the issue is either network connectivity or acks timing
> out.
>
> Any idea why the remote bolts are unable to ack, and how we can debug
> this? Any suggestions are appreciated.
>
> Thanks,
> Snehal
>
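
A note on the "timeout value is negative" message in the trace: it means Thread.sleep was handed a negative argument inside Client.reconnect. One plausible cause, which we have not verified against the Storm source, is 32-bit integer overflow in an exponential backoff calculation once the retry counter gets large. The sketch below only illustrates that failure mode and one way to guard against it. The class name, method names, and the 100 ms / 1000 ms values are hypothetical and are not taken from Storm.

```java
// Illustrative sketch only: NOT Storm's actual reconnect code.
// All names and values here are assumptions made for the example.
public class BackoffSketch {

    // Naive backoff: baseMs * 2^retries in 32-bit int arithmetic.
    // For large retry counts the product wraps around and can go negative.
    public static int naiveSleepMs(int baseMs, int retries) {
        return baseMs * (1 << retries);
    }

    // Guarded backoff: compute in 64-bit, bound the shift, and cap at maxMs,
    // so the result is always within [0, maxMs] for non-negative inputs.
    public static long clampedSleepMs(int baseMs, int maxMs, int retries) {
        int shift = Math.max(0, Math.min(retries, 30));
        long sleep = (long) baseMs * (1L << shift);
        return Math.min(sleep, (long) maxMs);
    }

    public static void main(String[] args) throws InterruptedException {
        int baseMs = 100;   // hypothetical minimum wait
        int maxMs = 1000;   // hypothetical maximum wait
        for (int retries : new int[] {3, 24, 25}) {
            System.out.println("retries=" + retries
                + " naive=" + naiveSleepMs(baseMs, retries)
                + " clamped=" + clampedSleepMs(baseMs, maxMs, retries));
        }
        // Thread.sleep throws IllegalArgumentException for negative values,
        // so only the clamped value is safe to pass here.
        Thread.sleep(clampedSleepMs(baseMs, maxMs, 25));
    }
}
```

With these assumed values the naive result is still positive at 24 retries and becomes negative at 25, while the clamped result stays at the cap. Whether that matches what the worker is doing would need to be confirmed by reading Client.java at line 94.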