Hi, I am seeing an increasing number of tuple failures once I raise the Es-Bolt's parallelism beyond 20.
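For context, the spout and the Es-Bolt are wired up roughly like this. This is only a sketch: the topic, ZooKeeper address, index name and exact parallelism numbers below are placeholders, not my real settings, and the Elasticsearch connection properties are omitted.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import org.elasticsearch.storm.EsBolt;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class EsIndexTopology {
    public static void main(String[] args) throws Exception {
        // Kafka spout: 6 tasks, matching the "Task [4/6]" entries in the logs below.
        BrokerHosts hosts = new ZkHosts("zk01.xyz.com:2181");
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "events", "/kafka-spout", "es-indexer");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 6);

        // Es-Bolt writing to Elasticsearch; the tuple failures start once this
        // parallelism goes past roughly 20.
        builder.setBolt("es-bolt", new EsBolt("events/event"), 24)
               .shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(6);
        StormSubmitter.submitTopology("es-indexer", conf, builder.createTopology());
    }
}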
One error I see frequently in the Storm worker logs is:

10:56:36.145 s.k.KafkaUtils [INFO] Task [4/6] assigned [Partition{host=kafka-broker01.xyz.com:9092, partition=3}, Partition{host=kafka-broker01.xyz.com:9092, partition=9}, Partition{host=kafka-broker01.xyz.com:9092, partition=15}, Partition{host=kafka-broker02.xyz.com:9092, partition=21}, Partition{host=kafka-broker00.xyz.com:9092, partition=27}]
10:56:36.145 s.k.ZkCoordinator [INFO] Task [4/6] Deleted partition managers: []
10:56:36.145 s.k.ZkCoordinator [INFO] Task [4/6] New partition managers: []
10:56:36.145 s.k.ZkCoordinator [INFO] Task [4/6] Finished refreshing
10:56:37.929 s.k.KafkaUtils [WARN] Got fetch request with offset out of range: [381679652030]
10:56:37.930 s.k.PartitionManager [WARN] Using new offset: 381684046396
10:56:38.038 s.k.PartitionManager [WARN] Removing the failed offsets that are out of range: [381679656376, ... 12,000 offsets here ... , 381679657542]
10:56:38.055 STDERR [INFO] 2016-03-31 10:56:38,040 ERROR Unable to write to stream UDP:localhost:514 for appender syslog
10:56:38.060 STDERR [INFO] 2016-03-31 10:56:38,042 ERROR An exception occurred processing Appender syslog org.apache.logging.log4j.core.appender.AppenderLoggingException: Error flushing stream UDP:localhost:514
10:56:38.061 STDERR [INFO]     at org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:159)
10:56:38.061 STDERR [INFO]     at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:112)
10:56:38.061 STDERR [INFO]     at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:99)
10:56:38.061 STDERR [INFO]     at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:430)
10:56:38.061 STDERR [INFO]     at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:409)
10:56:38.062 STDERR [INFO]     at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:367)
10:56:38.062 STDERR [INFO]     at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:112)
10:56:38.062 STDERR [INFO]     at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:738)
10:56:38.062 STDERR [INFO]     at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:708)
10:56:38.063 STDERR [INFO]     at org.apache.logging.slf4j.Log4jLogger.warn(Log4jLogger.java:243)
10:56:38.063 STDERR [INFO]     at storm.kafka.PartitionManager.fill(PartitionManager.java:183)
10:56:38.063 STDERR [INFO]     at storm.kafka.PartitionManager.next(PartitionManager.java:131)
10:56:38.063 STDERR [INFO]     at storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:141)
10:56:38.064 STDERR [INFO]     at backtype.storm.daemon.executor$fn__5624$fn__5639$fn__5670.invoke(executor.clj:607)
10:56:38.064 STDERR [INFO]     at backtype.storm.util$async_loop$fn__545.invoke(util.clj:479)
10:56:38.064 STDERR [INFO]     at clojure.lang.AFn.run(AFn.java:22)
10:56:38.064 STDERR [INFO]     at java.lang.Thread.run(Thread.java:745)
10:56:38.064 STDERR [INFO] Caused by: java.io.IOException: Message too long
10:56:38.065 STDERR [INFO]     at java.net.PlainDatagramSocketImpl.send(Native Method)
10:56:38.065 STDERR [INFO]     at java.net.DatagramSocket.send(DatagramSocket.java:693)
10:56:38.065 STDERR [INFO]     at org.apache.logging.log4j.core.net.DatagramOutputStream.flush(DatagramOutputStream.java:103)
10:56:38.065 STDERR [INFO]     at org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:156)
10:56:38.066 STDERR [INFO]     ... 16 more
10:56:38.066 STDERR [INFO]
10:56:59.832 b.s.m.n.Server [INFO] Getting metrics for server on port 6704
10:56:59.832 b.s.m.n.Client [INFO] Getting metrics for client connection to Netty-Client-server-005-17.xyz.com/10.10.10.169:6704
10:56:59.833 b.s.m.n.Client [INFO] Getting metrics for client connection to Netty-Client-server-006-16.xyz.com/10.10.10.171:6704
10:56:59.833 b.s.m.n.Client [INFO] Getting metrics for client connection to Netty-Client-server-006-15.xyz.com/10.10.10.170:6704
10:56:59.833 b.s.m.n.Client [INFO] Getting metrics for client connection to Netty-Client-server-004-17.xyz.com/10.10.10.164:6704
10:56:59.833 b.s.m.n.Client [INFO] Getting metrics for client connection to Netty-Client-server-005-13.xyz.com/10.10.10.165:6704
10:57:24.446 s.k.KafkaUtils [WARN] Got fetch request with offset out of range: [381803772151]
10:57:24.447 s.k.PartitionManager [WARN] Using new offset: 381808257383

In particular, see this line:

*Removing the failed offsets that are out of range: [381679656376, ... 12,000 offsets here ... , 381679657542]*

That line actually contained about 12,000 offsets; I trimmed it here to keep the message readable. Why are there so many out-of-range offsets? Does anyone know what I may be doing wrong?

Thanks,
Tid