Thanks for replying. Sorry for not being clear: acking is enabled, but we don’t use anchoring when emitting events from our bolts.
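To illustrate what anchoring changes, here is a minimal, self-contained model of the XOR bookkeeping described in the Storm "Guaranteeing Message Processing" documentation. Each spout tuple has one 64-bit value. Every tuple ID in its tree is XORed into that value once when the tuple is emitted and once when it is acked, so the tree is complete when the value returns to zero. This is a sketch, not Storm's code. The class and method names (`AckerSketch`, `emitAnchored`, etc.) are hypothetical. A real bolt would call `OutputCollector.emit(anchor, values)` for an anchored emit and `ack(input)` to ack its input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical model of the XOR bookkeeping Storm's acker uses (not Storm's code).
 * Each spout tuple maps to one 64-bit value; every tuple ID in its tree is XORed
 * in once when emitted and once when acked, so the tree is complete at zero.
 */
public class AckerSketch {
    private final Map<Long, Long> ackVals = new HashMap<>();
    private final Random random = new Random(42); // fixed seed for repeatable runs

    /** The spout emits a root tuple for a message; returns the root's random ID. */
    long emitFromSpout(long spoutMsgId) {
        long rootId = random.nextLong();
        ackVals.put(spoutMsgId, rootId);
        return rootId;
    }

    /** A bolt emits a child anchored to the tree; its ID is XORed into the ack value. */
    long emitAnchored(long spoutMsgId) {
        long childId = random.nextLong();
        ackVals.merge(spoutMsgId, childId, (a, b) -> a ^ b);
        return childId;
    }

    // An unanchored emit never touches ackVals, so the acker never learns about it.

    /** Acking any tuple in the tree XORs its ID in a second time, cancelling it out. */
    void ack(long spoutMsgId, long tupleId) {
        ackVals.merge(spoutMsgId, tupleId, (a, b) -> a ^ b);
    }

    boolean isComplete(long spoutMsgId) {
        return ackVals.get(spoutMsgId) == 0L;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();

        // Unanchored: the bolt acks the root and emits its output without an anchor.
        long root1 = acker.emitFromSpout(1L);
        acker.ack(1L, root1);
        System.out.println("unanchored, after root ack: " + acker.isComplete(1L));

        // Anchored: the child joins the tree, so completion waits for its ack too.
        long root2 = acker.emitFromSpout(2L);
        long child = acker.emitAnchored(2L);
        acker.ack(2L, root2);
        System.out.println("anchored, before child ack: " + acker.isComplete(2L));
        acker.ack(2L, child);
        System.out.println("anchored, after child ack:  " + acker.isComplete(2L));
    }
}
```

In this model, an unanchored tree completes as soon as the bolts that received the spout tuple directly ack it. A downstream bolt that is slow or fails therefore cannot mark the spout tuple as failed.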
I’ve changed our log level to DEBUG. Are there any particular error messages I should look for?

Thanks again for your help

From: Stig Rohde Døssing <s...@apache.org>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Sunday, October 15, 2017 at 5:25 PM
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: Topology high number of failures

Hi,

Could you elaborate on your configuration? You say you aren't using anchoring, but if you're getting tuple failures (and a complete latency), then acking must be enabled. Is acking enabled for the spout, and you just don't use an anchor when emitting tuples from the bolts in the topology, or do you mean something else?

Storm 0.9.4 should be able to log when a tuple fails in the spout, here: https://github.com/apache/storm/blob/v0.9.4/storm-core/src/clj/backtype/storm/daemon/executor.clj#L371. I believe you need to set the "backtype.storm.daemon.executor" logger level to DEBUG in the logback config.

2017-10-15 11:10 GMT+02:00 Yovav Waichman <yovav.waich...@jivesoftware.com>:

Hi,

We have been using Apache Storm for a couple of years, and everything was fine until now. For our spout we are using “storm-kafka-0.9.4.jar”.

Lately, our number of “Failed” events has increased dramatically, and currently almost 20% of our total events are marked as Failed.

We tried investigating our topology logs, but we came up empty-handed. Moreover, our spout complete latency is 25.996 ms.

We suspected that our DB was under heavy load, so we increased our message timeout to 60 and even 300 seconds, but that had no effect on the number of failures. Lowering our max pending value made things worse.

At some point, since we are not using anchoring, we thought about adding it, but we saw that the KafkaSpout handles failures by replaying them, so we were not sure whether to add it or not.
It would be helpful if you could point us to where in the Storm logs we can find the reason for these failures (for example, an uncaught exception or a timeout), since we are a bit blind at the moment. We would appreciate any help with that.

Thanks in advance,
Yovav
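The observation that raising the message timeout did not change the failure count can be reasoned about with a simplified, hypothetical model of spout-side bookkeeping. In this model, a pending message fails for one of two reasons: a bolt explicitly failed it, or it was not completed within the timeout. Failed offsets are then queued for replay, similar in spirit to how the storm-kafka KafkaSpout re-emits failed offsets. The `PendingTracker` class, its methods, and the timings below are assumptions for illustration. Storm's real timeout handling and storm-kafka's retry logic differ in detail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified spout-side bookkeeping (not storm-kafka's code).
 * A pending offset fails either because a bolt explicitly failed it or because
 * it was not acked within the message timeout; failed offsets are queued for replay.
 */
public class PendingTracker {
    private final long timeoutMs;
    private final Map<Long, Long> pending = new HashMap<>(); // offset -> emit time in ms
    private final Deque<Long> replayQueue = new ArrayDeque<>();
    private int failures = 0;

    PendingTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void emit(long offset, long nowMs) {
        pending.put(offset, nowMs);
    }

    void ack(long offset) {
        pending.remove(offset); // no-op if the offset already timed out
    }

    void fail(long offset) {
        if (pending.remove(offset) != null) {
            failures++;
            replayQueue.add(offset); // schedule the offset to be re-emitted
        }
    }

    /** Fails every pending offset that has been outstanding longer than the timeout. */
    void expire(long nowMs) {
        for (Long offset : new ArrayList<>(pending.keySet())) {
            if (nowMs - pending.get(offset) > timeoutMs) {
                fail(offset);
            }
        }
    }

    int failures() {
        return failures;
    }

    Deque<Long> replayQueue() {
        return replayQueue;
    }

    /** Three messages: one fast ack, one explicit fail, one slow ack after the 31 s check. */
    static PendingTracker simulate(long timeoutMs) {
        PendingTracker t = new PendingTracker(timeoutMs);
        t.emit(1L, 0L);
        t.emit(2L, 0L);
        t.emit(3L, 0L);
        t.ack(1L);          // completes within milliseconds
        t.fail(2L);         // a bolt called fail() on this tuple
        t.expire(31_000L);  // periodic timeout check at 31 s
        t.ack(3L);          // slow tuple is acked later (ignored if it already expired)
        return t;
    }

    public static void main(String[] args) {
        for (long timeoutMs : new long[] {30_000L, 300_000L}) {
            PendingTracker t = simulate(timeoutMs);
            System.out.println("timeout " + timeoutMs / 1000 + " s -> failures: "
                    + t.failures() + ", replay queue: " + t.replayQueue());
        }
    }
}
```

In this toy scenario, raising the timeout from 30 s to 300 s removes only the timeout-driven failure. The explicit fail is still counted and replayed either way.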