We have a topology that is experiencing a very large number of spout failures without corresponding bolt failures. We have been interpreting these as tuple timeouts, but we seem to be getting more of these failures than we believe timeouts alone could produce.
Our topology uses a Kafka spout and the topology is configured with:

    topology.message.timeout.secs = 300
    topology.max.spout.pending = 2500

Based on these settings, I would expect the topology to experience a maximum of 2500 tuple timeouts per 300 seconds. But from the Storm UI, we see that after running for about 10 minutes, the topology will show about 50K spout failures and zero bolt failures.

Am I misunderstanding something that would allow more tuples to time out, or is there another source of spout failures?

Thanks in advance,
Kevin Peek
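
P.S. To make my expectation concrete, here is the arithmetic as a rough sketch. It only encodes the simple model described above (at most max.spout.pending tuples can time out per timeout window) and is not how Storm itself counts failures. The function name and the spout_tasks parameter are hypothetical; I added the latter because, as far as I understand, topology.max.spout.pending applies per spout task, so the bound should scale with the number of spout tasks.

```python
import math


def max_expected_timeouts(max_spout_pending, timeout_secs, elapsed_secs, spout_tasks=1):
    """Upper bound on tuple timeouts under the simple model in this post.

    Assumption: at most `max_spout_pending` tuples per spout task can time
    out in each window of `timeout_secs` seconds. `spout_tasks` is a
    hypothetical parameter for the number of spout tasks in the topology.
    """
    # Count every timeout window that has started within the elapsed time.
    windows = math.ceil(elapsed_secs / timeout_secs)
    return max_spout_pending * spout_tasks * windows


# Settings from this post, after roughly 10 minutes (600 seconds) of running.
print(max_expected_timeouts(2500, 300, 600))
```

With one spout task this comes to 5000 for a 10-minute run, which is why the roughly 50K failures in the Storm UI look too high to me.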