The spout is a KafkaSpout and I only have one spout task.
The reason I set the maxSpoutPending value so high was that in the topology,
each tuple processed in a bolt tends to create more tuples. So, although the
KafkaSpout only receives one message, it results in thousands of tuples
downstream.
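For context, the spout is wired up roughly like this (the Zookeeper host, topic,
and ids are placeholders; package names are from the pre-1.0 storm-kafka module):

    import backtype.storm.topology.TopologyBuilder;
    import storm.kafka.BrokerHosts;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.ZkHosts;

    BrokerHosts hosts = new ZkHosts("zkhost:2181");
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "my-consumer-id");
    TopologyBuilder builder = new TopologyBuilder();
    // parallelism hint of 1 -> a single spout task
    builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);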
On Sunday, April 17, 2016 6:56 AM, Kevin Conaway
<[email protected]> wrote:
What type of spout is it? How many spout tasks do you have?
maxSpoutPending seems pretty high, so it's possible the tuples could be timing
out in the queue, and if the spout isn't reliable, or if acking is disabled, they
will be discarded.
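These are the two settings involved, as a minimal sketch with illustrative values:

    import backtype.storm.Config;

    Config conf = new Config();
    conf.setNumAckers(0);            // acking disabled: emitted tuples are never tracked
    conf.setMessageTimeoutSecs(30);  // tuples pending longer than this are timed out

A timed-out tuple is only replayed if the spout is reliable; otherwise it is
simply dropped.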
On Friday, April 15, 2016, <[email protected]> wrote:
I've placed two logs in the bolts to verify that tuples are missing: one log
right before the tuple is emitted and another at the beginning of the execute
method for the downstream bolt. These logs should contain the same statements;
however, the downstream bolt is missing close to 1,000 of the 21,000 tuples it
should be receiving.
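The logging looks roughly like this (LOG is an slf4j logger; field names are
illustrative):

    // upstream bolt, immediately before the emit
    LOG.info("emitting id={}", id);
    collector.emit(input, new Values(id, payload));

    // downstream bolt, first statement of execute()
    @Override
    public void execute(Tuple tuple) {
        LOG.info("received id={}", tuple.getStringByField("id"));
        // ... rest of the processing
    }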
On Friday, April 15, 2016 12:56 PM, Kevin Conaway
<[email protected]> wrote:
How are you verifying that the tuples are failing? If you're looking at the Storm
UI for exact counts you may be misled. Storm samples tuples at a configurable
rate (defaulting to 0.05) and extrapolates the metrics shown in the UI from that
sample. For dev or testing purposes you can set _topology.stats.sample.rate_ to 1
in storm.yaml, which will cause Storm to compute stats based on every tuple.
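You can also override it per topology when you submit it, e.g.:

    import backtype.storm.Config;

    Config conf = new Config();
    // equivalent to topology.stats.sample.rate: 1.0 in storm.yaml (dev/test only)
    conf.put(Config.TOPOLOGY_STATS_SAMPLE_RATE, 1.0d);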
On Fri, Apr 15, 2016 at 12:33 PM, <[email protected]> wrote:
Hi all,
I've recently run into a problem where my topology seems to be losing tuples
after some continuous processing. That is, the number of tuples emitted from one
bolt doesn't equal the number of tuples ack'ed by the downstream bolt. It's
also not reporting any tuples as having failed, I ack immediately in each
execute method, and there seem to be no errors in the logs. Due to the nature
of the topology, one bolt tends to emit about 10 tuples for each tuple that it
receives, so the topology itself gets backed up relatively quickly. I've read
in other articles that this can result in a memory leak, which might be the
cause of my lost tuples.
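For reference, each execute method follows roughly this shape (simplified;
process() is a placeholder for the actual work):

    @Override
    public void execute(Tuple input) {
        List<String> results = process(input);    // fans out ~10 tuples per input
        for (String r : results) {
            // anchored emit; an unanchored collector.emit(new Values(r))
            // would leave the children untracked if they were dropped
            collector.emit(input, new Values(r));
        }
        collector.ack(input);                     // ack immediately
    }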
My question is: which configuration properties of the topology could I change to
potentially resolve this problem? I currently have executor.send.buffer and
executor.receive.buffer set to 16384, maxSpoutPending at 500000, and the tuple
timeout at 300000, which I thought would help, but I still haven't seen any
improvement. Or is there something else that might be causing this problem?
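Assuming those names map to the standard Storm options, the current settings
look like:

    Config conf = new Config();
    conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);    // must be a power of two
    conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
    conf.setMaxSpoutPending(500000);     // topology.max.spout.pending
    conf.setMessageTimeoutSecs(300000);  // topology.message.timeout.secs (value is in seconds)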
Thanks
--
Kevin Conaway
http://www.linkedin.com/pub/kevin-conaway/7/107/580/
https://github.com/kevinconaway