I'm trying to figure out why my batch size never seems to get any bigger than 83K tuples. It could be that this is simply the throughput of my spout, but I don't believe that's the case, because the spout appears to be backing up (I'm not processing the tuples as quickly as they are being produced).
Currently I'm just using a barebones topology that looks like this:

    Stream spout = topology.newStream("...", ...)
        .parallelismHint(x)
        .groupBy(new Fields("time"))
        .aggregate(new Count(), new Fields("count"))
        .parallelismHint(x)
        .each(new Fields("time", "count"), new PrintFilter());

All the stream is doing is aggregating on like timestamps and printing out the count.

In my config I've set the batch size to 10MB like so:

    Config config = new Config();
    config.put(RichSpoutBatchExecutor.MAX_BATCH_SIZE_CONF, 1024 * 1024 * 10);

When I set the batch size to 5MB or even 1MB there is no difference: everything always adds up to roughly 83K tuples.

To count how many tuples are in a batch, I look at the system timestamp of when things are printed out (in the print filter), and for all the print statements that share the same timestamp I add the count values together.

When I compare the system timestamp of when the batch was processed against the tuple timestamps (which they were aggregated on), I am falling behind. This leads me to believe that the spout is emitting more tuples than I am processing, so there should be more than 83K tuples per batch.

If anyone has insight into this it would be greatly appreciated.

Thanks!
--
Raphael Hsieh
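In case it helps clarify my counting method, here is a rough standalone sketch of what I'm doing with the printed output (the class name, method, and sample numbers are made up for illustration; in the real topology the pairs come from the PrintFilter output, not an in-memory array):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: estimate batch size by summing the "count" values
// of all print statements that share the same system timestamp, treating
// each distinct print timestamp as one batch.
public class BatchSizeEstimate {

    // Each row is {printTimestampMillis, count} as parsed from the printed lines.
    public static Map<Long, Long> tuplesPerBatch(long[][] printed) {
        Map<Long, Long> totals = new LinkedHashMap<>();
        for (long[] row : printed) {
            // Sum counts that were printed at the same timestamp.
            totals.merge(row[0], row[1], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        long[][] printed = {
            {1000L, 40000L}, {1000L, 43000L}, // same print timestamp -> same batch
            {2000L, 83000L}                   // next batch
        };
        System.out.println(tuplesPerBatch(printed));
    }
}
```

Every batch I tally this way comes out to roughly the same 83K total, regardless of the configured max batch size.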