I'm optimizing my stream. I currently expect 5,000,000 tuples per minute to
come out of my spout, and I'm trying to scale up my topology so it can
process them in real time without falling behind.
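As a quick sanity check on the target, the per-minute figure can be converted into a per-second arrival rate, along with how many tuples arrive during a batch window of a given length. This is a minimal sketch; the window lengths are illustrative values, not settings from the topology.

```python
# Convert the expected spout output rate into per-second terms.
TUPLES_PER_MINUTE = 5_000_000  # figure from the post


def tuples_per_second(per_minute: float) -> float:
    """Average arrival rate in tuples per second."""
    return per_minute / 60.0


def tuples_per_window(per_minute: float, window_s: float) -> float:
    """Tuples that arrive, on average, during a window of window_s seconds."""
    return tuples_per_second(per_minute) * window_s


if __name__ == "__main__":
    print(f"{tuples_per_second(TUPLES_PER_MINUTE):,.0f} tuples/s")
    for window in (1, 3, 6):  # illustrative window lengths in seconds
        arriving = tuples_per_window(TUPLES_PER_MINUTE, window)
        print(f"{window}s window: {arriving:,.0f} tuples")
```

At 5,000,000 tuples per minute the spout averages about 83,333 tuples/s, so roughly 250,000 tuples arrive during every 3-second window.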

For some reason, my batch size caps out at 83 thousand tuples and I can't
make it any bigger. The processing time per batch also doesn't drop below
2-3 seconds. I'm not sure how to configure the topology to make it faster
or more efficient.

Currently, all the topology does is a groupBy on time followed by a Count
aggregation.
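For reference, the computation itself is simple: bucket tuples by their time field, count how many fall in each bucket, and add each batch's counts to running totals. The plain-Python sketch below shows only those semantics. It assumes each tuple carries a Unix timestamp in seconds and that "time" means a one-minute bucket; both are assumptions, since the post doesn't say how time is keyed. In Trident this is typically expressed as a groupBy on the time field followed by persistentAggregate with the built-in Count; the sketch does not model batching, state backends, or parallelism.

```python
from collections import Counter
from typing import Iterable

BUCKET_SECONDS = 60  # assumption: "time" is truncated to one-minute buckets


def time_bucket(ts: int, bucket_s: int = BUCKET_SECONDS) -> int:
    """Truncate a Unix timestamp (seconds) to the start of its bucket."""
    return ts - ts % bucket_s


def count_batch(timestamps: Iterable[int]) -> Counter:
    """groupBy(time) + Count within one batch: tuples per time bucket."""
    return Counter(time_bucket(ts) for ts in timestamps)


def merge_into(state: Counter, batch_counts: Counter) -> Counter:
    """Add one batch's partial counts to the running totals, in place."""
    state.update(batch_counts)  # Counter.update adds counts instead of replacing
    return state


if __name__ == "__main__":
    state = Counter()
    merge_into(state, count_batch([0, 5, 59, 60]))  # hypothetical batch 1
    merge_into(state, count_batch([61, 125]))       # hypothetical batch 2
    print(dict(state))
```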

Here are some data points I've gathered. As an example configuration:

Batch size: 5mb
num-workers: 1
parallelismHint: 2
(I'll write this as 5mb, 1, 2)

5mb, 1, 2 = 83K tuples / 6s
10mb, 1, 2 = 83K tuples / 7s
5mb, 1, 4 = 83K tuples / 6s
5mb, 2, 4 = 83K tuples / 3s
5mb, 3, 6 = 83K tuples / 3s
10mb, 3, 6 = 83K tuples / 3s
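Dividing tuples per batch by seconds per batch turns these data points into sustained throughput, which makes them easier to compare with the 66,000 tuples/s target. This is a minimal sketch, assuming one batch in flight at a time (maxSpoutPending is 1 in this topology) and treating "83K" as exactly 83,000 tuples.

```python
# (batch size, num-workers, parallelismHint, tuples per batch, seconds per batch)
DATA_POINTS = [
    ("5mb", 1, 2, 83_000, 6),
    ("10mb", 1, 2, 83_000, 7),
    ("5mb", 1, 4, 83_000, 6),
    ("5mb", 2, 4, 83_000, 3),
    ("5mb", 3, 6, 83_000, 3),
    ("10mb", 3, 6, 83_000, 3),
]

TARGET_TPS = 66_000  # target rate stated in the post


def throughput(tuples_per_batch: int, seconds_per_batch: float) -> float:
    """Sustained tuples/s when batches are processed one after another."""
    return tuples_per_batch / seconds_per_batch


if __name__ == "__main__":
    for size, workers, hint, tuples, secs in DATA_POINTS:
        tps = throughput(tuples, secs)
        print(f"{size:>4}, {workers}, {hint}: {tps:8,.0f} tuples/s "
              f"({tps / TARGET_TPS:.0%} of target)")
```

The best configurations come out at roughly 27,700 tuples/s, a little over 40% of the target.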

Can anybody help me figure out how to make it process faster?

My maxSpoutPending is set to 1; increasing it to 2 gave the same results.
MessageTimeoutSec = 100
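One way to reason about maxSpoutPending is as the number of batches allowed in flight at once. Under an idealized model (an assumption, not a description of Storm's scheduler: batches overlap perfectly and no other stage is the bottleneck), throughput is bounded by pending × tuples per batch / batch latency. The sketch computes that bound for the measured 83K-tuple, 3-second batches. If raising maxSpoutPending leaves the measured numbers unchanged, the perfect-overlap assumption does not hold for this topology and something else is limiting throughput.

```python
def max_throughput(pending: int, tuples_per_batch: int, batch_latency_s: float) -> float:
    """Idealized upper bound on tuples/s with `pending` batches in flight.

    Assumes batches overlap perfectly and no other stage is the bottleneck,
    so real throughput will be at or below this figure.
    """
    if pending < 1 or batch_latency_s <= 0:
        raise ValueError("pending must be >= 1 and latency must be positive")
    return pending * tuples_per_batch / batch_latency_s


if __name__ == "__main__":
    for pending in (1, 2):
        bound = max_throughput(pending, 83_000, 3.0)  # 83K tuples, 3 s batches
        print(f"maxSpoutPending={pending}: <= {bound:,.0f} tuples/s")
```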

I've been following this blog post, https://gist.github.com/mrflip/5958028,
to an extent, though not word for word.

I need to be able to process around 66,000 tuples per second, and I'm
starting to run out of ideas.
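Working backwards from the target gives two concrete goals: with one batch in flight, reaching a given rate means either finishing each batch fast enough or making each batch large enough. This is a minimal sketch using the 83K batch size and the best measured 3-second batch time; the one-batch-at-a-time model is an assumption.

```python
TARGET_TPS = 66_000  # target rate stated in the post


def required_latency_s(tuples_per_batch: int, target_tps: float) -> float:
    """Longest batch time that still sustains target_tps (one batch in flight)."""
    return tuples_per_batch / target_tps


def required_batch_size(batch_latency_s: float, target_tps: float) -> float:
    """Smallest batch that sustains target_tps at the given batch time."""
    return batch_latency_s * target_tps


if __name__ == "__main__":
    print(f"At 83K tuples/batch, batches must finish in "
          f"{required_latency_s(83_000, TARGET_TPS):.2f} s")
    print(f"At 3 s/batch, batches must hold "
          f"{required_batch_size(3.0, TARGET_TPS):,.0f} tuples")
```

That works out to batches finishing in about 1.26 s, or batches of about 198,000 tuples at the current 3-second batch time.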

Thanks

-- 
Raphael Hsieh
