I bumped the Kafka buffer/fetch sizes to:

    kafka.fetch.size.bytes: 12582912
    kafka.buffer.size.bytes: 12582912
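For reference, these map to the fetchSizeBytes / bufferSizeBytes fields on
storm-kafka's SpoutConfig. A minimal sketch, assuming the standard
storm-kafka KafkaSpout (the ZooKeeper host, topic, and ids below are
placeholders, not the values from my topology):

    BrokerHosts hosts = new ZkHosts("zkhost:2181");    // storm.kafka.ZkHosts
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "spout-id");
    spoutConfig.fetchSizeBytes  = 12 * 1024 * 1024;    // kafka.fetch.size.bytes  = 12582912
    spoutConfig.bufferSizeBytes = 12 * 1024 * 1024;    // kafka.buffer.size.bytes = 12582912
    KafkaSpout spout = new KafkaSpout(spoutConfig);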
The throughput almost doubled (to about 23,000 un-acked tuples/second).
Increasing these two parameters beyond that does not seem to improve
performance any further. Is there anything else that I can try?

On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <clayteaho...@gmail.com> wrote:

> 100,000 records is about 12 MB.
> I'll try bumping the numbers by 100-fold to see if it makes any
> difference.
> thanks,
> -Clay
>
> On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <filipa.mendesmo...@gmail.com> wrote:
>
>> I would bump these numbers up by a lot:
>>
>> kafka.fetch.size.bytes: 102400
>> kafka.buffer.size.bytes: 102400
>>
>> Say 10 or 100 times that, or more. I don't know by heart how much I
>> increased those numbers on my topology.
>>
>> How many bytes are you writing per minute to Kafka? Try dumping 1 minute
>> of messages to a file to figure out how many bytes that is.
>>
>> I am reading (sending data to the topic) about 100,000 records per
>> second. My Kafka consumer can consume the 3 million records in less
>> than 50 seconds. I have disabled acking. With acking enabled, I won't
>> even get 1,500 records per second from the topology; with acking
>> disabled, I get about 12,000/second.
>> I don't lose any data; the data is just emitted from the spout to the
>> bolt very slowly.
>>
>> I did bump my buffer sizes, but I am not sure if they are sufficient:
>>
>> topology.transfer.buffer.size: 2048
>> topology.executor.buffer.size: 65536
>> topology.receiver.buffer.size: 16
>> topology.executor.send.buffer.size: 65536
>>
>> kafka.fetch.size.bytes: 102400
>> kafka.buffer.size.bytes: 102400
>>
>> thanks
>> Clay
>>
>> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <filipa.mendesmo...@gmail.com> wrote:
>>
>>> Can you share a screenshot of the Storm UI for your spout?
>>>
>>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <clayteaho...@gmail.com> wrote:
>>>
>>>> I have this issue with any amount of load. Different max spout pending
>>>> values do not seem to make much of a difference. I've lowered this
>>>> parameter to 100 and it still makes little difference. At this point
>>>> the bolt consuming the data does no processing.
>>>>
>>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <hara...@evilezh.net> wrote:
>>>>
>>>>> I'm not sure that I understand your problem, but here are a few points:
>>>>> If you have a large max spout pending and slow processing, you will
>>>>> probably see large latency at the Kafka spout. The spout emits a
>>>>> message, it sits in a queue for a long time (which adds latency), and
>>>>> finally it is processed and the ack is received. You will see queue
>>>>> time + processing time in the Kafka spout's latency.
>>>>> Take a look at the load factors of your bolts (are they close to 1 or
>>>>> more?) and the load factor of the Kafka spout.
>>>>>
>>>>> On 4 February 2015 at 21:19, Andrey Yegorov <andrey.yego...@gmail.com> wrote:
>>>>>
>>>>>> Have you tried increasing the max spout pending parameter for the spout?
>>>>>>
>>>>>> builder.setSpout("kafka",
>>>>>>                  new KafkaSpout(spoutConfig),
>>>>>>                  TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>        .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>        // the maximum parallelism you can have on a KafkaSpout is
>>>>>>        // the number of partitions
>>>>>>        .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>>
>>>>>> ----------
>>>>>> Andrey Yegorov
>>>>>>
>>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <clayteaho...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> In my topology, the Kafka spout is responsible for over 85% of the
>>>>>>> latency.
>>>>>>> I have tried different spout max pending values and played with the
>>>>>>> buffer size and fetch size, still no luck. Any hint on how to
>>>>>>> optimize the spout? The issue doesn't seem to be on the Kafka side,
>>>>>>> as I see high throughput with the simple Kafka consumer.
>>>>>>>
>>>>>>> thank you for your feedback
>>>>>>> Clay
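Note: the topology-level knobs discussed in this thread can also be set
programmatically. A rough sketch, assuming Storm's standard
backtype.storm.Config keys and a TopologyBuilder wired up as in the
snippet above; the values are only the ones quoted in the thread (the
thread's "topology.executor.buffer.size" presumably means the executor
receive buffer), and would need tuning for a real workload:

    Config conf = new Config();                                      // backtype.storm.Config
    conf.setMaxSpoutPending(100);                                    // topology.max.spout.pending (value tried above)
    conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 2048);            // topology.transfer.buffer.size
    conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);   // topology.executor.receive.buffer.size
    conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);      // topology.executor.send.buffer.size
    conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16);              // topology.receiver.buffer.size
    StormSubmitter.submitTopology("kafka-topology", conf, builder.createTopology());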