Re: kafkaspout is very slow

2015-02-05 Thread clay teahouse
CPU is around 100%.

On Wed, Feb 4, 2015 at 9:30 PM, Michael Rose wrote:
> How does your CPU look at 23000 tuples/s? Still low?
>
> Have you profiled to see if anything is blocking? Is your spout constantly
> doing work?
>
> *Michael Rose*
> Senior Platform Engineer
> *Full*Contact | fullcontact.c

Re: kafkaspout is very slow

2015-02-04 Thread Michael Rose
How does your CPU look at 23000 tuples/s? Still low?

Have you profiled to see if anything is blocking? Is your spout constantly doing work?

*Michael Rose*
Senior Platform Engineer
*Full*Contact | fullcontact.com

Re: kafkaspout is very slow

2015-02-04 Thread clay teahouse
I bumped the Kafka buffer/fetch sizes to

kafka.fetch.size.bytes: 12582912
kafka.buffer.size.bytes: 12582912

The throughput almost doubled (to about 23000 un-acked tuples/second). Increasing these two parameters further does not seem to improve performance. Is there any

Re: kafkaspout is very slow

2015-02-04 Thread clay teahouse
100,000 records is about 12MB. I'll try bumping the numbers 100-fold to see if it makes any difference.

Thanks,
-Clay

On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura wrote:
> I would bump these numbers up by a lot:
>
> kafka.fetch.size.bytes: 102400
> kafka.buffer.size.bytes: 102400
>
> Say 10
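The sizes quoted in this part of the thread can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes "about 12MB" means 12 * 1024 * 1024 bytes (which happens to equal the 12582912 value tried later in the thread). It estimates the average record size and how many such records fit into one fetch of a given size.

```java
public class FetchSizeEstimate {
    // Figures quoted in the thread.
    static final long RECORDS = 100_000L;                 // "100,000 records"
    static final long RECORDS_BYTES = 12L * 1024 * 1024;  // "about 12MB" (assumed to be 12 MiB)
    static final long BASE_FETCH_BYTES = 102_400L;        // starting fetch/buffer size

    // Average size of one record in bytes (integer division, rounds down).
    static long avgRecordBytes() {
        return RECORDS_BYTES / RECORDS;
    }

    // The base fetch size multiplied by a factor, e.g. 10 or 100.
    static long scaledFetchBytes(long factor) {
        return BASE_FETCH_BYTES * factor;
    }

    // Rough number of average-sized records that fit into one fetch.
    static long recordsPerFetch(long fetchBytes) {
        return fetchBytes / avgRecordBytes();
    }

    public static void main(String[] args) {
        System.out.println("avg record bytes:        " + avgRecordBytes());
        System.out.println("records per 102400 B:    " + recordsPerFetch(BASE_FETCH_BYTES));
        System.out.println("100x fetch size (bytes): " + scaledFetchBytes(100));
        System.out.println("records per 12582912 B:  " + recordsPerFetch(RECORDS_BYTES));
    }
}
```

Under these assumptions a record averages roughly 125 bytes, so a 102400-byte fetch holds only several hundred records, while a 12582912-byte fetch holds on the order of one second of the stated input rate.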

Re: kafkaspout is very slow

2015-02-04 Thread Filipa Moura
I would bump these numbers up by a lot:

kafka.fetch.size.bytes: 102400
kafka.buffer.size.bytes: 102400

Say 10 or 100 times that or more. I don't know by heart how much I increased those numbers on my topology. How many bytes are you writing per minute on Kafka? Try dumping 1 minute of message

Re: kafkaspout is very slow

2015-02-04 Thread Michael Rose
You might increase the number of ackers too if acking is slow.

*Michael Rose*
Senior Platform Engineer
*Full*Contact | fullcontact.com

Re: kafkaspout is very slow

2015-02-04 Thread clay teahouse
I am reading (sending data to the topic) about 100,000 records per second. My Kafka consumer can consume the 3 million records in less than 50 seconds. I have disabled the ack. But with the ack enabled, I don't even get 1500 records per second from the topology. With ack disabled, I get about 1200

Re: kafkaspout is very slow

2015-02-04 Thread Filipa Moura
Can you share a screenshot of the Storm UI for your spout?

On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse wrote:
> I have this issue with any amount of load. Different max spout pendings
> do not seem to make much a difference. I've lowered this parameter to 100,
> still a little difference . A

Re: kafkaspout is very slow

2015-02-04 Thread Filipa Moura
How many messages are you reading per second? I had a few problems with my spout originally, but they were because either 1) I was not acking the messages, so because of max pending they weren't being removed from the "queue", or 2) buffer size and fetch size were too small: have you tried to figure out ho

Re: kafkaspout is very slow

2015-02-04 Thread clay teahouse
I have this issue with any amount of load. Different max spout pending values do not seem to make much of a difference. I've lowered this parameter to 100, and there is still little difference. At this point the bolt consuming the data does no processing.

On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis wrote:
> I

Re: kafkaspout is very slow

2015-02-04 Thread Haralds Ulmanis
I'm not sure that I understand your problem, but here are a few points: If you have a large max spout pending and slow processing, you will probably see large latency at the Kafka spout. The spout emits a message, it stays in the queue for a long time (which adds latency), and finally it is processed and ack r
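The relationship described here between pending tuples, throughput, and latency can be approximated with Little's law (in-flight items = throughput * time in system). The sketch below is a rough illustration, not a model of Storm internals: the class and method names are hypothetical, and it assumes the spout is saturated so that the number of in-flight tuples sits at the max spout pending limit.

```java
public class SpoutPendingEstimate {
    // Little's law rearranged: latency = in-flight tuples / throughput.
    // Returns the approximate time a tuple spends in flight, in milliseconds.
    static double latencyMillis(long pendingTuples, double tuplesPerSecond) {
        return 1000.0 * pendingTuples / tuplesPerSecond;
    }

    // Little's law: in-flight tuples needed to sustain a throughput
    // at a given per-tuple latency.
    static long pendingFor(double tuplesPerSecond, double latencyMillis) {
        return Math.round(tuplesPerSecond * latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical values chosen only for illustration.
        System.out.println("10000 pending at 1000 tuples/s: "
                + latencyMillis(10_000, 1000.0) + " ms in flight");
        System.out.println("pending needed for 23000 tuples/s at 100 ms: "
                + pendingFor(23_000.0, 100.0));
    }
}
```

Read this way, a large pending limit combined with slow downstream processing mostly adds queueing time, which shows up as latency attributed to the spout.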

Re: kafkaspout is very slow

2015-02-04 Thread Andrey Yegorov
Have you tried increasing the max spout pending parameter for the spout?

    builder.setSpout("kafka", new KafkaSpout(spoutConfig), TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
        .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT) //the maximum parallelism you c

kafkaspout is very slow

2015-02-03 Thread clay teahouse
Hi all, In my topology, kafka spout is responsible for over 85% of the latency. I have tried different spout max pending and played with the buffer size and fetch size, still no luck. Any hint on how to optimize the spout? The issue doesn't seem to be with the kafka side, as I see high throughput