Hey,
I have a 5-node Storm cluster. I have deployed a Storm topology with a
Kafka Spout that reads from a Kafka cluster and a bolt that writes to a
database. When I tested a Java Kafka consumer independently, I got a
throughput of around 1M messages per second. When I tested my database
independently, I got a maximum throughput of around 100k messages per second.
Since my database is much slower at consuming messages, I need to reduce the
intake of messages by the Kafka Spout. Adding more parallelism to the DB bolt
doesn't help, as I have already reached the maximum throughput of the database.
Periodically I see an "Out of memory" exception in the Kafka Spout and
processing stops.
1. How can I reduce the rate at which the Kafka Spout takes in messages? I
assume the reason for the OOM exceptions is that the Kafka Spout reads
messages from Kafka faster than the DB bolt can flush them to the
database. Is that correct? I tried reducing the fetch size, but it
didn't help.
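To make my OOM hypothesis concrete, here is my back-of-envelope arithmetic. The 1M/s and 100k/s rates are from my independent tests; the 500-byte average message size is an assumption on my part:

```java
// Back-of-envelope for my OOM hypothesis: if the spout ingests faster than
// the DB bolt drains, the difference accumulates in memory every second.
public class BacklogEstimate {
    public static void main(String[] args) {
        long inRate = 1_000_000;   // spout intake, messages/s (measured)
        long outRate = 100_000;    // DB bolt drain, messages/s (measured)
        long msgBytes = 500;       // assumed average message size

        long backlogPerSec = inRate - outRate;       // messages buffered each second
        long heapGrowth = backlogPerSec * msgBytes;  // bytes of heap growth per second
        long heapMb = heapGrowth / (1024 * 1024);

        System.out.println("Backlog grows by " + backlogPerSec + " msgs/s");
        System.out.println("Heap grows by roughly " + heapMb + " MB/s");
    }
}
```

At roughly 429 MB/s of heap growth, even a multi-gigabyte heap would fill within seconds, which would explain why the OOM shows up so quickly.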
2. Suppose my DB bolt were somehow able to flush all messages to the
database at the same rate as the Kafka Spout reads them, but the database
became slow in the future. Would the message intake rate be reduced
dynamically so that the OOM exception doesn't happen? How can I
proactively take measures?
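To make question 2 concrete, this is the behavior I am hoping for, sketched with a plain bounded semaphore. The class, the method names, and the maxPending knob are my own illustration, not Storm API:

```java
import java.util.concurrent.Semaphore;

// Sketch of the backpressure I want: the spout-side reader blocks once
// maxPending messages are in flight, and each database ack frees a slot.
// These names are my own illustration, not Storm API.
public class BoundedIntake {
    private final Semaphore pending;

    public BoundedIntake(int maxPending) {
        this.pending = new Semaphore(maxPending);
    }

    // Called before emitting a message downstream; blocks when the cap is hit.
    public void beforeEmit() throws InterruptedException {
        pending.acquire();
    }

    // Called when the DB bolt acks a message; frees one slot.
    public void onAck() {
        pending.release();
    }

    public int freeSlots() {
        return pending.availablePermits();
    }
}
```

With a cap like this, a slowing database would simply make beforeEmit() block, so memory stays bounded instead of growing until OOM. Is there a built-in Storm setting that behaves this way?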
3. What is the best way to tune my system parameters? Also, how do I test
the performance (throughput) of my Storm topology?
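For context on question 3, this is how I currently think about measuring throughput: count acked tuples and report the rate over a window. The class and method names below are my own plain-Java sketch, not Storm API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java sketch of windowed throughput measurement: count acked tuples,
// then report messages/second since the last report. My own illustration,
// not Storm API.
public class ThroughputMeter {
    private final AtomicLong acked = new AtomicLong();
    private long windowStartNanos = System.nanoTime();

    // Call once per acked tuple (e.g. from the bolt's ack path).
    public void onAck() {
        acked.incrementAndGet();
    }

    // Returns messages/second since the last call, then resets the window.
    public double rateAndReset() {
        long now = System.nanoTime();
        long count = acked.getAndSet(0);
        double seconds = (now - windowStartNanos) / 1e9;
        windowStartNanos = now;
        return seconds > 0 ? count / seconds : 0.0;
    }
}
```

Is there a better, standard way to get these numbers for a whole topology rather than hand-rolling a counter like this?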
I would like to know how the Storm community deals with this problem.
Thanks for your time