Hi,

Before anything else: I'm using Apache Storm 1.2.1 and Apache Kafka 1.1.1. For my initial tests, I have:
- *6 supervisor nodes*
- 1 worker per node
- 7 KafkaSpout executors per worker
- 1 bolt (that does nothing) per worker
- 0 ackers
- 2 Kafka brokers with *42 partitions* on the topic

With that configuration, no matter how I vary the other configs (please see the list below), my topology caps out at around 48k TPS. Note that CPU usage on the supervisor nodes is only around 20%, and network usage is only around 20 Mbps for both the Kafka brokers and the supervisor nodes, well below the network capacity.

I then increased the supervisor nodes from 6 to 12 and used a new topic with 84 partitions, thinking that scaling out might help performance. This is the new configuration:

- *12 supervisor nodes*
- 1 worker per node
- 7 KafkaSpout executors per worker
- 1 bolt (that does nothing) per worker
- 0 ackers
- 2 Kafka brokers with *84 partitions* on the topic

And I'm still getting around 48k TPS.

Some other configs I played around with:

- max.poll.records
- max.spout.pending
- processing.guarantee
- offset.commit.period.ms
- max.uncommitted.offsets
- poll.timeout.ms
- fetch.min.bytes
- fetch.max.bytes
- max.partition.fetch.bytes
- receive.buffer.bytes
- fetch.max.wait.ms
- topology.executor.receive.buffer.size
- topology.executor.send.buffer.size
- topology.receiver.buffer.size
- topology.transfer.buffer.size

Am I missing something here? Why is the throughput not improving?

Just to add, I have also run some performance-isolation tests on both Kafka and Storm. With a distributed consumer using Spark and Kafka, I was able to get around 700k TPS (so we know Kafka isn't the issue here). I could also get around 400k TPS on a custom Storm topology with one spout that generates random transactions and one bolt that does nothing. The numbers don't add up, and the topology shouldn't be capping out at around 48k TPS.

Your suggestions would be very much appreciated.

Thanks,
Bernard
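P.S. In case it's useful, here is roughly how the topology is wired for the 6-node run. This is a simplified sketch rather than my exact code: the broker addresses, topic name, and the NoOpBolt class are placeholders, but the parallelism matches the setup described above (6 workers x 7 spout executors = 42 spout executors, one per partition):

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class ThroughputTestTopology {

    // The "bolt that does nothing": consumes every tuple, emits nothing
    public static class NoOpBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // intentionally empty
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams
        }
    }

    public static void main(String[] args) throws Exception {
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("broker1:9092,broker2:9092", "test-topic")
                        .build();

        TopologyBuilder builder = new TopologyBuilder();
        // 6 workers x 7 executors each = 42 spout executors, one per partition
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 42);
        // one no-op bolt per worker
        builder.setBolt("noop-bolt", new NoOpBolt(), 6)
                .shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(6); // 1 worker per supervisor node
        conf.setNumAckers(0);  // 0 ackers

        StormSubmitter.submitTopology("throughput-test", conf, builder.createTopology());
    }
}
```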
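And here is the shape of the tuning I experimented with, assuming the storm-kafka-client 1.2.x Builder API (setProp plus the named setters). The values shown are just examples of the kinds of settings I varied, not a specific configuration I settled on:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.storm.Config;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

public class TuningSketch {
    public static void main(String[] args) {
        // Spout/consumer-side settings (example values only)
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("broker1:9092,broker2:9092", "test-topic")
                        .setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_LEAST_ONCE)
                        .setOffsetCommitPeriodMs(10_000)   // offset.commit.period.ms
                        .setMaxUncommittedOffsets(250_000) // max.uncommitted.offsets
                        .setPollTimeoutMs(200)             // poll.timeout.ms
                        .setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 2_000)
                        .setProp(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1)
                        .setProp(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 52_428_800)
                        .setProp(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1_048_576)
                        .setProp(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 65_536)
                        .setProp(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500)
                        .build();

        // Storm-side settings (example values only; buffer sizes must be powers of 2)
        Config conf = new Config();
        conf.setMaxSpoutPending(10_000); // max.spout.pending (only applies when acking is enabled)
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16_384);
        conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16_384);
        conf.put("topology.receiver.buffer.size", 8);
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);
    }
}
```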