I ran into a tough but interesting issue with Storm 1.0.2 that I'd like to share. First, some background on my setup: the topology has 471 workers, 6 Kafka topics, and several processing bolts. Backpressure is disabled in the topology via
    conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);

If I submit only the KafkaSpouts (no bolts), the statistics from the Storm UI are as follows:

Topology stats

| Window      | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
|-------------|-----------|-------------|-----------------------|-----------|--------|
| 10m 0s      | 875161429 | 0           | 0                     | 875141691 |        |
| 3h 0m 0s    | 927117120 | 0           | 0                     | 927117320 |        |
| 1d 0h 0m 0s | 927117120 | 0           | 0                     | 927117320 |        |
| All time    | 927117120 | 0           | 0                     | 927117320 |        |

Spouts (10m 0s)

| Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
|------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|
| Spout_JSCTGW_144 | 2         | 2     | 128701261 | 0           | 0                     | 128702858 | 0      |
| Spout_JSCTGW_145 | 2         | 2     | 162347454 | 0           | 0                     | 162347639 | 0      |
| Spout_JSCTGW_146 | 2         | 2     | 135598494 | 0           | 0                     | 135598608 | 0      |
| Spout_JSCTGW_147 | 2         | 2     | 128307822 | 0           | 0                     | 128306102 | 0      |
| Spout_JSCTGW_148 | 2         | 2     | 160369513 | 0           | 0                     | 160360423 | 0      |
| Spout_JSCTGW_149 | 2         | 2     | 159836885 | 0           | 0                     | 159826061 | 0      |

In a 10-minute window, the 6 KafkaSpouts read more than 800 million tuples from the Kafka cluster. I then submitted the topology with a simple PrintBolt added; the 10-minute stats become:

Spouts (10m 0s)

| Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
|------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|
| Spout_JSCTGW_144 | 2         | 2     | 113599789 | 113599789   | 0                     | 113513630 | 0      |
| Spout_JSCTGW_145 | 2         | 2     | 122501522 | 122501575   | 0                     | 122659659 | 0      |
| Spout_JSCTGW_146 | 2         | 2     | 91308915  | 91308915    | 0                     | 91308652  | 0      |
| Spout_JSCTGW_147 | 2         | 2     | 105029568 | 105029568   | 0                     | 104968275 | 0      |
| Spout_JSCTGW_148 | 2         | 2     | 115889172 | 115889178   | 0                     | 115890165 | 0      |
| Spout_JSCTGW_149 | 2         | 2     | 115185591 | 115185591   | 0                     | 115185638 | 0      |

(UI footer for the 6 spouts reports a total of 663526019.)

Bolts (10m 0s)

| Id        | Executors | Tasks | Emitted | Transferred | Capacity (last 10m) | Execute latency (ms) | Executed  | Process latency (ms) | Acked     |
|-----------|-----------|-------|---------|-------------|---------------------|----------------------|-----------|----------------------|-----------|
| PrintBolt | 240       | 240   | 0       | 0           | 0.241               | 0.041                | 665063824 | 0                    | 665036902 |

The PrintBolt contains nothing but a single log.info call, yet the spouts' total emitted count drops to about 660 million. All of the machines have 32 cores, but the system load average never exceeds 15. I have also checked the network load: the adapters have 10 Gb bandwidth, and the dstat -d output shows a peak receive/send rate of at most 300 MB/s, well below the adapter's limit.
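To quantify the slowdown, here is a quick back-of-the-envelope check in plain Java, using the 10-minute totals above (875161429 tuples emitted in the spout-only run, versus the 663526019 total reported with the PrintBolt attached):

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        double windowSeconds    = 600.0;         // the 10-minute UI window
        double spoutOnlyEmitted = 875_161_429;   // spout-only run, total emitted
        double withBoltEmitted  = 663_526_019;   // run with the PrintBolt attached

        double spoutOnlyRate = spoutOnlyEmitted / windowSeconds;
        double withBoltRate  = withBoltEmitted / windowSeconds;
        double dropPercent   = 100.0 * (1.0 - withBoltEmitted / spoutOnlyEmitted);

        System.out.printf("spout-only: %.0f tuples/s%n", spoutOnlyRate); // ~1.46M tuples/s
        System.out.printf("with bolt:  %.0f tuples/s%n", withBoltRate);  // ~1.11M tuples/s
        System.out.printf("drop:       %.1f%%%n", dropPercent);          // ~24.2%
    }
}
```

So attaching even a trivial bolt costs roughly a quarter of the spout throughput, despite CPU and network being far from saturated.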
The total number of executors is smaller than the total number of workers, so raw capacity should not be the problem. I then ran some more tests.

Only one KafkaSpout, reading a single topic:

Spouts (10m 0s)

| Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
|------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|
| Spout_JSCTGW_145 | 2         | 2     | 148918096 | 0           | 0                     | 148917805 | 0      |

One KafkaSpout plus one data-processing bolt:

Spouts (10m 0s)

| Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
|------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|
| Spout_JSCTGW_145 | 2         | 2     | 106481751 | 106481751   | 0                     | 106367801 | 0      |

Bolts (10m 0s)

| Id             | Executors | Tasks | Emitted   | Transferred | Capacity (last 10m) | Execute latency (ms) | Executed  |
|----------------|-----------|-------|-----------|-------------|---------------------|----------------------|-----------|
| AtomPluginBolt | 1000      | 1000  | 132752203 | 0           | 0.019               | 0.067                | 106519630 |

Even when only one Kafka topic is being read, the emit rate slows down as soon as a bolt is attached. Has anyone run into the same problem?
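For completeness, the topology wiring under test looks roughly like the sketch below. This is a reconstruction, not my exact code: the class layout, the `buildKafkaSpout` helper, and the topology name are placeholders, with only the component ids, parallelism values, and the backpressure setting taken from the description above. It assumes the standard Storm 1.0.x API, so no `<test>` accompanies it (it needs a running cluster and the storm-kafka spout setup).

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.BoltDeclarer;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PrintTopology {

    // Reconstruction of the PrintBolt described above: nothing but a log call.
    // BaseBasicBolt acks each tuple automatically after execute() returns.
    public static class PrintBolt extends BaseBasicBolt {
        private static final Logger LOG = LoggerFactory.getLogger(PrintBolt.class);

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            LOG.info(tuple.toString());
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no downstream streams are declared
        }
    }

    // Placeholder: the real topology uses storm-kafka's KafkaSpout configured
    // for the given topic; that setup is omitted here.
    private static IRichSpout buildKafkaSpout(String topic) {
        throw new UnsupportedOperationException("replace with KafkaSpout setup for " + topic);
    }

    public static void main(String[] args) throws Exception {
        String[] topics = {"JSCTGW_144", "JSCTGW_145", "JSCTGW_146",
                           "JSCTGW_147", "JSCTGW_148", "JSCTGW_149"};

        TopologyBuilder builder = new TopologyBuilder();

        // One spout per topic, 2 executors each (matching the UI tables above).
        for (String topic : topics) {
            builder.setSpout("Spout_" + topic, buildKafkaSpout(topic), 2);
        }

        // 240 PrintBolt executors, shuffle-grouped from every spout.
        BoltDeclarer print = builder.setBolt("PrintBolt", new PrintBolt(), 240);
        for (String topic : topics) {
            print.shuffleGrouping("Spout_" + topic);
        }

        Config conf = new Config();
        conf.setNumWorkers(471);
        conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false); // backpressure disabled, as noted above

        StormSubmitter.submitTopology("print-topology", conf, builder.createTopology());
    }
}
```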