I am playing with Apache Storm for a real-time image-processing application that requires ultra-low latency. In the topology, a single spout emits a raw image (5 MB) every second and a few bolts process it. The processing latency of each bolt is acceptable, and the overall computing delay is around 150 ms.
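To make the latency budget concrete, here is the subtraction I use to isolate the messaging overhead from the task latencies. The class name and the per-bolt/end-to-end figures are illustrative; only the ~150 ms computing total is an actual measurement:

```java
public class LatencyBudget {
    // Messaging overhead = end-to-end latency minus the sum of the
    // per-bolt task latencies (i.e. everything that is not task execution).
    static double messagingOverheadMs(double endToEndMs, double[] taskLatenciesMs) {
        double computeMs = 0;
        for (double t : taskLatenciesMs) computeMs += t;
        return endToEndMs - computeMs;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 5 bolts whose task latencies sum to the
        // ~150 ms computing delay mentioned above, with 350 ms end to end.
        double[] tasks = {30, 30, 30, 30, 30};
        double messaging = messagingOverheadMs(350.0, tasks);
        System.out.println("messaging overhead: " + messaging + " ms total, "
                + (messaging / tasks.length) + " ms per hop");
    }
}
```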
*However, I find that the message-passing delay between workers on different nodes is really high: across the 5 successive bolts it adds up to around 200 ms.* To calculate this delay, I subtract all the task latencies from the end-to-end latency. I also implemented a timer bolt; the other processing bolts register with it and record a timestamp just before starting the real processing. Comparing these timestamps confirms that the delay between each pair of bolts is as high as I noticed.

To find the source of this extra delay, I first reduced the sending rate to one image per second, so there should be no queuing delay caused by heavy computation. The Storm UI also shows that no bolt has high CPU utilization. Next I checked the network: I am on a 1 Gbps testbed and measured both RTT and bandwidth, and the network latency should not be that high for sending a 5 MB image.

Finally, I looked at buffer delay. Each thread maintains its own sending buffer and transfers the data to the worker's sending buffer, and I am not sure how long it takes before the receiving bolt actually gets the message. As suggested by the community, I increased the sender/receiver buffer sizes to 16384 and set STORM_NETTY_MESSAGE_BATCH_SIZE to 32768, but it did not help.

*My question is: how can I remove or reduce the messaging overhead between bolts (across workers)?* Is it possible to synchronize the communication between bolts so that the receiver gets the messages immediately, without any delay?
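For completeness, the settings I already tried can be written out as plain key/value pairs (a Storm `Config` is itself just a `Map<String, Object>`). The key strings below are my understanding of the Storm 0.9.x names, where `STORM_NETTY_MESSAGE_BATCH_SIZE` corresponds to `storm.messaging.netty.transfer.batch.size`; please double-check them against your Storm version:

```java
import java.util.HashMap;
import java.util.Map;

public class MessagingConfSketch {
    // The messaging-related settings I tried (values from the question);
    // key names assume Storm 0.9.x and should be verified per version.
    static Map<String, Object> messagingConf() {
        Map<String, Object> conf = new HashMap<>();
        // Per-executor disruptor queue sizes (number of slots, power of 2)
        conf.put("topology.executor.send.buffer.size", 16384);
        conf.put("topology.executor.receive.buffer.size", 16384);
        // Bytes Netty batches up before flushing to the remote worker
        // (the STORM_NETTY_MESSAGE_BATCH_SIZE setting)
        conf.put("storm.messaging.netty.transfer.batch.size", 32768);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(messagingConf());
    }
}
```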