I have a performance problem with receiving. In a single master thread, I post several Irecv calls:

    Irecv(buf1, ..., ANY_SOURCE, tag, COMM_WORLD)
    Irecv(buf2, ..., ANY_SOURCE, tag, COMM_WORLD)
    ...
    Irecv(bufn, ..., ANY_SOURCE, tag, COMM_WORLD)

all of which receive messages with the same tag from any node. Whenever one of the Irecv calls completes (detected with Testany), a separate thread is dispatched to work on the received message.

In my program, many nodes send to this master thread. However, I noticed that the receive speed is almost unaffected by how many Irecv calls are posted. It seems that posting multiple Irecv calls does not mean receiving from many nodes concurrently. Profiling the node that runs the master thread shows that its inbound network bandwidth is quite low.

Is my understanding correct? And how can I maximize the receive throughput of the master thread?

Thanks!

Zhang Lei @ Baidu, Inc.
