On Apr 21, 2009, at 11:01 AM, Tsung Han Shie wrote:
I tried to increase the speed of a program with openmpi-1.1.3
Did you mean 1.1.3 or 1.3.1?
by adding the following 4 parameters to the openmpi-mca-params.conf file:
mpi_leave_pinned=1
btl_openib_eager_rdma_num=128
btl_openib_max_eager_rdma=128
btl_openib_eager_limit=1024
If you meant 1.3.1 above, please see the following message about an important bug in 1.3 and 1.3.1 with the use of mpi_leave_pinned:
http://www.open-mpi.org/community/lists/announce/2009/03/0029.php
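If that bug applies to your installation, one way to rule it out while testing is to turn leave-pinned off. This is a minimal sketch, assuming the same openmpi-mca-params.conf file mentioned above and assuming that disabling the parameter is the right workaround for your version (check the announcement to confirm):

```
# openmpi-mca-params.conf
# Assumption: disabling leave-pinned sidesteps the 1.3 / 1.3.1 bug;
# the other three btl_openib_* settings are left out to isolate the effect.
mpi_leave_pinned = 0
```

The same value can be set for a single run on the command line, e.g. `mpirun --mca mpi_leave_pinned 0 ...`, which avoids editing the file on every node.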
and then I ran my program twice (124 processes on 31 nodes): once with "mpi_leave_pinned=1" and once with "mpi_leave_pinned=0". Both runs were stopped abnormally with "ctrl+c" and "killall -9 <program>".
Why -- did they hang?
After that, I couldn't start that program again.
What exactly was the error?
I checked every node with "free -m" and found that a huge amount of cached memory was in use on each node. Could this situation be caused by those 4 parameters? Is there any way to free it?
Probably not. Can you send all the information listed here:
http://www.open-mpi.org/community/help/

--
Jeff Squyres
Cisco Systems