On Apr 21, 2009, at 11:01 AM, Tsung Han Shie wrote:

I tried to speed up a program with openmpi-1.1.3

Did you mean 1.1.3 or 1.3.1?

by adding the following 4 parameters to the openmpi-mca-params.conf file.

mpi_leave_pinned=1
btl_openib_eager_rdma_num=128
btl_openib_max_eager_rdma=128
btl_openib_eager_limit=1024

If you meant 1.3.1 above, please see the following message about an important bug in 1.3 and 1.3.1 with the use of mpi_leave_pinned:

    http://www.open-mpi.org/community/lists/announce/2009/03/0029.php
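
If it helps to confirm what a given openmpi-mca-params.conf actually sets, here is a minimal sketch, assuming the file uses plain "name = value" lines with "#" comments. The function name parse_mca_params and the SAMPLE string are hypothetical; the four values are just the ones quoted above.

```python
def parse_mca_params(text):
    """Parse 'name = value' lines; anything after '#' is treated as a comment."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


# Hypothetical sample: the four settings quoted in this thread.
SAMPLE = """\
mpi_leave_pinned=1
btl_openib_eager_rdma_num=128
btl_openib_max_eager_rdma=128
btl_openib_eager_limit=1024
"""

params = parse_mca_params(SAMPLE)
# True when the file turns mpi_leave_pinned on.
leave_pinned_on = params.get("mpi_leave_pinned") == "1"
print("mpi_leave_pinned enabled:", leave_pinned_on)
```

In practice the text would come from reading the params file on each node rather than from a literal string.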


Then I ran my program twice (124 processes on 31 nodes): once with "mpi_leave_pinned=1" and once with "mpi_leave_pinned=0". Both runs were stopped abnormally with "ctrl+c" and "killall -9 <program>".

Why -- did they hang?

After that, I couldn't start that program again.

What exactly was the error?

I checked every node with "free -m" and found that a huge amount of cached memory was in use on each node. Could this situation be caused by those 4 parameters? Is there any way to free it?


Probably not.
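
For context, and assuming these are Linux nodes: the "cached" column in "free -m" is normally the kernel's page cache, which the kernel mostly reclaims on its own when applications need memory. A rough sketch of how to read the numbers, using /proc/meminfo-style text, follows. The function name parse_meminfo is hypothetical and the sample values are illustrative only, not taken from the nodes in question.

```python
def parse_meminfo(text):
    """Return {field: kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


# Illustrative numbers only.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
Buffers:          200000 kB
Cached:          6000000 kB
"""

info = parse_meminfo(SAMPLE)
# Free memory plus buffers and page cache is a rough upper bound on what
# applications can still obtain; some cached pages may not be reclaimable.
effectively_free_kb = info["MemFree"] + info["Buffers"] + info["Cached"]
print("roughly available (MB):", effectively_free_kb // 1024)
```

On a real node the text would come from reading /proc/meminfo. A large "Cached" value on its own usually does not prevent a new job from starting.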

Can you send all the information listed here:

    http://www.open-mpi.org/community/help/

--
Jeff Squyres
Cisco Systems
