Hi, 

I installed Open MPI 1.4.1 as a non-root user on a cluster. Everything works 
when I run many processes on a single node with mpirun or mpiexec. However, 
when I launch many processes across multiple nodes, I can see (using "top") 
that the processes are started on those nodes, but they all hang there and 
never finish.

I think the nodes use TCP to communicate with each other. The cluster also 
provides MPICH2, which was configured by the sys admin, and MPICH2 has no 
problem communicating between nodes. I have also read in some posts that 
this may be caused by a TCP firewall. I do not have root access, and I don't 
know what I should ask the admin to do to fix this problem. Can you tell 
me how to fix it, either as root (through the admin) or as a non-root user 
(if possible)?
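
To check the firewall theory without root access, I could test whether an 
ordinary TCP connection on an unprivileged port gets through between two 
nodes. Below is a minimal sketch of such a test using only the Python 
standard library. The function names listen_once and can_connect, the host 
name "node02", and port 50000 are placeholders I made up for illustration; 
they have nothing to do with Open MPI itself. I assume Open MPI picks its 
TCP ports dynamically, so one port passing or failing is only a hint, not 
proof.

```python
import socket
import threading  # only needed if the listener is run in a background thread


def listen_once(port, host="0.0.0.0", ready=None):
    """Accept a single TCP connection on (host, port), then close.

    `ready` is an optional threading.Event that is set once the socket
    is listening, so a caller in another thread knows when to connect.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        if ready is not None:
            ready.set()
        conn, _addr = srv.accept()
        conn.close()


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

The idea would be to run listen_once(50000) on one node and then call 
can_connect("node02", 50000) from another node, where "node02" stands for 
the first node's host name. If this returns False between nodes but True on 
a single node, that would suggest something is filtering TCP traffic on 
that port between the nodes.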

Thank you very much.
Hao
