Hi,

I have used OpenMPI before without any trouble, and have configured MPICH, MPICH2 and OpenMPI on many different machines. However, we recently upgraded the OS to Fedora 17, and now I'm having trouble running MPI code across two of our machines connected via a switch.
I thought the old installation might be causing problems, so I reinstalled OpenMPI (1.6.4). I have no trouble running a parallel code on a single node, and I can ssh between these machines without a password. But when I try to run a parallel job spanning both machines, the mpiexec process on the submitting machine hangs, an "orted" process appears on the other machine, and nothing progresses.

I suspect an issue with libraries and/or mismatched MPI versions (the machines have other site-wide MPI libraries installed), but I'm not sure how to debug it. I looked in the FAQ but didn't find anything relevant. The issue at http://www.open-mpi.org/faq/?category=running#intel-compilers-static is different, since I don't get any warnings or errors when running; all the processes are simply stuck.

Is there any way to dump details of what OpenMPI is trying to do on each node, so I can see whether it is picking up different libraries on each node, or something similar?

Thanks,
-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/
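One quick check for the "different libraries on each node" suspicion is to compare what a non-interactive ssh session resolves on every host, since that is the kind of session used to start orted remotely. The sketch below assumes Python 3 on the submitting machine and passwordless ssh to every host. The probe command (`which mpiexec orted` plus `LD_LIBRARY_PATH`) is only an illustrative choice, not something Open MPI itself provides. The script name and host names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys

# Probe run on each host through a non-interactive ssh session.
# ASSUMPTION: this particular command is just an illustrative choice; extend it
# (e.g. with `ldd $(which orted)`) to inspect library resolution more closely.
PROBE = "which mpiexec orted; echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH"


def probe_host(host: str, command: str = PROBE) -> str:
    """Run `command` on `host` over ssh and return its stdout+stderr."""
    # The command is passed as one argument, so `$LD_LIBRARY_PATH` is
    # expanded by the remote shell, not locally.
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return (result.stdout + result.stderr).strip()


def find_mismatches(reports: dict[str, str]) -> dict[str, list[str]]:
    """For each host, return the output lines that not every host reported."""
    line_sets = {host: set(text.splitlines()) for host, text in reports.items()}
    common = set.intersection(*line_sets.values()) if line_sets else set()
    return {
        host: sorted(lines - common)
        for host, lines in line_sets.items()
        if lines - common
    }


def main(hosts: list[str]) -> int:
    """Probe all hosts, print per-host differences, return 1 if any differ."""
    reports = {host: probe_host(host) for host in hosts}
    mismatches = find_mismatches(reports)
    for host, lines in mismatches.items():
        print(f"{host}:")
        for line in lines:
            print(f"    {line}")
    return 1 if mismatches else 0


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} HOST [HOST ...]", file=sys.stderr)
    else:
        sys.exit(main(sys.argv[1:]))
```

Usage would be something like `python3 check_mpi_env.py node1 node2`. If one node reports an `orted` or `LD_LIBRARY_PATH` from a different site-wide MPI installation, that mismatch is a plausible cause of a silent hang. This only compares text output, though, so it complements rather than replaces Open MPI's own verbose/debug output.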