This may or may not be related, but I've had similar issues on RHEL
6.x and its clones when using the SSH job launcher and running more than
10 processes per node. It sounds like you're only distributing 6
processes per node, so this may not be your problem, but you
might want to check your hostfile and make sure you're not
oversubscribing one of the nodes. The trick I've found to launch more
than 10 processes per node via SSH is to set MaxSessions to some number
higher than 10, which is the sshd default, in /etc/ssh/sshd_config
(I chose 100, somewhat arbitrarily).
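
In case it helps, here is a rough sketch of how that change could be
scripted. The function name set_max_sessions is made up for this example,
and it assumes the file has no "Match" blocks at the end (a directive
appended after a Match line would only apply to that block), so treat it
as a starting point rather than a tested recipe.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH or Open MPI):
#   set_max_sessions FILE VALUE
# Sets the MaxSessions directive in an sshd_config-style file.
set_max_sessions() {
    conf="$1"
    value="$2"
    pattern='^[[:space:]]*#\{0,1\}[[:space:]]*MaxSessions[[:space:]].*'
    if grep -q "$pattern" "$conf"; then
        # Rewrite an existing directive, including a commented-out default.
        sed -i "s/$pattern/MaxSessions $value/" "$conf"
    else
        # No directive present: append one at the end of the file.
        printf 'MaxSessions %s\n' "$value" >> "$conf"
    fi
}
```

On each compute node, as root, something like
"set_max_sessions /etc/ssh/sshd_config 100" followed by
"service sshd restart" should apply it; sshd only rereads its
configuration on a restart or reload.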

Assuming you're using the SSH launcher on an RHEL 6 derivative, you
might give this a try. It's an SSH issue, not an OpenMPI one.
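
To rule out the oversubscription possibility mentioned above, a quick
tally of the slots in the hostfile can be compared against the -np value.
This is only a sketch: hostfile_slots is a made-up helper name, and it
assumes the usual "hostname slots=N" hostfile layout, counting a host
listed without slots= as one slot.

```shell
#!/bin/sh
# Hypothetical helper: hostfile_slots FILE
# Prints "host slots" for each host in the hostfile, then "total N".
hostfile_slots() {
    awk '
        /^[ \t]*(#|$)/ { next }      # skip comment and blank lines
        {
            n = 1                    # assumption: no slots= entry means 1 slot
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
            slots[$1] += n
            total += n
        }
        END {
            for (h in slots) print h, slots[h]
            print "total", total + 0
        }
    ' "$1"
}
```

Running "hostfile_slots path/hostfile" then shows whether any node has
more slots than cores, and whether the total covers the -np you request.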

Regards,
Tim

On Thu, Apr 12, 2012 at 9:04 AM, Seyyed Mohtadin Hashemi
<haa...@gmail.com> wrote:
> Hello,
>
> I have a very peculiar problem: I have a micro cluster with three nodes (18
> cores total); the nodes are clones of each other, connected to a frontend
> via Ethernet, and run Debian squeeze as the OS. When I run parallel
> jobs I can use up to “-np 10”; if I go further, the job crashes. I have
> primarily done tests with GROMACS (because that is what I will be running)
> but have also used OSU Micro-Benchmarks 3.5.2.
>
> For a simple parallel job I use: “path/mpirun -hostfile path/hostfile -np XX
> -d -display-map path/mdrun_mpi -s path/topol.tpr -o path/output.trr”
>
> (path is global) For -np XX less than or equal to 10 it works; however, as
> soon as I use 11 or more, the whole thing crashes. The terminal
> dump is attached to this mail: when_working.txt is for “-np 10”,
> when_crash.txt is for “-np 12”, and OpenMPI_info.txt is output from
> “path/mpirun --bynode --hostfile path/hostfile --tag-output ompi_info -v
> ompi full --parsable”
>
> I have tried OpenMPI v.1.4.2 all the way up to beta v1.5.5, and all yield
> the same result.
>
> The output files are from a new install I did today: I formatted all nodes,
> started from a fresh minimal install of Squeeze, and used "apt-get
> install gromacs gromacs-openmpi" to install everything with its dependencies.
> Then I ran two jobs using the parameters described above. I also ran one with
> OSU bench (data not included); it also crashed with “-np” larger than 10.
>
> I hope somebody can help figure out what is wrong and how I can fix it.
>
> Best regards,
> Mohtadin