Re: [OMPI users] OpenMPI Hangs, No Error

2010-07-06 Thread Reuti
Am 06.07.2010 um 23:31 schrieb Ralph Castain: Problem isn't with ssh - the problem is that the daemons need to open a TCP connection back to the machine where mpirun is running. If the firewall blocks that connection, then we can't run. If you can get a range of ports opened, then you can

Re: [OMPI users] OpenMPI Hangs, No Error

2010-07-06 Thread Jeff Squyres
On Jul 6, 2010, at 5:41 PM, Robert Walters wrote: > Thanks for your expeditious responses, Ralph. > > Just to confirm with you, I should change openmpi-mca-params.conf to include: > > oob_tcp_port_min_v4 = (My minimum port in the range) > oob_tcp_port_range_v4 = (My port range) >
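For illustration, a minimal openmpi-mca-params.conf along the lines discussed in this thread might look as follows (the port numbers are placeholders; use whatever range the firewall actually opens, and note that the TCP BTL has analogous btl_tcp_port_min_v4 / btl_tcp_port_range_v4 parameters for the MPI traffic itself):

    # $prefix/etc/openmpi-mca-params.conf  (or ~/.openmpi/mca-params.conf)
    # Restrict the out-of-band (daemon-to-mpirun) TCP connections to a fixed range
    oob_tcp_port_min_v4   = 10000   # lowest port OMPI may use (placeholder value)
    oob_tcp_port_range_v4 = 100     # number of ports above the minimum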

Re: [OMPI users] OpenMPI Hangs, No Error

2010-07-06 Thread Ralph Castain
Problem isn't with ssh - the problem is that the daemons need to open a TCP connection back to the machine where mpirun is running. If the firewall blocks that connection, then we can't run. If you can get a range of ports opened, then you can specify the ports OMPI should use for this

Re: [OMPI users] trouble using openmpi under slurm

2010-07-06 Thread David Roundy
On Tue, Jul 6, 2010 at 12:31 PM, Ralph Castain wrote: > Thanks - that helps. > > As you note, the issue is that OMPI doesn't support the core-level allocation > options of slurm - never has, probably never will. What I found interesting, > though, was that your envars don't

Re: [OMPI users] OpenMPI Hangs, No Error

2010-07-06 Thread Robert Walters
Yes, there is a system firewall. I don't think the sysadmin will allow it to be disabled. Each Linux machine has the built-in RHEL firewall, though SSH is enabled through it. --- On Tue, 7/6/10, Ralph Castain wrote: From: Ralph Castain Subject:
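If the administrator will open a specific range instead of disabling the firewall, the RHEL iptables rule would look roughly like this on each node (illustrative port range only, matching whatever range is set in the MCA parameters):

    # Allow an agreed-upon TCP port range for Open MPI (example range)
    iptables -I INPUT -p tcp --dport 10000:10099 -j ACCEPT
    # Persist the rule across reboots on RHEL
    service iptables save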

Re: [OMPI users] OpenMPI Hangs, No Error

2010-07-06 Thread Ralph Castain
It looks like the remote daemon is starting - is there a firewall in the way? On Jul 6, 2010, at 2:04 PM, Robert Walters wrote: > Hello all, > > I am using OpenMPI 1.4.2 on RHEL. I have a cluster of AMD Opterons and right > now I am just working on getting OpenMPI itself up and running. I

[OMPI users] OpenMPI Hangs, No Error

2010-07-06 Thread Robert Walters
Hello all, I am using OpenMPI 1.4.2 on RHEL. I have a cluster of AMD Opterons and right now I am just working on getting OpenMPI itself up and running. I have a successful configure and make all install. LD_LIBRARY_PATH and PATH variables were correctly edited. mpirun -np 8 hello_c
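For reference, the hello_c test program from the Open MPI examples/ directory is essentially the following (a minimal sketch; the shipped version also prints the library version string):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello, world, I am %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Built and run with, e.g., mpicc hello_c.c -o hello_c && mpirun -np 8 hello_c.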

Re: [OMPI users] trouble using openmpi under slurm

2010-07-06 Thread Ralph Castain
Thanks - that helps. As you note, the issue is that OMPI doesn't support the core-level allocation options of slurm - never has, probably never will. What I found interesting, though, was that your envars don't anywhere indicate that this is what you requested. I don't see anything there that

Re: [OMPI users] trouble using openmpi under slurm

2010-07-06 Thread David Roundy
Ah yes, it's the versions of each that are packaged in Debian testing, which are openmpi 1.4.1 and slurm 2.1.9. David On Tue, Jul 6, 2010 at 11:38 AM, Ralph Castain wrote: > It would really help if you told us what version of OMPI you are using, and > what version of SLURM.

Re: [OMPI users] trouble using openmpi under slurm

2010-07-06 Thread Ralph Castain
It would really help if you told us what version of OMPI you are using, and what version of SLURM. On Jul 6, 2010, at 12:16 PM, David Roundy wrote: > Hi all, > > I'm running into trouble running an openmpi job under slurm. I > imagine the trouble may be in my slurm configuration, but since

Re: [OMPI users] trouble using openmpi under slurm

2010-07-06 Thread David Roundy
For what it's worth, the slurm environment variables are: SLURM_JOBID=2817 SLURM_JOB_NUM_NODES=1 SLURM_TASKS_PER_NODE=1 SLURM_TOPOLOGY_ADDR_PATTERN=node SLURM_PRIO_PROCESS=0 SLURM_JOB_CPUS_PER_NODE=2 SLURM_JOB_NAME=submit.sh SLURM_PROCID=0 SLURM_CPUS_ON_NODE=2 SLURM_NODELIST=node02 SLURM_NNODES=1

[OMPI users] trouble using openmpi under slurm

2010-07-06 Thread David Roundy
Hi all, I'm running into trouble running an openmpi job under slurm. I imagine the trouble may be in my slurm configuration, but since the error itself involves mpirun crashing, I thought I'd best ask here first. The error message I get is:
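For context, a minimal SLURM submission script for an Open MPI job normally looks something like the sketch below (submit.sh and my_mpi_program are placeholders; under a SLURM allocation mpirun reads the node list and task count from the environment, so no -np or hostfile is needed):

    #!/bin/sh
    #SBATCH --job-name=mpi-test
    #SBATCH --ntasks=2
    mpirun ./my_mpi_program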

Re: [OMPI users] UDAPL 2.0 support

2010-07-06 Thread Don Kerr
And Solaris has only implemented uDAPL 1.2. -DON On 07/06/10 08:00, Jeff Squyres wrote: We don't recommend using the udapl support in Linux; it is much better to use the native "openib" BTL that uses the verbs interface. We do not do any udapl testing on Linux, as far as I know -- the udapl

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-06 Thread Grzegorz Maj
Hi Ralph, sorry for the late response, but I couldn't find free time to play with this. Finally I've applied the patch you prepared. I've launched my processes in the way you've described and I think it's working as you expected. None of my processes runs the orted daemon and they can perform MPI
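For readers following the thread, the MPI-2 dynamic-process calls involved look roughly like this (a minimal sketch; how the port string is passed from server to client, e.g. via a file or MPI_Publish_name, is application-specific, and both sides are assumed to run between MPI_Init and MPI_Finalize):

    #include <stdio.h>
    #include "mpi.h"

    /* Server side: open a port and wait for a client to connect */
    void accept_client(MPI_Comm *intercomm)
    {
        char port_name[MPI_MAX_PORT_NAME];
        MPI_Open_port(MPI_INFO_NULL, port_name);
        printf("port: %s\n", port_name);   /* hand this string to the client */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
    }

    /* Client side: connect back using the server's port string */
    void connect_to_server(const char *port_name, MPI_Comm *intercomm)
    {
        MPI_Comm_connect((char *) port_name, MPI_INFO_NULL, 0,
                         MPI_COMM_SELF, intercomm);
    }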

Re: [OMPI users] UDAPL 2.0 support

2010-07-06 Thread Jeff Squyres
We don't recommend using the udapl support in Linux; it is much better to use the native "openib" BTL that uses the verbs interface. We do not do any udapl testing on Linux, as far as I know -- the udapl BTL exists mainly for Solaris. On Jul 5, 2010, at 5:43 AM, Gabriele Fatigati wrote: >
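To select the verbs-based transport explicitly on Linux, the BTL list can be given on the command line or in the MCA parameters file, for example (./my_app is a placeholder):

    # Use InfiniBand verbs plus shared memory and self
    mpirun --mca btl openib,sm,self -np 8 ./my_app
    # or, equivalently, in openmpi-mca-params.conf:
    #   btl = openib,sm,self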

Re: [OMPI users] Open MPI, cannot get the results from workers

2010-07-06 Thread jody
Hi, I solved this problem in such a way that my master listens for messages from everybody (MPI_ANY_SOURCE) and reacts to all tags (MPI_ANY_TAG). By looking at the status variable set by MPI_Recv, the master can find out who sent the message (status.MPI_SOURCE) and what tag it has (status.MPI_TAG)
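In code, the receive pattern described above looks roughly like this (a minimal sketch; the payload type and count are placeholders):

    #include <stdio.h>
    #include "mpi.h"

    /* Master-side loop: accept a result from any worker with any tag, then
     * inspect the status to see who sent it and which tag was used. */
    void master_loop(int num_results)
    {
        MPI_Status status;
        double result;   /* placeholder payload */
        int i;

        for (i = 0; i < num_results; i++) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("got %g from rank %d with tag %d\n",
                   result, status.MPI_SOURCE, status.MPI_TAG);
        }
    }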

Re: [OMPI users] Open MPI, cannot get the results from workers

2010-07-06 Thread David Zhang
If the master receives multiple results from the same worker, how does the master know which result (and the associated tag) arrives first? What MPI commands are you using exactly? On Mon, Jul 5, 2010 at 4:25 PM, Jack Bryan wrote: > When the master sends out the task, it