[OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Oswin Krause
Hi, I am currently trying to set up OpenMPI in torque. OpenMPI is built with tm support. Torque is correctly assigning nodes and I can run MPI programs on single nodes just fine. The problem starts when processes are split between nodes. For example, I create an interactive session with torq

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Oswin Krause
can confirm tm is used. Before invoking mpirun, you might want to clean up the ompi directory in /tmp. Cheers, Gilles Oswin Krause wrote: Hi, I am currently trying to set up OpenMPI in torque. OpenMPI is built with tm support. Torque is correctly assigning nodes and I can run MPI programs on

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Oswin Krause
the machinefile, the number of slots is automatically detected). Can you run mpirun --mca plm_base_verbose 10 ... so we can confirm tm is used? Before invoking mpirun, you might want to clean up the ompi directory in /tmp. Cheers, Gilles Oswin Krause wrote: Hi, I am currently trying to set

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Oswin Krause
ck the code and see what could be happening here. Btw, what is the output of hostname and hostname -f on a00551? Out of curiosity, is a previous version of Open MPI (e.g. v1.10.4) installed and running correctly on your cluster? Cheers, Gilles Oswin Krause wrote: Hi Gilles, Thanks for th

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Oswin Krause
f you use the same hostfile, or some hostfile as an explicit argument when you run mpirun from within the torque job? -- bennet On Wed, Sep 7, 2016 at 9:25 AM, Oswin Krause wrote: Hi Gilles, Thanks for the hint with the machinefile. I know it is not equivalent and I do not intend to use

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Oswin Krause
an “openmpi” directory underneath that one, and the mca_xxx libraries are down there. On Sep 7, 2016, at 7:43 AM, Oswin Krause wrote: Hi Gilles, I do not have this library. Maybe this helps already... libmca_common_sm.so libmpi_mpifh.so libmpi_usempif08.so libompitrace.so libopen

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Oswin Krause
see what could be happening here. Btw, what is the output of hostname and hostname -f on a00551? Out of curiosity, is a previous version of Open MPI (e.g. v1.10.4) installed and running correctly on your cluster? Cheers, Gilles Oswin Krause wrote: Hi Gilles, Thanks for the hint with the

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Oswin Krause
nd running correctly on your cluster? Cheers, Gilles Oswin Krause wrote: Hi Gilles, Thanks for the hint with the machinefile. I know it is not equivalent and I do not intend to use that approach. I just wanted to know whether I could start the program successfully at all. Outside torque (4.2),

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Oswin Krause
.science.domain:18097] [[34561,0],0] plm:base:receive stop comm [a00551.science.domain:18097] mca: base: close: component tm closed [a00551.science.domain:18097] mca: base: close: unloading component tm Best, Oswin On 2016-09-08 10:33, Oswin Krause wrote: Hi Gilles, Hi Ralph, I have just rebuilt openmpi

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Oswin Krause
ras_base_verbose 10 hostname Cheers, Gilles On 9/8/2016 6:42 PM, Oswin Krause wrote: Hi, I reconfigured to only have one physical node. Still no success, but the nodefile now looks better. I still get the errors: [a00551.science.domain:18021] [[34768,0],1] bind() failed on error Address

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Oswin Krause
Hi, okay, let's reboot, even though Gilles' last mail was onto something. The problem is that I failed to start programs with mpirun when more than one node was involved. I mentioned that it is likely some configuration problem with my server, especially authentication (we have some Kerberos ni