Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Ralph Castain
Interesting - good to know. Thanks

On Feb 12, 2014, at 10:38 AM, Adrian Reber wrote:
> It seems this is indeed a Moab bug for interactive jobs. At least a bug
> was opened against moab. Using non-interactive jobs the variables have
> the correct values and mpirun has no problems detecting the co

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Adrian Reber
It seems this is indeed a Moab bug for interactive jobs. At least a bug was opened against moab. Using non-interactive jobs the variables have the correct values and mpirun has no problems detecting the correct number of cores.

On Wed, Feb 12, 2014 at 07:50:40AM -0800, Ralph Castain wrote:
> Anoth

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Ralph Castain
Another possibility to check - it is entirely possible that Moab is miscommunicating the values to Slurm. You might need to check it - I'll install a copy of 2.6.5 on my machines and see if I get similar issues when Slurm does the allocation itself.

On Feb 12, 2014, at 7:47 AM, Ralph Castain w

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Adrian Reber
On Wed, Feb 12, 2014 at 07:47:53AM -0800, Ralph Castain wrote:
> >
> > $ msub -I -l nodes=3:ppn=8
> > salloc: Job is in held state, pending scheduler release
> > salloc: Pending job allocation 131828
> > salloc: job 131828 queued and waiting for resources
> > salloc: job 131828 has been allocated

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Ralph Castain
On Feb 12, 2014, at 7:32 AM, Adrian Reber wrote:
>
> $ msub -I -l nodes=3:ppn=8
> salloc: Job is in held state, pending scheduler release
> salloc: Pending job allocation 131828
> salloc: job 131828 queued and waiting for resources
> salloc: job 131828 has been allocated resources
> salloc: Gra

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Adrian Reber
$ msub -I -l nodes=3:ppn=8
salloc: Job is in held state, pending scheduler release
salloc: Pending job allocation 131828
salloc: job 131828 queued and waiting for resources
salloc: job 131828 has been allocated resources
salloc: Granted job allocation 131828
sh-4.1$ echo $SLURM_TASKS_PER_NODE
1
s
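Note for readers of this thread: Slurm encodes per-node task counts in SLURM_TASKS_PER_NODE as a comma-separated list, where a count followed by "(xN)" applies to N consecutive nodes (for example "2(x3),1"). The following is a minimal Python sketch of expanding such a value. It is not Open MPI's detection code, and the function name parse_tasks_per_node is hypothetical. For a 3-node, ppn=8 request one would presumably expect a value along the lines of "8(x3)"; that expectation is an assumption and does not come from this thread, where the interactive session reported "1".

```python
import re


def parse_tasks_per_node(value):
    """Expand a SLURM_TASKS_PER_NODE string into a list with one count per node.

    Hypothetical helper for illustration only; not part of Open MPI or Slurm.
    """
    counts = []
    for item in value.split(","):
        # Each item is either "<count>" or "<count>(x<repeat>)".
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError("unrecognized SLURM_TASKS_PER_NODE item: %r" % item)
        tasks = int(match.group(1))
        repeat = int(match.group(2)) if match.group(2) else 1
        counts.extend([tasks] * repeat)
    return counts


if __name__ == "__main__":
    # The value reported in the interactive session above.
    print(parse_tasks_per_node("1"))
    # The example form used in the Slurm documentation.
    print(parse_tasks_per_node("2(x3),1"))
```

Summing the returned list gives the total number of tasks the allocation advertises, which makes a mismatch such as the one reported here easy to spot.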

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Ralph Castain
...and your version of Slurm?

On Feb 12, 2014, at 7:19 AM, Ralph Castain wrote:
> What is your SLURM_TASKS_PER_NODE?
>
> On Feb 12, 2014, at 6:58 AM, Adrian Reber wrote:
>
>> No, the system has only a few MOAB_* variables and many SLURM_*
>> variables:
>>
>> $BASH $IF

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Ralph Castain
What is your SLURM_TASKS_PER_NODE?

On Feb 12, 2014, at 6:58 AM, Adrian Reber wrote:
> No, the system has only a few MOAB_* variables and many SLURM_*
> variables:
>
> $BASH $IFS $SECONDS
> $SLURM_PTY_PORT
> $BASHOPTS

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Adrian Reber
No, the system has only a few MOAB_* variables and many SLURM_* variables:

$BASH $IFS $SECONDS $SLURM_PTY_PORT
$BASHOPTS $LINENO $SHELL $SLURM_PTY_WIN_COL
$BASHP

Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems

2014-02-12 Thread Ralph Castain
Seems rather odd - since this is managed by Moab, you shouldn't be seeing SLURM envars at all. What you should see are PBS_* envars, including a PBS_NODEFILE that actually contains the allocation.

On Feb 12, 2014, at 4:42 AM, Adrian Reber wrote:
> I tried the nightly snapshot (openmpi-1.7.5a1
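Note for readers of this thread: one quick way to see which resource manager's variables a job shell actually exposes is to group the environment by prefix. The sketch below is a minimal illustration under that assumption; the function names summarize_rm_env and read_nodefile are hypothetical, and only the PBS_, SLURM_ and MOAB_ prefixes and the PBS_NODEFILE variable are taken from the discussion above.

```python
import os


def summarize_rm_env(environ=None, prefixes=("PBS_", "SLURM_", "MOAB_")):
    """Return a dict mapping each prefix to the sorted variable names that start with it.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    return {p: sorted(name for name in env if name.startswith(p)) for p in prefixes}


def read_nodefile(path):
    """Return the non-empty lines of a node file (one host entry per line)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    summary = summarize_rm_env()
    for prefix, names in summary.items():
        print("%s: %d variable(s)" % (prefix, len(names)))
    # Only read the node file if the variable is actually set.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        print(read_nodefile(nodefile))
```

Run inside the allocation, this shows at a glance whether PBS_* variables are present or, as reported in this thread, mostly SLURM_* ones.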