SLURM's mpich1 plugin is designed to start one task per node. The patched mpich1 library is then responsible for starting the remaining tasks on that node based upon SLURM environment variables, so that seems to be where the problem lies. Please respond with details and/or a patch when you resolve this.
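For what it's worth, the srun output below reports the cpu counts as "2(x1)", which is the same compact format SLURM uses for SLURM_TASKS_PER_NODE. A minimal sketch of how a launcher could extract the per-node count from that format (the variable is hard-coded here purely for illustration; a real patch would also have to expand the "(xN)" repeat factor for multi-node jobs):

```shell
#!/bin/sh
# Illustrative value: SLURM sets SLURM_TASKS_PER_NODE to e.g. "2(x1)",
# meaning 2 tasks on each of 1 node.
SLURM_TASKS_PER_NODE="2(x1)"

# Strip the "(xN)" repeat suffix to get the task count for the first
# group of nodes.
tasks_per_node="${SLURM_TASKS_PER_NODE%%(*}"

echo "$tasks_per_node"   # prints: 2
```

If the patched library sees "1" here instead of "2", that would explain why only one process starts.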
Quoting Taras Shapovalov <taras.shapova...@brightcomputing.com>:

> Dear developers,
>
> When I use mpich1 with SLURM, only 1 task is executed:
>
> [taras@ts-sl6slurm ~]$ srun -v --nodes=1 --tasks-per-node=2 --mpi=mpich1_p4 hello_mpich1
> srun: auth plugin for Munge (http://home.gna.org/munge/) loaded
> srun: Waiting for nodes to boot
> srun: Nodes node001 are ready for job
> srun: jobid 2995: nodes(1):`node001', cpu counts: 2(x1)
> srun: switch NONE plugin loaded
> srun: launching 2995.0 on host node001, 1 tasks: 0
> srun: Node node001, 1 tasks started
> Hello MPI! Process 0 of 1
> srun: Received task exit notification for 1 task (status=0x0000).
> srun: node001: task 0: Completed
> [taras@ts-sl6slurm ~]$
>
> So it looks like just 1 task per node is created.
> With openmpi it works fine: 2 tasks per node are created.
>
> Our MPICH1 is patched (!) with ./contribs/mpich1.slurm.patch and was
> built with the following parameters:
>
> [taras@ts-sl6slurm ~]$ mpichversion
> MPICH Version:          1.2.7
> MPICH Release date:     $Date: 2005/06/22 16:33:49$
> MPICH Patches applied:  none
> MPICH configure:        --enable-cxx --with-romio --with-device=ch_p4 --p4_opts=--enable-processgroup=no --enable-sharedlib --with-comm=shared --disable-devdebug -lib= --with-arch=LINUX
> MPICH Device:           ch_p4
>
> Checked with both SLURM 2.2.7 and 2.3.3.
> What have I missed?
>
> --
> Taras