SLURM's mpich1 plugin is designed to start one task per node. The
patched mpich1 library then starts additional tasks on that node based
upon SLURM environment variables, so that seems to be where the
problem lies. Please respond with details and/or a patch when you
resolve this.

Quoting Taras Shapovalov <taras.shapova...@brightcomputing.com>:

>
> Dear developers,
>
> When I use mpich1 with SLURM, then just 1 task is executed:
>
> [taras@ts-sl6slurm ~]$ srun -v --nodes=1 --tasks-per-node=2
> --mpi=mpich1_p4 hello_mpich1
> srun: auth plugin for Munge (http://home.gna.org/munge/) loaded
> srun: Waiting for nodes to boot
> srun: Nodes node001 are ready for job
> srun: jobid 2995: nodes(1):`node001', cpu counts: 2(x1)
> srun: switch NONE plugin loaded
> srun: launching 2995.0 on host node001, 1 tasks: 0
> srun: Node node001, 1 tasks started
> Hello MPI! Process 0 of 1
> srun: Received task exit notification for 1 task (status=0x0000).
> srun: node001: task 0: Completed
> [taras@ts-sl6slurm ~]$
>
> So, it looks like just 1 task per node is created.
> In case of openmpi it works fine: it creates 2 tasks per node.
>
> Our MPICH1 is patched (!) with ./contribs/mpich1.slurm.patch and was
> built with the following parameters:
>
> [taras@ts-sl6slurm ~]$ mpichversion
> MPICH Version:        1.2.7
> MPICH Release date:    $Date: 2005/06/22 16:33:49$
> MPICH Patches applied:    none
> MPICH configure:     --enable-cxx --with-romio --with-device=ch_p4
> --p4_opts=--enable-processgroup=no --enable-sharedlib --with-comm=shared
> --disable-devdebug -lib= --with-arch=LINUX
> MPICH Device:        ch_p4
>
> Checked for both SLURM 2.2.7 and 2.3.3.
> What did I miss?
>
> --
> Taras
>
