I confirmed that things are working as intended. If you have 12 cores on a
machine and you do

  mpirun -map-by socket:PE=2 <foo>

we will execute 6 copies of foo on the node, because 12 cores / 2 PEs per
process = 6 processes. As I said, we believe the prior series were doing this
incorrectly, and the patch used on the 1.8 series has corrected the situation.
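A quick way to check this on your own node, as a sketch (assuming a 12-core,
two-socket machine; hostname here is just a stand-in for any executable):

  mpirun -map-by socket:PE=2 --report-bindings hostname

With --report-bindings, each of the 6 launched processes prints the pair of
cores it was bound to, with ranks assigned round-robin across the two sockets.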
> On Nov 3, 2014, at 8:23 AM, Ralph Castain <rhc.open...@gmail.com> wrote:
>
>> On Nov 3, 2014, at 4:54 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
>>
>> Hi there,
>>
>> We've started looking at moving to the openmpi 1.8 branch from 1.6 on
>> our CentOS6/Son of Grid Engine cluster and noticed an unexpected
>> difference when binding multiple cores to each rank.
>>
>> Has openmpi's definition of 'slot' changed between 1.6 and 1.8? It used
>> to mean ranks, but now it appears to mean processing elements (see
>> Details, below).
>
> It actually didn't change - there were some errors in prior versions in
> how we were handling things, and we have corrected them. A "slot" was
> never equated to an MPI rank; it is an allocation from the scheduler -
> it means you have been allocated one resource on the given node. So the
> number of "slots" on a node equates to the number of resources on that
> node which were allocated for your use.
>
> Note also that a "slot" doesn't automatically correspond to a core -
> someone may well decide to define a "slot" as the equivalent of a
> "container" comprised of several cores. It is indeed an abstraction used
> by the scheduler when assigning resources.
>
> Because of the confusion we've encountered, both internally and
> externally, over the meaning of the term "cpu", we adopted the term
> "processing element" (PE). So if you are individually assigning
> hwthreads, your PE is at the hwthread level. If you individually assign
> cores, then PE equates to core.
>
> We know the mpirun man page in 1.8.3 was woefully out of date, and that
> has been fixed for the soon-to-be-released 1.8.4. Some of the options
> that were supposed to be deprecated (a) were accidentally turned off
> completely, and (b) have been restored (and "un-deprecated") per user
> request. So --bysocket will indeed return in 1.8.4.
>
> If you only have one allocated PE on a node, then mpirun will correctly
> tell you that it can't launch with PE>1, as there aren't enough
> resources to meet your request. IIRC, we may have been ignoring this
> under SGE and running as many procs as we wanted on an allocated node -
> the SGE folks provided a patch to fix that hole.
>
> I'll check the case you describe below - if you don't specify the number
> of procs to run, we should correctly resolve the number of ranks to
> start.
>
>>
>> Thanks,
>>
>> Mark
>>
>> PS Also, the man page for 1.8.3 reports that '--bysocket' is
>> deprecated, but it doesn't seem to exist when we try to use it:
>>
>> mpirun: Error: unknown option "-bysocket"
>> Type 'mpirun --help' for usage.
>>
>> ====== Details ======
>>
>> On 1.6.5, we launch with the following core binding options:
>>
>> mpirun --bind-to-core --cpus-per-proc <n> <program>
>> mpirun --bind-to-core --bysocket --cpus-per-proc <n> <program>
>>
>> where <n> is calculated to maximise the number of cores available to
>> use - effectively
>> max(1, int(number of cores per node / slots per node requested)).
>>
>> openmpi reads the file $PE_HOSTFILE and launches a rank for each slot
>> defined in it, binding <n> cores per rank.
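For anyone deriving <n> the same way in an SGE job script, a minimal sketch
(assuming SGE's $NSLOTS and $NHOSTS are set, slots are spread evenly across
hosts, and hyper-threading is off; the variable names are illustrative):

  # slots granted per node, assuming an even spread across hosts
  slots_per_node=$(( NSLOTS / NHOSTS ))
  # logical CPUs on this node (coreutils); adjust if hyper-threading is on
  cores_per_node=$(nproc)
  # n = max(1, int(cores_per_node / slots_per_node))
  n=$(( cores_per_node / slots_per_node ))
  [ "$n" -lt 1 ] && n=1

  mpirun --bind-to-core --cpus-per-proc "$n" <program>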
>>
>> On 1.8.3, we've tried launching with the following core binding options
>> (which we hoped were equivalent):
>>
>> mpirun -map-by node:PE=<n> <program>
>> mpirun -map-by socket:PE=<n> <program>
>>
>> openmpi reads the file $PE_HOSTFILE and launches a factor of <n> fewer
>> ranks than under 1.6.5. We also notice that, where we wanted a single
>> rank on the box and <n> is the number of cores available, openmpi
>> refuses to launch and we get the message:
>>
>> "There are not enough slots available in the system to satisfy the 1
>> slots that were requested by the application"
>>
>> I think that error message needs a little work :)
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Searchable archives:
>> http://www.open-mpi.org/community/lists/users/2014/11/index.php
>
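Putting the two sides of the thread together, the rough 1.6 -> 1.8
correspondence looks like this (a sketch only; the key difference, per the
explanation above, is that under 1.8 the launched rank count becomes
allocated slots / <n>, rather than one rank per slot as 1.6 effectively
gave):

  # Open MPI 1.6.x
  mpirun --bind-to-core --cpus-per-proc <n> <program>
  mpirun --bind-to-core --bysocket --cpus-per-proc <n> <program>

  # Open MPI 1.8.x - each rank is bound to <n> cores
  mpirun -map-by node:PE=<n> <program>
  mpirun -map-by socket:PE=<n> <program>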