I confirmed that things are working as intended. If you have 12 cores on a 
machine, and you do

mpirun -map-by socket:PE=2 <foo>

we will execute 6 copies of foo on the node, because 12 cores / 2 PEs per proc = 6 procs.
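
If you want a quick sanity check of the resulting layout, you can add --report-bindings (shown here for a hypothetical 12-core, two-socket node):

mpirun -map-by socket:PE=2 --report-bindings <foo>

That should report 6 ranks, alternating between the two sockets, each bound to 2 cores.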

As I said, we believe the prior series were doing this incorrectly, and the 
patch used on the 1.8 series has corrected the situation.


> On Nov 3, 2014, at 8:23 AM, Ralph Castain <rhc.open...@gmail.com> wrote:
> 
> 
>> On Nov 3, 2014, at 4:54 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
>> 
>> Hi there,
>> 
>> We've started looking at moving to the openmpi 1.8 branch from 1.6 on our 
>> CentOS6/Son of Grid Engine cluster and noticed an unexpected difference when 
>> binding multiple cores to each rank.
>> 
>> Has openmpi's definition of 'slot' changed between 1.6 and 1.8? It used to mean 
>> ranks, but now it appears to mean processing elements (see Details, below).
> 
> It actually didn’t change - there were some errors in prior versions in how 
> we were handling things, and we have corrected them. A “slot” never was 
> equated to an MPI rank, but is an allocation from the scheduler - it means 
> you have been allocated one resource on the given node. So the number of 
> “slots” on a node equates to the number of resources on that node which were 
> allocated for your use.
> 
> Note also that a “slot” doesn’t automatically correspond to a core - someone 
> may well decide to define a “slot” as being the equivalent of a “container” 
> comprised of several cores. It is indeed an abstraction used by the scheduler 
> when assigning resources.
> 
> Because of the confusion we’ve encountered both internally and externally 
> over the meaning of the term “cpu”, we adopted the “processing element” term. 
> So if you are individually assigning hwthreads, your PE is at the hwthread 
> level. If you individually assign cores, then PE equates to core.
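> 
> As a rough illustration of the difference (hypothetical 12-core node; --report-bindings just prints the resulting binding):
> 
> # PE = core: each rank gets 2 cores
> mpirun -map-by socket:PE=2 --report-bindings <foo>
> 
> # PE = hwthread: each rank gets 2 hardware threads
> mpirun --use-hwthread-cpus -map-by socket:PE=2 --report-bindings <foo>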
> 
> We know the mpirun man page in 1.8.3 was woefully out-of-date, and that has 
> been fixed for the soon-to-be-released 1.8.4. Some of the options that were 
> supposed to be deprecated (a) were accidentally turned completely off, and 
> (b) have been restored (and “un-deprecated”) per user request. So --bysocket 
> will indeed return in 1.8.4.
> 
> If you only have one allocated PE on a node, then mpirun will correctly tell 
> you that it can’t launch with PE>1 as there aren’t enough resources to meet 
> your request. IIRC, we may have been ignoring this under SGE and running as 
> many procs as we wanted on an allocated node - the SGE folks provided a patch 
> to fix that hole.
> 
> I’ll check the case you describe below - if you don’t specify the number of 
> procs to run, we should correctly resolve the number of ranks to start.
> 
>> 
>> Thanks,
>> 
>> Mark
>> 
>> PS Also, the man page for 1.8.3 reports that '--bysocket' is deprecated, but 
>> it doesn't seem to exist when we try to use it:
>> 
>> mpirun: Error: unknown option "-bysocket"
>> Type 'mpirun --help' for usage.
>> 
>> ====== Details ======
>> 
>> On 1.6.5, we launch with the following core binding options:
>> 
>> mpirun --bind-to-core --cpus-per-proc <n> <program>
>> mpirun --bind-to-core --bysocket --cpus-per-proc <n> <program>
>> 
>> where <n> is calculated to maximise the number of cores available to
>> use - effectively
>> max(1, int(number of cores per node / slots per node requested)).
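>> 
>> For example, a rough sketch of that calculation in the job script (assuming a 
>> single-node job where $NSLOTS is the number of slots granted on the node and 
>> nproc reports its core count):
>> 
>> n=$(( $(nproc) / NSLOTS ))
>> [ "$n" -lt 1 ] && n=1
>> mpirun --bind-to-core --cpus-per-proc $n <program>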
>> 
>> openmpi reads the file $PE_HOSTFILE and launches a rank for each slot
>> defined in it, binding <n> cores per rank.
>> 
>> On 1.8.3, we've tried launching with the following core binding options 
>> (which we hoped were equivalent):
>> 
>> mpirun -map-by node:PE=<n> <program>
>> mpirun -map-by socket:PE=<n> <program>
>> 
>> openmpi reads the file $PE_HOSTFILE and launches a factor of <n> fewer
>> ranks than under 1.6.5. We also notice that, where we wanted a single
>> rank on the box and <n> is the number of cores available, openmpi
>> refuses to launch and we get the message:
>> 
>> "There are not enough slots available in the system to satisfy the 1
>> slots that were requested by the application"
>> 
>> I think that error message needs a little work :)
> 
