Just an update for the list. Really only impacts folks running Open MPI
under LSF.


Setting LSB_PJL_TASK_GEOMETRY changes what lsb_getalloc() returns for the
allocation: the result is adjusted to the mapping/ordering specified in that
environment variable. However, since LSB_PJL_TASK_GEOMETRY is not set by LSF
when the job starts, the LSB_AFFINITY_HOSTFILE still shows the broader
mapping/ordering. The difference between these two views of the allocation
is the core of the problem here.
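
For reference, here is a rough sketch of how I understand the RAS side pulls
the allocation. The lsb_getalloc() signature (return the number of allocated
slots and fill in an array of hostnames, one entry per slot) is my reading of
ras_lsf_module.c, so treat this as illustrative rather than the exact Open
MPI code:

=== sketch: querying the LSF allocation (illustrative only) ===
#include <lsf/lsbatch.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char **nodelist = NULL;
    int i, num_slots;

    (void)argc;
    if (lsb_init(argv[0]) < 0) {
        fprintf(stderr, "lsb_init() failed\n");
        return 1;
    }
    /* Assumption: lsb_getalloc() returns one hostname per allocated slot.
     * With LSB_PJL_TASK_GEOMETRY set, this list is already trimmed to the
     * geometry, which is where the mismatch with LSB_AFFINITY_HOSTFILE
     * starts. */
    if ((num_slots = lsb_getalloc(&nodelist)) < 0) {
        fprintf(stderr, "lsb_getalloc() failed\n");
        return 1;
    }
    for (i = 0; i < num_slots; i++) {
        printf("slot %d -> %s\n", i, nodelist[i]);
    }
    return 0;
}
================================================================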

Consider an LSB affinity hostfile with the following contents:
=== LSB_AFFINITY_HOSTFILE ===
p10a33 0,1,2,3,4,5,6,7
p10a33 8,9,10,11,12,13,14,15
p10a33 16,17,18,19,20,21,22,23
p10a30 0,1,2,3,4,5,6,7
p10a30 8,9,10,11,12,13,14,15
p10a30 16,17,18,19,20,21,22,23
p10a58 0,1,2,3,4,5,6,7
p10a58 8,9,10,11,12,13,14,15
p10a58 16,17,18,19,20,21,22,23
=============================

Each line is one process (rank) with its CPU binding, so this tells Open MPI
to launch 3 processes per node with a particular set of bindings - 9
processes total.

export LSB_PJL_TASK_GEOMETRY="{(5)(4,3)(2,1,0)}"

The LSB_PJL_TASK_GEOMETRY variable above tells us to launch only 6
processes: each parenthesized group is placed on one host and the numbers
are task IDs, so we get 1 + 2 + 3 tasks across 3 hosts. So lsb_getalloc()
returns to us (in ras_lsf_module.c) a list of resources sized for launching
6 processes. However, when we get to rmaps_seq.c we tell it to pay attention
to the LSB_AFFINITY_HOSTFILE, so it tries to map 9 processes even though the
slots on the nodes total only 6. Eventually we hit an oversubscription error.
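
To make the mismatch concrete, here is a tiny standalone sketch (not the
actual rmaps_seq.c logic) of the check that eventually trips: count the
ranks the affinity hostfile requests and compare against the slots the
trimmed allocation granted. The slots_allocated value of 6 is just the
number implied by the example geometry above:

=== sketch: hostfile ranks vs. allocated slots (illustrative only) ===
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *path = getenv("LSB_AFFINITY_HOSTFILE");
    int ranks_requested = 0;
    int slots_allocated = 6;   /* what the {(5)(4,3)(2,1,0)} geometry leaves */
    char line[1024];
    FILE *fp;

    if (NULL == path || NULL == (fp = fopen(path, "r"))) {
        fprintf(stderr, "no affinity hostfile available\n");
        return 1;
    }
    while (NULL != fgets(line, sizeof(line), fp)) {
        ranks_requested++;     /* one rank (and its CPU binding) per line */
    }
    fclose(fp);

    printf("hostfile requests %d ranks, allocation holds %d slots\n",
           ranks_requested, slots_allocated);
    if (ranks_requested > slots_allocated) {
        printf("-> this is the oversubscription the mapper complains about\n");
    }
    return 0;
}
======================================================================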

There is an interesting difference between 1.10.2 and 1.10.3rc1 when using
the LSB_AFFINITY_HOSTFILE shown above.
In 1.10.2 RAS thinks it has the following allocation (with and without the
LSB_PJL_TASK_GEOMETRY set):
======================   ALLOCATED NODES   ======================
        p10a33: slots=1 max_slots=0 slots_inuse=0 state=UP
=================================================================
In 1.10.3rc1 RAS thinks it has the following allocation (with the
LSB_PJL_TASK_GEOMETRY set):
======================   ALLOCATED NODES   ======================
        p10a33: slots=1 max_slots=0 slots_inuse=0 state=UP
        p10a30: slots=2 max_slots=0 slots_inuse=0 state=UP
        p10a58: slots=3 max_slots=0 slots_inuse=0 state=UP
=================================================================
In 1.10.3rc1 RAS thinks it has the following allocation (without the
LSB_PJL_TASK_GEOMETRY set):
======================   ALLOCATED NODES   ======================
        p10a33: slots=3 max_slots=0 slots_inuse=0 state=UP
        p10a30: slots=3 max_slots=0 slots_inuse=0 state=UP
        p10a58: slots=3 max_slots=0 slots_inuse=0 state=UP
=================================================================

The 1.10.3rc1 behavior is what I would expect to happen. The 1.10.2
behavior seems to be a bug when running under LSF.

The original error comes from trying to map 3 processes on each of the nodes
(since the affinity hostfile wants to launch 9 processes) while the nodes
have a more restricted set of slots (due to the LSB_PJL_TASK_GEOMETRY
variable).


I know a number of things have changed from 1.10.2 to 1.10.3 regarding how
we allocate/map. Ralph, do you know offhand what might have caused this
difference? It's not a big deal if not, just curious.


I'm working with Farid on some options to work around the issue for 1.10.2.
Open MPI 1.10.3 seems to be ok for basic LSF functionality (without the
LSB_PJL_TASK_GEOMETRY variable).

-- Josh


On Tue, Apr 19, 2016 at 8:57 AM, Josh Hursey <jjhur...@open-mpi.org> wrote:

> Farid,
>
> I have access to the same cluster inside IBM. I can try to help you track
> this down and maybe work up a patch with the LSF folks. I'll contact you
> off-list with my IBM address and we can work on this a bit.
>
> I'll post back to the list with what we found.
>
> -- Josh
>
>
> On Tue, Apr 19, 2016 at 5:06 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> On Apr 18, 2016, at 7:08 PM, Farid Parpia <par...@us.ibm.com> wrote:
>> >
>> > I will try to put you in touch with someone in LSF development
>> immediately.
>>
>> FWIW: It would be great if IBM could contribute the fixes to this.  None
>> of us have access to LSF resources, and IBM is a core contributor to Open
>> MPI.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>
>
