Just an update for the list. This really only impacts folks running Open MPI under LSF.
Setting LSB_PJL_TASK_GEOMETRY changes what lsb_getalloc() returns for the allocation: it adjusts the result to the mapping/ordering specified in that environment variable. However, since LSB_PJL_TASK_GEOMETRY is not set by LSF when the job starts, the LSB_AFFINITY_HOSTFILE will show a broader mapping/ordering. The difference between these two requests is the core of the problem here.

Consider an LSB hostfile with the following:

=== LSB_AFFINITY_HOSTFILE ===
p10a33 0,1,2,3,4,5,6,7
p10a33 8,9,10,11,12,13,14,15
p10a33 16,17,18,19,20,21,22,23
p10a30 0,1,2,3,4,5,6,7
p10a30 8,9,10,11,12,13,14,15
p10a30 16,17,18,19,20,21,22,23
p10a58 0,1,2,3,4,5,6,7
p10a58 8,9,10,11,12,13,14,15
p10a58 16,17,18,19,20,21,22,23
=============================

This tells Open MPI to launch 3 processes per node with a particular set of bindings - so 9 processes total.

export LSB_PJL_TASK_GEOMETRY="{(5)(4,3)(2,1,0)}"

The LSB_PJL_TASK_GEOMETRY variable (above) tells us to launch only 6 processes. So lsb_getalloc() will return to us (in ras_lsf_module.c) a list of resources sized for launching 6 processes. However, when we get to rmaps_seq.c we tell it to pay attention to the LSB_AFFINITY_HOSTFILE, so it tries to map 9 processes even though we set the slots on the nodes to a total of 6. Eventually we hit an oversubscription error.

There is an interesting difference between 1.10.2 and 1.10.3rc1 when using the LSB_AFFINITY_HOSTFILE seen above.
In 1.10.2, the RAS thinks it has the following allocation (with and without LSB_PJL_TASK_GEOMETRY set):

======================   ALLOCATED NODES   ======================
        p10a33: slots=1 max_slots=0 slots_inuse=0 state=UP
=================================================================

In 1.10.3rc1, the RAS thinks it has the following allocation (with LSB_PJL_TASK_GEOMETRY set):

======================   ALLOCATED NODES   ======================
        p10a33: slots=1 max_slots=0 slots_inuse=0 state=UP
        p10a30: slots=2 max_slots=0 slots_inuse=0 state=UP
        p10a58: slots=3 max_slots=0 slots_inuse=0 state=UP
=================================================================

In 1.10.3rc1, the RAS thinks it has the following allocation (without LSB_PJL_TASK_GEOMETRY set):

======================   ALLOCATED NODES   ======================
        p10a33: slots=3 max_slots=0 slots_inuse=0 state=UP
        p10a30: slots=3 max_slots=0 slots_inuse=0 state=UP
        p10a58: slots=3 max_slots=0 slots_inuse=0 state=UP
=================================================================

The 1.10.3rc1 behavior is what I would expect. The 1.10.2 behavior seems to be a bug when running under LSF. The original error comes from trying to map 3 processes on each of the nodes (since the affinity file wants to launch 9 processes) while the nodes have a more restricted set of slots (due to the LSB_PJL_TASK_GEOMETRY variable).

I know a number of things have changed from 1.10.2 to 1.10.3 regarding how we allocate/map. Ralph, do you know offhand what might have caused this difference? It's not a big deal if not, just curious.

I'm working with Farid on some options to work around the issue for 1.10.2. Open MPI 1.10.3 seems to be OK for basic LSF functionality (without the LSB_PJL_TASK_GEOMETRY variable).

-- Josh

On Tue, Apr 19, 2016 at 8:57 AM, Josh Hursey <jjhur...@open-mpi.org> wrote:

> Farid,
>
> I have access to the same cluster inside IBM. I can try to help you track
> this down and maybe work up a patch with the LSF folks.
> I'll contact you
> off-list with my IBM address and we can work on this a bit.
>
> I'll post back to the list with what we found.
>
> -- Josh
>
>
> On Tue, Apr 19, 2016 at 5:06 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> On Apr 18, 2016, at 7:08 PM, Farid Parpia <par...@us.ibm.com> wrote:
>> >
>> > I will try to put you in touch with someone in LSF development
>> immediately.
>>
>> FWIW: It would be great if IBM could contribute the fixes to this. None
>> of us have access to LSF resources, and IBM is a core contributor to Open
>> MPI.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/04/28963.php