I tried the following with Open MPI 1.8.1 and 1.10.1, and both worked. In my
case each node has 2 sockets like yours, but each socket has 12 cores, and
lstopo shows the second socket's cores numbered 12 to 23.

mpirun --report-bindings --bind-to core --cpu-set 12,13,14,15,16,17,18,19 -np 8 java Hello

[j-049:182867] MCW rank 0 bound to socket 1[core 12[hwt 0-1]]:
[../../../../../../../../../../../..][BB/../../../../../../../../../../..]
[j-049:182867] MCW rank 1 bound to socket 1[core 13[hwt 0-1]]:
[../../../../../../../../../../../..][../BB/../../../../../../../../../..]
[j-049:182867] MCW rank 2 bound to socket 1[core 14[hwt 0-1]]:
[../../../../../../../../../../../..][../../BB/../../../../../../../../..]
[j-049:182867] MCW rank 3 bound to socket 1[core 15[hwt 0-1]]:
[../../../../../../../../../../../..][../../../BB/../../../../../../../..]
[j-049:182867] MCW rank 4 bound to socket 1[core 16[hwt 0-1]]:
[../../../../../../../../../../../..][../../../../BB/../../../../../../..]
[j-049:182867] MCW rank 5 bound to socket 1[core 17[hwt 0-1]]:
[../../../../../../../../../../../..][../../../../../BB/../../../../../..]
[j-049:182867] MCW rank 6 bound to socket 1[core 18[hwt 0-1]]:
[../../../../../../../../../../../..][../../../../../../BB/../../../../..]
[j-049:182867] MCW rank 7 bound to socket 1[core 19[hwt 0-1]]:
[../../../../../../../../../../../..][../../../../../../../BB/../../../..]
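
On your dual-socket Sandy Bridge nodes, where the second socket's cores
appear to be numbered 8 to 15 (judging from the --slot-list attempts you
mention below), the analogous command would presumably be, with your
helloWorld.exe in place of my Java example:

mpirun --report-bindings --bind-to core --cpu-set 8,9,10,11,12,13,14,15 -np 8 ./helloWorld.exe

The --report-bindings output should confirm whether all eight ranks land on
socket 1.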

On Mon, Dec 21, 2015 at 11:40 AM, Matt Thompson <fort...@gmail.com> wrote:

> Ralph,
>
> Huh. That isn't in the Open MPI 1.8.8 mpirun man page, but it is in the Open
> MPI 1.10 one, so I'm guessing someone noticed it was missing. That explains
> why I didn't try it out. I'm assuming this option is respected on all nodes?
>
> Note: a SmarterManThanI™ here at Goddard thought up this:
>
> #!/bin/bash
> # Emit a 'rank N=host slots=1:*' line for each hostname srun prints
> # (one per task in the SLURM allocation).
> rank=0
> for node in $(srun uname -n | sort); do
>         echo "rank $rank=$node slots=1:*"
>         let rank+=1
> done
>
> It does seem to work in synthetic tests, so I'm trying it now in my real
> job. I had to hack a few run scripts, so I'll probably spend the next hour
> debugging something dumb I did.
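>
> For the record, the idea is to redirect that output to a file and hand it
> to mpirun as a rankfile via -rf / --rankfile; the script and file names
> below are just placeholders:
>
> ./make_rankfile.sh > myrankfile
> mpirun -np 8 -rf myrankfile ./helloWorld.exe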
>
> What I'm wondering about all this is: can this be done with --slot-list?
> Or, perhaps, does --slot-list even work?
>
> I have tried about 20 different variations of it, e.g., --slot-list 1:*,
> --slot-list '1:*', --slot-list 1:0,1,2,3,4,5,6,7, --slot-list
> 1:8,9,10,11,12,13,14,15, --slot-list 8-15, &c., and every time I seem to
> trigger an error via help-rmaps_rank_file.txt. I tried to read
> through opal_hwloc_base_slot_list_parse in the source, but my C isn't great
> (see my gmail address name) so that didn't help. Might not even be the
> right function, but I was just acking the code.
>
> Thanks,
> Matt
>
>
> On Mon, Dec 21, 2015 at 10:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Try adding --cpu-set a,b,c,… where the a,b,c,… are the core IDs of your
>> second socket. I'm working on a cleaner option, as this has come up before.
>>
>>
>> On Dec 21, 2015, at 5:29 AM, Matt Thompson <fort...@gmail.com> wrote:
>>
>> Dear Open MPI Gurus,
>>
>> I'm currently trying to do something with Open MPI 1.8.8 that I'm pretty
>> sure is possible, but I'm just not smart enough to figure out how. Namely,
>> I'm seeing some odd GPU timings, and I think it's because I was dumb and
>> assumed the GPU was on the PCI bus next to Socket #0, as some older GPU
>> nodes I ran on were laid out that way.
>>
>> But, a trip through lspci and lstopo has shown me that the GPU is
>> actually on Socket #1. These are dual-socket Sandy Bridge nodes, and I'd
>> like to do some tests where I run 8 processes per node and have them all
>> land on Socket #1.
>>
>> So, what I'm trying to figure out is how to have Open MPI bind processes
>> that way. My first thought, as always, is to run a helloworld job with
>> -report-bindings on. I can manage to do this:
>>
>> (1061) $ mpirun -np 8 -report-bindings -map-by core ./helloWorld.exe
>> [borg01z205:16306] MCW rank 4 bound to socket 0[core 4[hwt 0]]:
>> [././././B/././.][./././././././.]
>> [borg01z205:16306] MCW rank 5 bound to socket 0[core 5[hwt 0]]:
>> [./././././B/./.][./././././././.]
>> [borg01z205:16306] MCW rank 6 bound to socket 0[core 6[hwt 0]]:
>> [././././././B/.][./././././././.]
>> [borg01z205:16306] MCW rank 7 bound to socket 0[core 7[hwt 0]]:
>> [./././././././B][./././././././.]
>> [borg01z205:16306] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
>> [B/././././././.][./././././././.]
>> [borg01z205:16306] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
>> [./B/./././././.][./././././././.]
>> [borg01z205:16306] MCW rank 2 bound to socket 0[core 2[hwt 0]]:
>> [././B/././././.][./././././././.]
>> [borg01z205:16306] MCW rank 3 bound to socket 0[core 3[hwt 0]]:
>> [./././B/./././.][./././././././.]
>> Process    7 of    8 is on borg01z205
>> Process    5 of    8 is on borg01z205
>> Process    2 of    8 is on borg01z205
>> Process    3 of    8 is on borg01z205
>> Process    4 of    8 is on borg01z205
>> Process    6 of    8 is on borg01z205
>> Process    0 of    8 is on borg01z205
>> Process    1 of    8 is on borg01z205
>>
>> Great...but wrong socket! Is there a way to tell it to use Socket 1
>> instead?
>>
>> Note that I'll be running under SLURM with only 8 processes per node, so
>> nothing should need to use Socket 0.
>> --
>> Matt Thompson
>>
>> Man Among Men
>> Fulcrum of History
>>
>
>
>
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History
>
>
>

-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org
