I tried the following with Open MPI 1.8.1 and 1.10.1, and both worked. In my case a node has 2 sockets like yours, but each socket has 12 cores, and lstopo showed that the core numbers for the second socket run from 12 to 23.
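(Side note: a quick way to double-check which core IDs belong to the second socket is to use the hwloc command-line tools that ship alongside lstopo. This is only a sketch, assuming hwloc-calc is installed; on newer hwloc releases the "socket" object type is called "package".)

    # Show the topology without I/O devices, to keep the output short
    lstopo --no-io

    # Should list the core indexes inside socket 1 (12-23 here);
    # on recent hwloc use package:1 instead of socket:1
    hwloc-calc --intersect core socket:1

Anyway, here is the command I ran and the bindings I got: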
mpirun --report-bindings --bind-to core --cpu-set 12,13,14,15,16,17,18,19 -np 8 java Hello

[j-049:182867] MCW rank 0 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
[j-049:182867] MCW rank 1 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
[j-049:182867] MCW rank 2 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
[j-049:182867] MCW rank 3 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
[j-049:182867] MCW rank 4 bound to socket 1[core 16[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
[j-049:182867] MCW rank 5 bound to socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
[j-049:182867] MCW rank 6 bound to socket 1[core 18[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
[j-049:182867] MCW rank 7 bound to socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../BB/../../../..]

(A sketch of how the rankfile-generating script Matt quotes below might be fed to mpirun is at the bottom of this mail.)

On Mon, Dec 21, 2015 at 11:40 AM, Matt Thompson <fort...@gmail.com> wrote:

> Ralph,
>
> Huh. That isn't in the Open MPI 1.8.8 mpirun man page. It is in Open MPI
> 1.10, so I'm guessing someone noticed it wasn't there, which explains why
> I didn't try it out. I'm assuming this option is respected on all nodes?
>
> Note: a SmarterManThanI™ here at Goddard thought up this:
>
> #!/bin/bash
> rank=0
> for node in $(srun uname -n | sort); do
>   echo "rank $rank=$node slots=1:*"
>   let rank+=1
> done
>
> It does seem to work in synthetic tests, so I'm trying it now in my real
> job. I had to hack a few run scripts, so I'll probably spend the next hour
> debugging something dumb I did.
>
> What I'm wondering about all this is: can this be done with --slot-list?
> Or, perhaps, does --slot-list even work?
>
> I have tried about 20 different variations of it, e.g., --slot-list 1:*,
> --slot-list '1:*', --slot-list 1:0,1,2,3,4,5,6,7,
> --slot-list 1:8,9,10,11,12,13,14,15, --slot-list 8-15, &c., and every time
> I seem to trigger an error via help-rmaps_rank_file.txt. I tried to read
> through opal_hwloc_base_slot_list_parse in the source, but my C isn't
> great (see my gmail address name), so that didn't help. Might not even be
> the right function, but I was just acking the code.
>
> Thanks,
> Matt
>
>
> On Mon, Dec 21, 2015 at 10:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Try adding --cpu-set a,b,c,... where a,b,c,... are the core IDs of your
>> second socket. I'm working on a cleaner option, as this has come up
>> before.
>>
>>
>> On Dec 21, 2015, at 5:29 AM, Matt Thompson <fort...@gmail.com> wrote:
>>
>> Dear Open MPI Gurus,
>>
>> I'm currently trying to do something with Open MPI 1.8.8 that I'm pretty
>> sure is possible, but I'm just not smart enough to figure out. Namely,
>> I'm seeing some odd GPU timings, and I think it's because I was dumb and
>> assumed the GPU was on the PCI bus next to Socket #0, as some older GPU
>> nodes I ran on were set up that way.
>>
>> But a trip through lspci and lstopo has shown me that the GPU is
>> actually on Socket #1. These are dual-socket Sandy Bridge nodes, and I'd
>> like to do some tests where I run 8 processes per node and those
>> processes all land on Socket #1.
>>
>> So, what I'm trying to figure out is how to have Open MPI bind processes
>> like that.
>> My first thought, as always, is to run a helloworld job with
>> -report-bindings on. I can manage to do this:
>>
>> (1061) $ mpirun -np 8 -report-bindings -map-by core ./helloWorld.exe
>> [borg01z205:16306] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././.][./././././././.]
>> [borg01z205:16306] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./.][./././././././.]
>> [borg01z205:16306] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/.][./././././././.]
>> [borg01z205:16306] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B][./././././././.]
>> [borg01z205:16306] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.]
>> [borg01z205:16306] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.]
>> [borg01z205:16306] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././.][./././././././.]
>> [borg01z205:16306] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././.][./././././././.]
>> Process 7 of 8 is on borg01z205
>> Process 5 of 8 is on borg01z205
>> Process 2 of 8 is on borg01z205
>> Process 3 of 8 is on borg01z205
>> Process 4 of 8 is on borg01z205
>> Process 6 of 8 is on borg01z205
>> Process 0 of 8 is on borg01z205
>> Process 1 of 8 is on borg01z205
>>
>> Great...but wrong socket! Is there a way to tell it to use Socket 1
>> instead?
>>
>> Note I'll be running under SLURM, so I will only have 8 processes per
>> node; it shouldn't need to use Socket 0.
>> --
>> Matt Thompson
>>
>> Man Among Men
>> Fulcrum of History
>
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History

--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org
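P.S. Two follow-up sketches, in case they help. Both are untested guesses on my part, so treat them as starting points rather than known-good commands.

1. If lstopo on your nodes shows socket 1's cores numbered 8 to 15 (my assumption for a dual-socket, 8-core-per-socket Sandy Bridge box; please confirm with lstopo first), the equivalent of the command at the top of this mail would be:

    mpirun --report-bindings --bind-to core --cpu-set 8,9,10,11,12,13,14,15 -np 8 ./helloWorld.exe

2. The script quoted above prints lines that look like Open MPI's rankfile format, so another option is to write them to a file and launch with -rf. A sketch, assuming the file name my_rankfile and one rank per node, as the script is written; I believe the examples in the mpirun man page spell the keyword "slot=" rather than "slots=", so check which spelling your version accepts:

    #!/bin/bash
    # Build a rankfile that pins each rank to socket 1 of its node,
    # then launch with it via mpirun's -rf option.
    rank=0
    rm -f my_rankfile
    for node in $(srun uname -n | sort); do
        echo "rank $rank=$node slots=1:*" >> my_rankfile
        let rank+=1
    done

    # $rank now equals the number of nodes, i.e. one process per node.
    mpirun --report-bindings -rf my_rankfile -np "$rank" ./helloWorld.exe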