Hi - are there any examples of the cartofile format? Or is there some combo of
--map, --rank, or --bind to achieve this mapping?
[BB/..][../..]
[../BB][../..]
[../..][BB/..]
[../..][../BB]
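
For readers unfamiliar with the notation, here is a minimal Python sketch that reproduces the four lines above. It assumes the usual reading of this bracket format (each `[...]` is a socket, `/` separates cores, each character is a hardware thread, and `B` marks a thread the rank is bound to) and a hypothetical topology of 2 sockets, 2 cores per socket, and 2 hardware threads per core. The function names `render_binding` and `layout` are illustrative only; nothing here calls Open MPI.

```python
def render_binding(bound_core, sockets=2, cores_per_socket=2, threads_per_core=2):
    """Render one rank's binding in the bracket notation used above.

    Assumed reading: each [...] is a socket, '/' separates cores, and each
    character is a hardware thread ('B' = bound, '.' = not bound).
    """
    parts = []
    for s in range(sockets):
        cores = []
        for c in range(cores_per_socket):
            core_id = s * cores_per_socket + c  # cores numbered across sockets
            mark = "B" if core_id == bound_core else "."
            cores.append(mark * threads_per_core)
        parts.append("[" + "/".join(cores) + "]")
    return "".join(parts)


def layout(num_ranks, **topo):
    """Bind rank i to core i: one rank per consecutive core."""
    return [render_binding(rank, **topo) for rank in range(num_ranks)]


if __name__ == "__main__":
    for rank, line in enumerate(layout(4)):
        print(f"rank {rank}: {line}")
```

Under those assumptions, the requested pattern is simply "rank i bound to all hardware threads of core i", with cores filled in order across sockets.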
I tried everything I could think of for --bind-to, --map-by, and --rank-by, and I
can’t get it to happen. I can
I’m unaware of any “map-to cartofile” option, nor do I find it in mpirun’s help
or man page. Are you seeing it somewhere?
On Jun 21, 2019, at 12:43 PM, Noam Bernstein via users
<users@lists.open-mpi.org> wrote:
Hi - are there any examples of the cartofile format? Or is there some comb
> On Jun 21, 2019, at 4:04 PM, Ralph Castain via users
> wrote:
>
> I’m unaware of any “map-to cartofile” option, nor do I find it in mpirun’s
> help or man page. Are you seeing it somewhere?
From "mpirun --help":
tin 1431 : mpirun --help mapping
mpirun (Open MPI) 4.0.1
Usage: mpirun [OPTION]
Hilarious - I wrote that code and I have no idea who added that option or what
it is supposed to do. I can assure you, however, that it isn’t implemented anywhere.
Perhaps if you tell us what pattern you are trying to get, we can advise you on
the proper cmd line to get there?
On Jun 21, 2019, at
> On Jun 21, 2019, at 4:45 PM, Ralph Castain wrote:
>
> Hilarious - I wrote that code and I have no idea who added that option or
> what it is supposed to do. I can assure you, however, that it isn’t
> implemented anywhere.
Not really a big deal, since the documentation doesn’t explain them, and
On Jun 21, 2019, at 1:52 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
On Jun 21, 2019, at 4:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
Hilarious - I wrote that code and I have no idea who added that option or what
it is supposed to do. I can assure you, however, tha
> On Jun 21, 2019, at 5:02 PM, Ralph Castain wrote:
>
> Too many emails to track :-(
>
> Should just be “--map-by core --rank-by core” - nothing fancy required.
> Sounds like you are getting --map-by node, or at least --rank-by node, which
> means somebody has set an MCA param either in
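
As a rough aid for hunting down a stray mapping or ranking default, the following Python sketch lists mapping-related MCA parameters from the two places they are commonly set: `OMPI_MCA_*` environment variables and an MCA parameter file such as `$HOME/.openmpi/mca-params.conf`. The keyword filter (`rmaps`, `hwloc_base_binding`) and the helper names are assumptions made for illustration, not something stated in this thread; the system-wide file under the Open MPI install prefix is not checked because its location depends on the installation.

```python
import os
from pathlib import Path

# Assumed substrings for mapping/ranking/binding parameter names.
KEYWORDS = ("rmaps", "hwloc_base_binding")


def mapping_params_from_env(environ):
    """Return OMPI_MCA_* variables whose names look mapping-related."""
    prefix = "OMPI_MCA_"
    return {
        name: value
        for name, value in environ.items()
        if name.startswith(prefix)
        and any(word in name[len(prefix):] for word in KEYWORDS)
    }


def mapping_params_from_file(path):
    """Parse 'name = value' lines from an MCA parameter file, skipping comments."""
    found = {}
    p = Path(path)
    if not p.is_file():
        return found
    for raw in p.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if any(word in name for word in KEYWORDS):
            found[name] = value
    return found


if __name__ == "__main__":
    print("environment:", mapping_params_from_env(dict(os.environ)))
    user_file = Path.home() / ".openmpi" / "mca-params.conf"
    print(f"{user_file}:", mapping_params_from_file(user_file))
```

If either dictionary comes back non-empty, that setting is a likely candidate for overriding the expected per-core behaviour.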
Perhaps I spoke too soon. Now, with the Mellanox OFED stack, we occasionally
get the following failure on exit:
[compute-4-20:68008:0:68008] Caught signal 11 (Segmentation fault: address not
mapped to object at address 0x10)
0 0x0002a3c5 opal_free_list_destruct() opal_free_list.c:0
1 0x
> On Jun 21, 2019, at 9:57 PM, Carlson, Timothy S
> wrote:
>
> Switch back to stock OFED?
Well, the OFED included with CentOS has a memory leak (at least when using UCX). I
haven't tried OFED's stack yet.
>
> Make sure all your cards are patched to the latest firmware.
That's a good idea. I'