hen consider cpus? From the manpages I thought this is
the default behaviour.
By the way, if I manage to understand everything correctly, I can also
contribute fixes for these inconsistencies in the manpages. I'd be more than
happy to help where I can.
On 03.02
nkfile and the allocation.txt file).
>
> I was wondering if somehow mpirun cannot find all the hosts sometimes (but
> sometimes it can, so it's a mystery to me)?
>
> Just wanted to point that out. Now I'll get in touch with the cluster support
> to see if it's po
Are you willing to try this with OMPI master? Asking because it would be hard
to push changes all the way back to 4.0.x every time we want to see if we fixed
something.
Also, few of us have any access to LSF, though I doubt that has much impact
here as it sounds like the issue is in the rank_fi
Errr...what version of OMPI are you using?
> On Feb 2, 2022, at 3:03 PM, David Perozzi via users
> wrote:
>
> Hello,
>
> I'm trying to run a code implemented with OpenMPI and OpenMP (for threading)
> on a large cluster that uses LSF for the job scheduling and dispatch. The
> problem with LSF is
ed natively by Open MPI or abstraction layers) and/or with
an uncommon topology (for which collective communications are not fully
optimized by Open MPI). In the latter case, using the system/vendor MPI is the
best option performance-wise.
Cheers,
Gilles
On Fri, Jan 28, 2022 at 2:23 AM Ralph Casta
https://www.slideshare.net/rcastain/pmix-bridging-the-container-boundary
[video]
https://www.sylabs.io/2019/04/sug-talk-intels-ralph-castain-on-bridging-the-container-boundary-with-pmix/
Just to complete this - there is always a lingering question regarding shared
memory support. There are two ways to resolve that one:
* run one container per physical node, launching multiple procs in each
container. The procs can then utilize shared memory _inside_ the container.
This is the c
> Fair enough Ralph! I was implicitly assuming a "build once / run everywhere"
> use case, my bad for not making my assumption clear.
> If the container is built to run on a specific host, there are indeed other
> options to achieve near native performances.
>
Err...that isn't actually what I m
gine the past issues I'd experienced
were just due to the PMI differences in the different MPI implementations at
the time. I owe you a beer or something at the next in-person SC conference!
Cheers,
- Brian
On Wed, Jan 26, 2022 at 4:54 PM Ralph Castain via users
mailto:users@lists.open
il.com> > wrote:
Hi Ralph,
My singularity image has OpenMPI, but my host doesn't (Intel MPI). And I am not
sure if the system would work with Intel + OpenMPI.
Luis
Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows
De: Ralph Castain via users <ma
My singularity image has OpenMPI, but my host doesn't (Intel MPI). And I am not
sure if the system would work with Intel + OpenMPI.
Luis
Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows
De: Ralph Castain via users <mailto:users@lists.open-mpi.org>
En
Err...the whole point of a container is to put all the library dependencies
_inside_ it. So why don't you just install OMPI in your singularity image?
On Jan 26, 2022, at 6:42 AM, Luis Alfredo Pires Barbosa via users
mailto:users@lists.open-mpi.org> > wrote:
Hello all,
I have Intel MPI in my
Short answer is yes, but it is a bit complicated to do.
On Jan 25, 2022, at 12:28 PM, Saliya Ekanayake via users
mailto:users@lists.open-mpi.org> > wrote:
Hi,
I am trying to run an MPI program on a platform that launches the processes
using a custom launcher (not mpiexec). This will end up spa
ons:
>> > - use mpirun
>> > - rebuild Open MPI with PMI support as Ralph previously explained
>> > - use SLURM PMIx:
>> > srun --mpi=list
>> > will list the PMI flavors provided by SLURM
>> > a) if
-fPIC -c99
>> -tp p7-64' 'CXXFLAGS=-O1 -fPIC -tp p7-64' 'FCFLAGS=-O1 -fPIC -tp p7-64'
>> 'LD=ld' '--enable-shared' '--enable-static' '--without-tm'
>> '--enable-mpi-cxx' '--disable-wrapper-runpath'
If you look at your configure line, you forgot to include
--with-pmi=. We don't build the Slurm PMI support by
default due to the GPL licensing issues - you have to point at it.
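For example (a sketch only; the actual paths depend on where Slurm's PMI
headers and libraries live on your system):
./configure --prefix=$HOME/openmpi --with-pmi=/usr --with-pmi-libdir=/usr/lib64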
> On Jan 24, 2022, at 6:41 AM, Matthias Leopold via users
> wrote:
>
> Hi,
>
> we have 2 DGX A100 machines and I'
ile generated by
> PBS, before passing it to mpirun), but it's a pity that tm support is
> not included in these pre-built OpenMPI installations.
>
> On Tue, Jan 18, 2022 at 11:56 PM Ralph Castain via users
> wrote:
>>
>> Hostfile isn't being ignored - it is doin
foo
> one should use:
> mpirun -n 2 --host node1,node2 ./foo
>
> Rather strange, but it's important that it works somehow. Thanks for your
> help!
>
> On Tue, Jan 18, 2022 at 10:54 PM Ralph Castain via users
> wrote:
>>
>> Are you launching the
; wrote:
>
> I have one process per node, here is corresponding line from my job
> submission script (with compute nodes named "node1" and "node2"):
>
> #PBS -l select=1:ncpus=1:mpiprocs=1:host=node1+1:ncpus=1:mpiprocs=1:host=node2
>
> On Tue, Jan
Afraid I can't understand your scenario - when you say you "submit a job" to
run on two nodes, how many processes are you running on each node??
> On Jan 18, 2022, at 1:07 PM, Crni Gorac via users
> wrote:
>
> Using OpenMPI 4.1.2 from MLNX_OFED_LINUX-5.5-1.0.3.2 distribution, and
> have PBS 1
FWIW: this has been "fixed" in PMIx/PRRTE and should make it into OMPI v5 if
the OMPI community accepts it. The default behavior has been changed to output
a full line-at-a-time so that the output from different ranks doesn't get mixed
together. The negative to this, of course, is that we now in
There are several output-controlling options - e.g., you could redirect the
output from each process to its own file or directory.
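As one hedged example, with OMPI v4.x something like this should put each
rank's output in its own file under the given directory (directory and program
names are placeholders):
mpirun -np 4 --output-filename ./run_output ./my_solver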
However, it makes little sense to me for someone to write convergence data into
a file and then parse it. Typically, convergence data results from all procs
reachin
argv_, 1, info, rank_,
MPI_COMM_SELF, &intercom, error_codes);
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Friday, November 5, 2021 9:50 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-
--map-by node \
-np 21 \
-wdir ${work_dir} …
Here is my qsub command for the program “Needles”.
qsub -V -j oe -e $tmpdir_stdio -o $tmpdir_stdio -f -X -N Needles -l
nodes=21:ppn=9 RunNeedles.bash;
From: users <users-boun...@lists.open-mpi.org> On Behalf Of
th-tm. I tried Gilles' workaround but the failure still
occurred. What do I need to provide you so that you can investigate this
possible bug?
Thanks,
Kurt
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Wednesday, November 3, 2021 8:45
Sounds like a bug to me - regardless of configuration, if the hostfile contains
an entry for each slot on a node, OMPI should have added those up.
On Nov 3, 2021, at 2:49 AM, Gilles Gouaillardet via users
mailto:users@lists.open-mpi.org> > wrote:
Kurt,
Assuming you built Open MPI with tm supp
d that? Thanks.
>Ray
>
>
> From: users on behalf of Ralph Castain via
> users
> Sent: Monday, October 11, 2021 1:49 PM
> To: Open MPI Users
> Cc: Ralph Castain
> Subject: Re: [OMPI users] [External] Re: cpu bi
via users mailto:users@lists.open-mpi.org> > wrote:
OK thank you. Seems that srun is a better option for normal users.
Chang
On 10/11/21 1:23 PM, Ralph Castain via users wrote:
Sorry, your output wasn't clear about cores vs hwthreads. Apparently, your
Slurm config is set up to use hwt
al cores, so two processes sharing a
physical core.
I guess there is a way to do that by playing with mapping. I just want to know
if this is a bug in mpirun, or whether this feature for interacting with Slurm
was never implemented.
Chang
On 10/11/21 10:07 AM, Ralph Castain via users wrote:
You just n
You just need to tell mpirun that you want your procs to be bound to cores, not
socket (which is the default).
Add "--bind-to core" to your mpirun cmd line
On Oct 10, 2021, at 11:17 PM, Chang Liu via users mailto:users@lists.open-mpi.org> > wrote:
Yes they are. This is an interactive job from
Could you please include (a) what version of OMPI you are talking about, and
(b) the binding patterns you observed from both srun and mpirun?
> On Oct 9, 2021, at 6:41 PM, Chang Liu via users
> wrote:
>
> Hi,
>
> I wonder if mpirun can follow the cpu binding settings from slurm, when
> runn
ob step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: gpu004: tasks 0-1: Exited with exit code 1
>
Ryan - I suspect what Sergey was trying to say was that you need to ensure OMPI
doesn't try to use the OpenIB driver, or at least that it doesn't attempt to
initialize it. Try adding
OMPI_MCA_pml=ucx
to your environment.
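For example, in a bash shell (./a.out is a placeholder):
export OMPI_MCA_pml=ucx
mpirun -np 4 ./a.out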
On Jul 29, 2021, at 1:56 AM, Sergey Oblomov via users mailto:users@lists
You can still use "map-by" to get what you want since you know there are four
interfaces per node - just do "--map-by ppr:8:node". Note that you definitely
do NOT want to list those multiple IP addresses in your hostfile - all you are
doing is causing extra work for mpirun as it has to DNS resol
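A minimal sketch, assuming a hostfile named myhosts that lists each node once
(./a.out is a placeholder):
mpirun --hostfile myhosts --map-by ppr:8:node ./a.out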
To answer your specific questions:
The backend daemons (orted) will not exit until all locally spawned procs exit.
This is not configurable - for one thing, OMPI procs will suicide if they see
the daemon depart, so it makes no sense to have the daemon fail if a proc
terminates. The logic behind
The original configure line is correct ("--without-orte") - just a typo in the
later text.
You may be running into some issues with Slurm's built-in support for OMPI. Try
running it with OMPI's "mpirun" instead and see if you get better performance.
You'll have to reconfigure to remove the "--w
I'm not sure we support what you are wanting to do.
You can direct mpiexec to use a specified script to launch its daemons on
remote nodes. The daemons will need to connect back via TCP to mpiexec. The
daemons are responsible for fork/exec'ing the local MPI application procs on
each node. Those
Hmmm...disturbing. The changes I made have somehow been lost. I'll have to redo
it - will get back to you when it is restored.
On Mar 25, 2021, at 2:54 PM, L Lutret mailto:lu.lut...@gmail.com> > wrote:
Hi Ralph,
Thanks for your response. I tried with the master branch a very simple spawn
from
Apologies for the very long delay in response. This has been verified fixed in
OMPI's master branch that is to be released as v5.0 in the near future.
Unfortunately, there are no plans to backport that fix to earlier release
series. We therefore recommend that you upgrade to v5.0 if you retain in
You did everything right - the OSHMEM implementation in OMPI only supports UCX
as it is essentially a Mellanox offering. I think the main impediment to
broadening it is simply interest and priority on the part of the non-UCX
developers.
> On Mar 22, 2021, at 7:51 AM, Michael Di Domenico via use
or available ports, but is it checking those
ports are also available on all the other hosts it’s going to run on?
On 18 Mar 2021, at 15:57, Ralph Castain via users mailto:users@lists.open-mpi.org> > wrote:
Hmmm...then you have something else going on. By default, OMPI will ask the
lo
(pure default), it just doesn’t function (I’m guessing
because it chose “bad” or in-use ports).
On 18 Mar 2021, at 14:11, Ralph Castain via users mailto:users@lists.open-mpi.org> > wrote:
Hard to say - unless there is some reason, why not make it large enough to not
be an issue? You
hat range resulted in
the issue I posted about here before, where mpirun just does nothing for 5mins
and then terminates itself, without any error messages.)
Cheers,
Sendu.
On 17 Mar 2021, at 13:25, Ralph Castain via users mailto:users@lists.open-mpi.org> > wrote:
What you are miss
What you are missing is that there are _two_ messaging layers in the system.
You told the btl/tcp layer to use the specified ports, but left the oob/tcp one
unspecified. You need to add
oob_tcp_dynamic_ipv4_ports = 46207-46239
or whatever range you want to specify
Note that if you want the btl
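As a sketch, you could put both ranges in $HOME/.openmpi/mca-params.conf
(please verify the btl parameter names with ompi_info on your installation):
# out-of-band (oob/tcp) connections
oob_tcp_dynamic_ipv4_ports = 46207-46239
# MPI messaging (btl/tcp) connections
btl_tcp_port_min_v4 = 46207
btl_tcp_port_range_v4 = 33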
Excuse me, but would you please ensure that you do not send mail to a mailing
list containing this label:
[AMD Official Use Only - Internal Distribution Only]
Thank you
Ralph
On Mar 4, 2021, at 4:55 AM, Raut, S Biplab via users mailto:users@lists.open-mpi.org> > wrote:
[AMD Official Use Only
other policies. I have also tried with
--cpu-set with identical results. Probably rankfile is my only option too.
On 28/02/2021 22:44, Ralph Castain via users wrote:
The only way I know of to do what you want is
--map-by ppr:32:socket --bind-to core --cpu-list 0,2,4,6,...
whe
So then I need a rankfile listing all the hosts?
John
On 3/1/21 10:26 AM, Ralph Castain via users wrote:
I'm afraid not - you have simply told us that all cpus are available. I don't
know of any way to accomplish what John wants other than with a rankfile.
On Mar 1, 2021,
one bound to one core,
and the second bound to all the rest, with no use of hyperthreads.
Would this be
--map-by ppr:2:node --bind-to core --cpu-list 0,1-31
?
Thx
On 2/28/21 5:44 PM, Ralph Castain via users wrote:
The only way I know of to do what you want is
--map-by pp
[../BB/../../ ... /../..][../../ ... /../..]  (long binding map elided: only one core in the first socket group shows as bound "BB"; every other core slot in both groups is unbound "..")
On 28/02/2021 16:24, Ralph Castain via users wrote:
Did you read the documentation on rankfile? The "slot=N" directive says to
"put this proc on
../../../../../../../..]
And this is still different from the output produced using the rankfile.
Cheers,
Luis
On 28/02/2021 14:06, Ralph Castain via users wrote:
Your command line is incorrect:
--map-by ppr:32:socket:PE=4 --bind-to hwthread
Your command line is incorrect:
--map-by ppr:32:socket:PE=4 --bind-to hwthread
should be
--map-by ppr:32:socket:PE=2 --bind-to core
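For example, assuming two nodes with two 64-core sockets each (./a.out is a
placeholder), the full command might look like:
mpirun -np 128 --map-by ppr:32:socket:PE=2 --bind-to core ./a.out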
On Feb 28, 2021, at 5:57 AM, Luis Cebamanos via users mailto:users@lists.open-mpi.org> > wrote:
I should have said, "I would like to run 128 MPI processes on 2
Okay, I can't promise when I'll get to it, but I'll try to have it in time for
OMPI v5.
On Jan 29, 2021, at 1:30 AM, Luis Cebamanos via users mailto:users@lists.open-mpi.org> > wrote:
Hi Ralph,
It would be great to have it for load balancing issues. Ideally one could do
something like --
t work, and all the app-contexts wind up in MPI_COMM_WORLD.
On Jan 28, 2021, at 3:18 PM, Luis Cebamanos via users mailto:users@lists.open-mpi.org> > wrote:
That's right Ralph!
On 28/01/2021 23:13, Ralph Castain via users wrote:
Trying to wrap my head around this, so let me try a 2-nod
Trying to wrap my head around this, so let me try a 2-node example. You want
(each rank bound to a single core):
ranks 0-3 to be mapped onto node1
ranks 4-7 to be mapped onto node2
ranks 8-11 to be mapped onto node1
ranks 12-15 to be mapped onto node2
etc.etc.
Correct?
> On Jan 28, 2021, at 3:0
There should have been an error message right above that - all this is saying
is that the same error message was output by 7 more processes besides the one
that was output. It then indicates that process 3 (which has pid 0?) was killed.
Looking at the help message tag, it looks like no NICs were
I think you mean add "--mca mtl ofi" to the mpirun cmd line
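For example, a sketch (the cm PML is what pulls in an MTL; ./a.out is a
placeholder):
mpirun --mca pml cm --mca mtl ofi -np 2 ./a.out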
> On Jan 25, 2021, at 10:18 AM, Heinz, Michael William via users
> wrote:
>
> What happens if you specify -mtl ofi ?
>
> -Original Message-
> From: users On Behalf Of Patrick Begou via
> users
> Sent: Monday, January 25, 20
You want to use the "sequential" mapper and then specify each proc's location,
like this for your hostfile:
host1
host1
host2
host2
host3
host3
host1
host2
host3
and then add "--mca rmaps seq" to your mpirun cmd line.
Ralph
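For example, if the hostfile above is saved as myhosts (./a.out is a
placeholder), the nine procs get placed in exactly that order:
mpirun --hostfile myhosts --mca rmaps seq ./a.out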
On Dec 21, 2020, at 5:22 AM, Vineet Soni via users mailto:users@lists
Did you remember to build the Slurm pmi and pmi2 libraries? They aren't built
by default - IIRC, you have to manually go into a subdirectory and do a "make
install" to have them built and installed. You might check the Slurm
documentation for details.
You also might need to add a --with-pmi-li
Just a point to consider. OMPI does _not_ want to get in the mode of modifying
imported software packages. That is a blackhole of effort we simply cannot
afford.
The correct thing to do would be to flag Rob Latham on that PR and ask that he
upstream the fix into ROMIO so we can absorb it. We sh
That would be very kind of you and most welcome!
> On Nov 14, 2020, at 12:38 PM, Alexei Colin wrote:
>
> On Sat, Nov 14, 2020 at 08:07:47PM +0000, Ralph Castain via users wrote:
>> IIRC, the correct syntax is:
>>
>> prun -host +e ...
>>
>> This tells P
IIRC, the correct syntax is:
prun -host +e ...
This tells PRRTE that you want empty nodes for this application. You can even
specify how many empty nodes you want:
prun -host +e:2 ...
I haven't tested that in a bit, so please let us know if it works or not so we
can fix it if necessary.
As f
be
> expected. I just want to make sure that this was the case, and the error
> below wasn't a sign of another issue with the job.
>
> Prentice
>
> On 11/11/20 5:47 PM, Ralph Castain via users wrote:
>> Looks like it is coming from the Slurm PMIx plugin, not OM
Looks like it is coming from the Slurm PMIx plugin, not OMPI.
Artem - any ideas?
Ralph
> On Nov 11, 2020, at 10:03 AM, Prentice Bisbal via users
> wrote:
>
> One of my users recently reported a failed job that was using OpenMPI 4.0.4
> compiled with PGI 20.4. There were two different errors repo
Afraid I would have no idea - all I could tell them is that there was a bug and
it has been fixed
On Nov 2, 2020, at 12:18 AM, Andrea Piacentini via users
mailto:users@lists.open-mpi.org> > wrote:
I installed version 4.0.5 and the problem appears to be fixed.
Can you please help us explaini
Could you please tell us what version of OMPI you are using?
On Oct 28, 2020, at 11:16 AM, Andrea Piacentini via users
mailto:users@lists.open-mpi.org> > wrote:
Good morning we need to launch a MPMD application with two fortran excutables
and one interpreted python (mpi4py) application.
I'm not sure where you are looking, but those params are indeed present in the
opal/mca/btl/tcp component:
/*
 * Called by MCA framework to open the component, registers
 * component parameters.
 */
static int mca_btl_tcp_component_register(void)
{
    char* message;
    /* register TCP compo
m a chemist and not a sysadmin (I really miss having a specialized sysadmin in
our Department!).
Carlo
Il giorno gio 20 ago 2020 alle ore 18:45 Ralph Castain via users
mailto:users@lists.open-mpi.org> > ha scritto:
Your use-case sounds more like a workflow than an application - in which case,
yo
Your use-case sounds more like a workflow than an application - in which case,
you probably should be using PRRTE to execute it instead of "mpirun" as PRRTE
will "remember" the multiple jobs and avoid the overload scenario you describe.
This link will walk you thru how to get and build it:
http
., 12 ago. 2020 18:29, Ralph Castain via users mailto:users@lists.open-mpi.org> > escribió:
Setting aside the known issue with comm_spawn in v4.0.4, how are you planning
to forward stdin without the use of "mpirun"? Something has to collect stdin of
the terminal and distribute it to
Setting aside the known issue with comm_spawn in v4.0.4, how are you planning
to forward stdin without the use of "mpirun"? Something has to collect stdin of
the terminal and distribute it to the stdin of the processes
> On Aug 12, 2020, at 9:20 AM, Alvaro Payero Pinto via users
> wrote:
>
>
Howard - if there is a problem in PMIx that is causing this problem, then we
really could use a report on it ASAP as we are getting ready to release v3.1.6
and I doubt we have addressed anything relevant to what is being discussed here.
On Aug 11, 2020, at 4:35 PM, Martín Morales via users mai
My apologies - I should have included "--debug-daemons" for the mpirun cmd line
so that the stderr of the backend daemons would be output.
> On Aug 10, 2020, at 10:28 AM, John Duffy via users
> wrote:
>
> Thanks Ralph
>
> I will do all of that. Much appreciated.
Well, we aren't really that picky :-) While I agree with Gilles that we are
unlikely to be able to help you resolve the problem, we can give you a couple
of ideas on how to chase it down.
First, be sure to build OMPI with "--enable-debug" and then try adding "--mca
oob_base_verbose 100" to you
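A rough sketch of both steps (prefix, hostfile and program names are
placeholders):
./configure --prefix=$HOME/ompi-debug --enable-debug
make -j && make install
mpirun --mca oob_base_verbose 100 -np 2 --hostfile hosts ./a.out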
The Java bindings were added specifically to support the Spark/Hadoop
communities, so I see no reason why you couldn't use them for Akka or whatever.
Note that there are also Python wrappers for MPI at mpi4py that you could build
upon.
There is plenty of evidence out there for a general migrati
Be default, OMPI will bind your procs to a single core. You probably want to at
least bind to socket (for NUMA reasons), or not bind at all if you want to use
all the cores on the node.
So either add "--bind-to socket" or "--bind-to none" to your cmd line.
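For example (./a.out is a placeholder):
mpirun --bind-to socket -np 4 ./a.out
or, to leave the procs unbound:
mpirun --bind-to none -np 4 ./a.out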
On Aug 3, 2020, at 1:33 AM, John Duff
Add "--mca pml cm" to your cmd line
On Jul 31, 2020, at 9:54 PM, Supun Kamburugamuve via users
mailto:users@lists.open-mpi.org> > wrote:
Hi all,
I'm trying to setup OpenMPI on a cluster with the Omni-Path network. When I try
the following command it gives an error.
mpirun -n 2 --hostfile nod
While possible, it is highly unlikely that your desktop version is going to be
binary compatible with your cluster...
On Jul 24, 2020, at 9:55 AM, Lana Deere via users mailto:users@lists.open-mpi.org> > wrote:
I have open-mpi 4.0.4 installed on my desktop and my small test programs are
working.
You cannot cascade mpirun cmds like that - the child mpirun picks up envars
that cause it to break. You'd have to either use comm_spawn to start the child
job, or do a fork/exec where you can set the environment to be some pristine
set of values.
> On Jul 11, 2020, at 1:12 PM, John Retterer v
Note that you can also resolve it by adding --use-hwthread-cpus to your cmd
line - it instructs mpirun to treat the HWTs as independent cpus so you would
have 4 slots in this case.
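For example (./a.out is a placeholder):
mpirun --use-hwthread-cpus -np 4 ./a.out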
> On Jun 8, 2020, at 11:28 AM, Collin Strassburger via users
> wrote:
>
> Hello David,
>
> The slot calculatio
Afraid I have no real ideas here. Best I can suggest is taking the qrsh cmd
line from the prior debug output and try running it manually. This might give
you a chance to manipulate it and see if you can identify what is causing it an
issue, if anything. Without mpirun executing, the daemons will
mpdir_base).
> Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are required
> (e.g., on Cray). Please check your configure cmd line and consider using
> one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
> lack of comm
The messages about the daemons is coming from two different sources. Grid is
saying it was able to spawn the orted - then the orted is saying it doesn't
know how to communicate and fails.
I think the root of the problem lies in the plm output that shows the qrsh it
will use to start the job. Fo
Try adding --without-psm2 to the PMIx configure line - sounds like you have
that library installed on your machine, even though you don't have omnipath.
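For example, when configuring PMIx (the prefix is a placeholder):
./configure --prefix=$HOME/pmix --without-psm2
make -j && make install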
On May 12, 2020, at 4:42 AM, Leandro via users mailto:users@lists.open-mpi.org> > wrote:
HI,
I compile it statically to make sure compilers
I'm not sure I understand why you are trying to build CentOS rpms for PMIx,
Slurm, or OMPI - all three are readily available online. Is there some
particular reason you are trying to do this yourself? I ask because it is
non-trivial to do and requires significant familiarity with both the
intri
I fear those cards are past end-of-life so far as support is concerned. I'm not
sure if anyone can really advise you on them. It sounds like the fabric is
experiencing failures, but that's just a guess.
On May 8, 2020, at 12:56 PM, Prentice Bisbal via users
mailto:users@lists.open-mpi.org> > w
The following (from what you posted earlier):
$ srun --mpi=list
srun: MPI types are...
srun: none
srun: pmix_v3
srun: pmi2
srun: openmpi
srun: pmix
would indicate that Slurm was built against a PMIx v3.x release. Using OMPI
v4.0.3 with pmix=internal should be just fine so long as you set --mpi=p
PMIx:
$ srun --mpi=list
srun: MPI types are...
srun: none
srun: pmi2
srun: openmpi
I did launch the job with srun --mpi=pmi2
Does OpenMPI 4 need PMIx specifically?
On 4/23/20 10:23 AM, Ralph Castain via users wrote:
Is Slurm built with PMIx support? Did you tell srun to use it?
On Apr 23
ls. Why is that? Can I not trust the output
> of --mpi=list?
>
> Prentice
>
> On 4/23/20 10:43 AM, Ralph Castain via users wrote:
>> No, but you do have to explicitly build OMPI with non-PMIx support if that
>> is what you are going to use. In this case, you need to
--mpi=pmi2
>
> Does OpenMPI 4 need PMIx specifically?
>
>
> On 4/23/20 10:23 AM, Ralph Castain via users wrote:
>> Is Slurm built with PMIx support? Did you tell srun to use it?
>>
>>
>>> On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users
>>
Is Slurm built with PMIx support? Did you tell srun to use it?
> On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users
> wrote:
>
> I'm using OpenMPI 4.0.3 with Slurm 19.05.5 I'm testing the software with a
> very simple hello, world MPI program that I've used reliably for years. When
> I
the difference between the working node flag (0x11) and the
non-working nodes’ flags (0x13) is the flag PRRTE_NODE_FLAG_LOC_VERIFIED.
What does that imply? The location of the daemon has NOT been verified?
Kurt
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph C
I updated the message to explain the flags (instead of a numerical value) for
OMPI v5. In brief:
#define PRRTE_NODE_FLAG_DAEMON_LAUNCHED 0x01 // whether or not the daemon
on this node has been launched
#define PRRTE_NODE_FLAG_LOC_VERIFIED 0x02 // whether or not the
location
<moritz.kreut...@siemens.com>
www.sw.siemens.com
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Monday, 6 April 2020 16:32
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain mail
Currently, mpirun takes that second SIGINT to mean "you seem to be stuck trying
to cleanly abort - just die", which means mpirun exits immediately without
doing any cleanup. The individual procs all commit suicide when they see their
daemons go away, which is why you don't get zombies left behin
I'm afraid the short answer is "no" - there is no way to do that today.
> On Mar 30, 2020, at 1:45 PM, Jean-Baptiste Skutnik via users
> wrote:
>
> Hello,
>
> I am writing a wrapper around `mpirun` which requires pre-processing of the
> user's program. To achieve this, I need to isolate the
Sorry for the incredibly late reply. Hopefully, you have already managed to
find the answer.
I'm not sure what your comm_spawn command looks like, but it appears you
specified the host in it using the "dash_host" info-key, yes? The problem is
that this is interpreted the same way as the "-host
FWIW: I have replaced those flags in the display option output with their
string equivalent to make interpretation easier. This is available in OMPI
master and will be included in the v5 release.
> On Nov 21, 2019, at 2:08 AM, Peter Kjellström via users
> wrote:
>
> On Mon, 18 Nov 2019 17:4
Hi Nathan
Sorry for the long, long delay in responding - no reasonable excuse (just busy,
switching over support areas, etc.). Hopefully, you already found the solution.
You can specify the signals to forward to children using an MCA parameter:
OMPI_MCA_ess_base_forward_signals=SIGINT
should d
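For example, to forward SIGUSR1 (the signal choice here is just illustrative;
./a.out is a placeholder):
export OMPI_MCA_ess_base_forward_signals=SIGUSR1
mpirun -np 4 ./a.out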
It is also wise to create a "tmp" directory under your home directory, and
reset TMPDIR to point there. Avoiding use of the system tmpdir is highly
advisable under Mac OS, especially Catalina.
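For example, in a bash shell:
mkdir -p $HOME/tmp
export TMPDIR=$HOME/tmp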
On Feb 6, 2020, at 4:09 PM, Gutierrez, Samuel K. via users
mailto:users@lists.open-mpi.org> > wrote: