Wow, that is hilarious. I see no problem with adding the extra characters :-)
Scheduled it for 1.8.2 (copied you on ticket)
On May 21, 2014, at 3:29 PM, W Spector wrote:
> Hi,
>
> When running under valgrind, I get warnings from each MPI process at MPI_Init
> time. The warnings come from fu
Note that the lama mapper described in those slides may not work, as it hasn't
been maintained in a while. However, you can use the --map-by and --bind-to
options to do the same things.
If you want to disable binding, you can do so by adding "--bind-to none" to the
cmd line, or via the MCA param "hw
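For example (the process count and ./a.out are just placeholders for your own job), either of these should behave the way you want:
mpirun -np 4 --bind-to none ./a.out
# or map and bind explicitly, e.g. one process per socket, bound to that socket:
mpirun -np 4 --map-by socket --bind-to socket ./a.out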
Hmmm...that is a bit of a problem. I've added a note to see if we can turn down
the aggressiveness of the MPI layer once we hit finalize, but that won't solve
your immediate problem.
Our usual suggestion is that you have each proc call finalize before going on
to do other things. This avoids th
environment, so I'm assuming there
is some other limitation in play here? If so, you could always put the MCA
param in the default mca param file - we'll pick it up from there.
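For reference, the default param file is just a plain text file of "name = value" lines - the parameter below is only a placeholder, substitute whichever MCA param applies in your case:
# system-wide, picked up by every mpirun using this install:
#   <prefix>/etc/openmpi-mca-params.conf
# or per-user:
#   $HOME/.openmpi/mca-params.conf
some_mca_param = value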
>
> Albert
>
> On 23/05/14 14:32, Ralph Castain wrote:
>> Note that the lama mapper describ
_base_binding_policy = none
HTH
Ralph
>
> Thanks,
> Albert
>
> On 23/05/14 15:02, Ralph Castain wrote:
>>
>> On May 23, 2014, at 6:58 AM, Albert Solernou
>> wrote:
>>
>>> Hi,
>>> thanks a lot for your quick answers, and I see my error,
On May 23, 2014, at 7:21 AM, Iván Cores González wrote:
> Hi Ralph,
> Thanks for your response.
> I see your point. I tried to change the algorithm, but some processes finish
> while the others are still calling MPI functions. I can't avoid this
> behaviour.
> The ideal behavior is the processe
waiting until it receives some data
>> from other processes?
>
> This solution was my first idea, but I can't do it. I use spawned processes
> and different communicators to manage "groups" of processes, so the ideal
> behaviour is that processes finished and
em is that Finalize invokes a
barrier, and some of the procs aren't there any more to participate.
On May 23, 2014, at 12:03 PM, Ralph Castain wrote:
> I'll check to see - should be working
>
> On May 23, 2014, at 8:07 AM, Iván Cores González wrote:
>
>>> I as
is merely a band-aid ...
>>
>> More info @ https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/852760.
>> A better fix for this issue will be to add "-fno-builtin-strdup" to
>> your CFLAGS when compiling Open MPI.
>>
>> Ge
Strange - I note that you are running these as singletons. Can you try running
it under mpirun?
mpirun -n 1 ./a.out
just to see if it is the singleton that is causing the problem, or something in
the openib btl itself.
On May 26, 2014, at 6:59 AM, Alain Miniussi wrote:
>
> Hi,
>
> I have
noticed that process rank 0 with PID 5123 on node tagir exited on
> signal 13 (Broken pipe).
> --
> [alainm@tagir mpi]$
>
>
> do you want me to try a gcc build ?
>
> Alain
>
> On 26/05/2014 1
4/libnsl.so.1 (0x003beb00)
>libutil.so.1 => /lib64/libutil.so.1 (0x003bea00)
>libm.so.6 => /lib64/libm.so.6 (0x003bd9a0)
>/lib64/ld-linux-x86-64.so.2 (0x003bd8e0)
> [alainm@gurney mpi]$ ./a.out
> [alainm@gurney mpi]$
>
> So it seems to
I'm unaware of any OMPI error message like that - it might be coming from
libevent, which can use epoll, so it could still be caused by us. However, I'm a
little concerned about the use of the prerelease version of Slurm, as we know
that PMI is having some problems over there.
So out o
o the attention of the
folks who maintain that component and see if they can grok the problem.
Thanks!
Ralph
>
> Alain
>
> On 27/05/2014 17:30, Ralph Castain wrote:
>> Ah, good. On the setup that fails, could you use gdb to find the line number
>> where it is dividing
If it is a slurm PMI issue, this should resolve it.
On May 28, 2014, at 12:03 AM, Filippo Spiga wrote:
> Dear Ralph,
>
> On May 27, 2014, at 6:31 PM, Ralph Castain wrote:
>> So out of curiosity - how was this job launched? Via mpirun or directly
>> using srun?
>
Are you sure you have /Users/lorenzodona/Documents/openmpi-1.8.1/bin at the
*beginning* of your PATH?
Reason: most common cause of what you are showing is that you are picking up
some other version of mpif90
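A quick way to check, using the install path from your mail (mpif90 --showme just prints what the wrapper would invoke):
export PATH=/Users/lorenzodona/Documents/openmpi-1.8.1/bin:$PATH
which mpif90      # should point into .../openmpi-1.8.1/bin
mpif90 --showme   # shows the underlying compiler and flags the wrapper uses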
On May 29, 2014, at 4:11 AM, Lorenzo Donà wrote:
> I compiled openmpi 1.8.1 with int
Can you pass along the cmd line that generated that output, and how OMPI was
configured?
On May 30, 2014, at 5:11 AM, Тимур Исмагилов wrote:
> Hello!
>
> I am using Open MPI v1.8.1 and slurm 2.5.6.
>
> I got these messages when I try to run the example (hello_oshmem.cpp) program:
>
> [warn] Epoll
ssi wrote:
> Please note that I had the problem with 13.1.0 but not with the 13.1.1
>
>
> On 28/05/2014 00:47, Ralph Castain wrote:
>> On May 27, 2014, at 3:32 PM, Alain Miniussi wrote:
>>
>>> Unfortunately, the debug library works like a charm (which make the
>
Sorry for delayed response - been a little hectic here.
I suspect the problem is that we really need a passwordless ssh connection in
order to preload the file for 1.6.5. This isn't required in the 1.8 series, so
you might want to try it with 1.8.1. Otherwise, resolve the password issue and
it
Sounds odd - can you configure OMPI --enable-debug and run it again? If it
fails and you can get a core dump, could you tell us the line number where it
is failing?
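Roughly like this - adjust the prefix and keep whatever other configure options you normally use; this is just a sketch:
./configure --prefix=$HOME/ompi-1.8-debug --enable-debug
make -j4 install
# then rebuild and rerun your test against this install and grab the core file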
On Jun 3, 2014, at 9:58 AM, Fischer, Greg A. wrote:
> Apologies – I forgot to add some of the information requested by the FAQ:
---
> mpirun noticed that process rank 3 with PID 5845 on node 112 exited on
> signal 11 (Segmentation fault).
> --
>
> Does any of that help?
>
> Greg
>
> From: users
i-1.8.1/ompi/runtime/ompi_mpi_init.c:464
> #13 0x2b48f1760a37 in PMPI_Init (argc=0x2b48f6300020,
> argv=0x2b48f63000b8) at pinit.c:84
> #14 0x004024ef in main (argc=1, argv=0x7fffebf0d1f8) at ring_c.c:19
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph C
ing_c
>
> This may not fix the problem,
> but have you tried to add the shared memory btl to your mca parameter?
>
> mpirun -np 2 --mca btl openib,sm,self ring_c
>
> As far as I know, sm is the preferred transport layer for intra-node
> communication.
>
> Gus Correa
>
ot; on your cmd line
On Jun 4, 2014, at 9:58 AM, Ralph Castain wrote:
> He isn't getting that far - he's failing in MPI_Init when the RTE attempts to
> connect to the local daemon
>
>
> On Jun 4, 2014, at 9:53 AM, Gus Correa wrote:
>
>> Hi Greg
>>
>>
c:1645
> #10 0x2b82b16f8763 in orte_progress_thread_engine (obj=0x2b82b5300020) at
> ../../../../openmpi-1.8.1/orte/mca/ess/base/ess_base_std_app.c:456
> #11 0x2b82b0f1c7b6 in start_thread () from /lib64/libpthread.so.0
> #12 0x2b82b1410d6d in clone () from /lib64/libc.so.6
>
users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, June 04, 2014 4:48 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c
>
> Urggg...unfortunately, the people who know the most about that code
rt
> --prefix=/mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.8.1_mxm-3.0
> --with-mxm=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/ --with-
> slurm --with-platform=contrib/platform/mellanox/optimized
>
>
> Fri, 30 May 2014 07:09:54 -0700 from Ralph Ca
I know Nathan has it running on the XC30, but I don't see a platform file
specifically for it in the repo. Did you try the cray_xe6 platform files - I
think he may have just augmented those to handle the XC30 case
Look in contrib/platform/lanl/cray_xe6
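Roughly, that would look like the following - substitute one of the actual files you find under that directory, since I haven't tried this on an XC30 myself:
./configure --with-platform=contrib/platform/lanl/cray_xe6/<platform-file> \
            --prefix=/your/install/prefix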
On Jun 5, 2014, at 9:00 AM, Hammond, Simo
On Jun 5, 2014, at 2:13 PM, Dan Dietz wrote:
> Hello all,
>
> I'd like to bind 8 cores to a single MPI rank for hybrid MPI/OpenMP
> codes. In OMPI 1.6.3, I can do:
>
> $ mpirun -np 2 -cpus-per-rank 8 -machinefile ./nodes ./hello
>
> I get one rank bound to procs 0-7 and the other bound to 8-
Hmmm...I'm not sure how that is going to run with only one proc (I don't know
if the program is protected against that scenario). If you run with -np 2 -mca
btl openib,sm,self, is it happy?
On Jun 5, 2014, at 2:16 PM, Fischer, Greg A. wrote:
> Here’s the command I’m invoking and the terminal
due to the above mentioned problem?
>
>
> Thu, 5 Jun 2014 07:45:01 -0700 from Ralph Castain:
> FWIW: support for the --resv-ports option was deprecated and removed on the
> OMPI side a long time ago.
>
> I'm not familiar enough with "oshrun" to know if it is do
nf316:21583] [16] /lib64/libc.so.6(__libc_start_main+0xe6)[0x7f3b58301c36]
> [binf316:21583] [17] ring_c[0x400889]
> [binf316:21583] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 21583 on node 316 exited on
> signal 6 (Abo
We currently only get the node and slots/node info from PBS - we don't get any
task placement info at all. We then use the mpirun cmd options and built-in
mappers to map the tasks to the nodes.
I suppose we could do more integration in that regard, but haven't really seen
a reason to do so - th
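If you want to see what we received from PBS and how we mapped the job, something like this helps (24 ranks and ./app are just placeholders):
mpirun -np 24 --display-allocation --display-map ./app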
Process 0 exiting
> Process 1 exiting
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Friday, June 06, 2014 10:34 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
> Huh - how strange. I can't imag
irst
>> 16 being host node0001 and the last 8 being node0002), it appears that 24
>> MPI tasks try to start on node0001 instead of getting distributed as 16 on
>> node0001 and 8 on node0002. Hence, I am curious what is being passed by
>> PBS.
>>
>> --john
>
On Jun 6, 2014, at 10:24 AM, Gus Correa wrote:
> On 06/06/2014 01:05 PM, Ralph Castain wrote:
>> You can always add --display-allocation to the cmd line to see what we
>> thought we received.
>>
>> If you configure OMPI with --enable-debug, you can set --mca
>>
04048f4 in main (argc=6, argv=0x7fff5bb2a3a8) at main.c:13
>
> ddietz@conte-a009:/scratch/conte/d/ddietz/hello$ cat nodes
> conte-a009
> conte-a009
> conte-a055
> conte-a055
> ddietz@conte-a009:/scratch/conte/d/ddietz/hello$ uname -r
> 2.6.32-358.14.1.el6.x86_64
>
> On
cheduler is passing along the correct slot count #s (16 and 8, resp).
>
> Am I running into a bug w/ OpenMPI 1.6?
>
> --john
>
>
>
> -Original Message-
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Friday, June 06, 2014 1:30
Supposed to, yes - but I don't know how much testing it has seen. I can try to
take a look
On Jun 6, 2014, at 12:02 PM, E.O. wrote:
> Hello
> I am using OpenMPI ver 1.8.1 on a cluster of 4 machines.
> One Redhat 6.2 and three busybox machine. They are all 64bit environment.
>
> I want to use -
You might want to update to 1.6.5, if you can - I'll see what I can find
On Jun 6, 2014, at 12:07 PM, Sasso, John (GE Power & Water, Non-GE)
wrote:
> Version 1.6 (i.e. prior to 1.6.1)
>
> -Original Message-
> From: users [mailto:users-boun...@open-mpi.org] On Be
> [conte-a009:55685] Failing at address: 0x4c
> [conte-a009:55685] [ 0] /lib64/libpthread.so.0[0x327f80f500]
> [conte-a009:55685] [ 1]
> /scratch/conte/d/ddietz/openmpi-1.8.1-debug/intel-14.0.2.144/lib/libopen-rte.so.7(orte_plm_base_complete_setup+0x951)[0x2b5b069a50e1]
> [conte-a009:55685
Yeah, it doesn't require ssh any more - but I haven't tested it in a bit, and
so it's possible something crept in there.
On Jun 6, 2014, at 12:27 PM, Reuti wrote:
> Am 06.06.2014 um 21:04 schrieb Ralph Castain:
>
>> Supposed to, yes - but I don't know how muc
Okay, I found the problem and think I have a fix that I posted (copied EO on
it). You are welcome to download the patch and try it. Scheduled for release in
1.8.2
Thanks
Ralph
On Jun 6, 2014, at 1:01 PM, Ralph Castain wrote:
> Yeah, it doesn't require ssh any more - but I haven
y to each node beforehand of course). I guess this is a different
> issue?
>
> Eiichi
>
>
> eiichi
>
>
>
> On Fri, Jun 6, 2014 at 5:35 PM, Ralph Castain wrote:
> Okay, I found the problem and think I have a fix that I posted (copied EO on
> it). You are w
Looks like there is some strange interaction there, but I doubt I'll get
around to fixing it soon unless someone has a burning reason to not use tree
spawn when preloading binaries. I'll mark it down as something to look at as
time permits.
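In the meantime, the workaround is simply to disable tree spawn whenever you preload the binary - roughly (hosts and binary are placeholders):
mpirun --mca plm_rsh_no_tree_spawn 1 --preload-binary \
       -np 4 -host node1,node2 ./a.out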
On Jun 6, 2014, at 4:28 PM, Ralph Cast
0],1]
>> [conte-a009.rcac.purdue.edu:55685] [[24164,0],0]
>> plm:base:orted_report_launch from daemon [[24164,0],1] on node
>> conte-a055
>> [conte-a009.rcac.purdue.edu:55685] [[24164,0],0] RECEIVED TOPOLOGY
>> FROM NODE conte-a055
>> [conte-a009.rcac.purdue.edu:55685] [[24164
f possible, thus providing
better performance.
Scheduled this for 1.8.2, asking Tetsuya to review.
On Jun 6, 2014, at 6:25 PM, Ralph Castain wrote:
> Hmmm...Tetsuya is quite correct. Afraid I got distracted by the segfault
> (still investigating that one). Our default policy for 2 proces
1]
> /scratch/conte/d/ddietz/openmpi-1.8.1-debug/intel-14.0.2.144/lib/libopen-rte.so.7(orte_plm_base_complete_setup+0x951)[0x2b5b069a50e1]
> [conte-a009:55685] [ 2]
> /scratch/conte/d/ddietz/openmpi-1.8.1-debug/intel-14.0.2.144/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0xa05)[0x2b
Sorry about the comment re cpus-per-proc - confused this momentarily with
another user also using Torque. I confirmed that this works fine with 1.6.5,
and would guess you are hitting some bug in 1.6.0. Can you update?
On Jun 6, 2014, at 12:20 PM, Ralph Castain wrote:
> You might want
On Jun 9, 2014, at 2:41 PM, Vineet Rawat wrote:
> Hi,
>
> We've deployed OpenMPI on a small cluster but get a SEGV in orted. Debug
> information is very limited as the cluster is at a remote customer site. They
> have a network card with which I'm not familiar (Cisco Systems Inc VIC P81E
> P
There is one new "feature" in 1.8 - it now checks to see if the version on the
backend matches the version on the frontend. In other words, mpirun checks to
see if the orted connecting to it is from the same version - if not, the orted
will die.
Shouldn't segfault, though - just abort.
You cou
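One quick sanity check is to compare what each side picks up - the node name and install path below are placeholders:
mpirun --version                                            # version on the frontend
ssh some-backend-node /path/to/openmpi/bin/mpirun --version # version the orteds would use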
read change
> was %d; write change was %d]",
>
>
>
> On Fri, Jun 6, 2014 at 3:38 PM, Ralph Castain wrote:
> Possible - honestly don't know
>
> On Jun 6, 2014, at 12:16 AM, Timur Ismagilov wrote:
>
>> Sometimes, after termination of the
te-a084:51113] [ 6] mpirun[0x404719]
> [conte-a084:51113] *** End of error message ***
> Segmentation fault (core dumped)
>
> On Sun, Jun 8, 2014 at 4:54 PM, Ralph Castain wrote:
>> I'm having no luck poking at this segfault issue. For some strange reason,
>> we s
44 AM, Dan Dietz wrote:
> Sorry - it crashes with both torque and rsh launchers. The output from
> a gdb backtrace on the core files looks identical.
>
> Dan
>
> On Wed, Jun 11, 2014 at 9:37 AM, Ralph Castain wrote:
>> Afraid I'm a little confused now - are you say
Yeah, I think we've seen that somewhere before too...
On Jun 11, 2014, at 2:59 PM, Joshua Ladd wrote:
> Agreed. The problem is not with UDCM. I don't think something is wrong with
> the system. I think his Torque is imposing major constraints on the maximum
> size that can be locked into memo
it solves the problem?
On Jun 11, 2014, at 2:15 PM, Ralph Castain wrote:
> Okay, let me poke around some more. It is clearly tied to the coprocessors,
> but I'm not yet sure just why.
>
> One thing you might do is try the nightly 1.8.2 tarball - there have been a
> numb
for release isn't far off. Maybe not end June as last-minute things always
surface in the RC process, but likely early July.
On Jun 12, 2014, at 8:11 AM, Bennet Fauber wrote:
> On Thu, Jun 12, 2014 at 10:56 AM, Ralph Castain wrote:
>> I've poked and prodded, and the 1.8.2 tar
rs to be crashing in a similar
> fashion. :-( I used the latest snapshot 1.8.2a1r31981.
>
> Dan
>
> On Thu, Jun 12, 2014 at 10:56 AM, Ralph Castain wrote:
>> I've poked and prodded, and the 1.8.2 tarball seems to be handling this
>> situation just fine. I don't h
Kewl - thanks! I'm a Purdue alum, if that helps :-)
On Jun 12, 2014, at 9:04 AM, Dan Dietz wrote:
> That shouldn't be a problem. Let me figure out the process and I'll
> get back to you.
>
> Dan
>
> On Thu, Jun 12, 2014 at 11:50 AM, Ralph Castain wrote:
>
wrote:
> That shouldn't be a problem. Let me figure out the process and I'll
> get back to you.
>
> Dan
>
> On Thu, Jun 12, 2014 at 11:50 AM, Ralph Castain wrote:
>> Arggh - is there any way I can get access to this beast so I can debug this?
>> I can't
Well, for one, there is never any guarantee of linear scaling with the number
of procs - that is very application dependent. You can actually see performance
decrease with number of procs if the application doesn't know how to exploit
them.
One thing that stands out is your mapping and binding
Well, yes and no. Besides, the real question is whether this app, which this
person didn't write, was written as a threaded application.
On Jun 16, 2014, at 8:32 PM, Zehan Cui wrote:
> Hi Yuping,
>
> Maybe using multi-threads inside a socket, and MPI among sockets is better
> choice for such
e_timestep_loop --animation_freq -1
>
> I run above command, still do not improve. Would you give me a detailed
> command with options?
> Thank you.
>
> Best regards,
>
> Yuping
>
>
> --------
> On Tue, 6/17/14, Ralph Castain w
The default binding option depends on the number of procs - it is bind-to core
for np=2, and bind-to socket for np > 2. You never said, but should I assume
you ran 4 ranks? If so, then we should be trying to bind-to socket.
I'm not sure what your cpuset is telling us - are you binding us to a so
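One way to see exactly what we did is to add --report-bindings (the rank count and ./a.out are placeholders):
mpirun -np 4 --report-bindings ./a.out
# or force the policy explicitly if the default isn't what you want:
mpirun -np 4 --bind-to socket --report-bindings ./a.out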
on this node.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Jun 18, 2014, at 11:10 PM, Ralph Castain wrote:
>
>> The default binding option depends on the number of procs -
e
> either/or depending on the job.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Jun 19, 2014, at 2:44 PM, Ralph Castain wrote:
>
>> Sorry, I should have bee
all
>> available processors)
>> [nyx5775.engin.umich.edu:27951] MCW rank 16 is not bound (or bound to all
>> available processors)
>> [nyx5597.engin.umich.edu:68073] MCW rank 29 is not bound (or bound to all
>> available processors)
>> [nyx5597.engin.umich.edu:
ig file. How can I make
> this the default, which a user can then override?
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote:
>
What was updated? If the OS, did you remember to set the memory registration
limits to max?
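The usual check, per our FAQ advice (adjust to your distro):
ulimit -l    # on the compute nodes - should be "unlimited" or very large
# typically set in /etc/security/limits.conf on every node:
#   * soft memlock unlimited
#   * hard memlock unlimited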
On Jun 20, 2014, at 11:25 AM, Ivanov, Aleksandar (INR)
wrote:
>
> Dear Sir or Madam,
>
> I am using the openmpi 1.6.5 library compiled with IFORT / ICC 13.1.5. Since
> a recent update of our machi
ssonnea...@calculquebec.ca> wrote:
> Hi,
> I've been following this thread because it may be relevant to our setup.
>
> Is there a drawback to having orte_hetero_nodes=1 as a default MCA parameter?
> Is there a reason why the most generic case is not assumed?
>
> Maxim
ger. And yes we don't have the PHI stuff
> installed on all nodes, strange that 'all all' is now very short,
> ompi_info -a still works though.
>
>
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
I'm on the road today, but will be back tomorrow afternoon (US Pacific
time) and can forward my notes on this again. In the interim, just go to
our user mailing list archives and search for "phi" and you'll see the
conversations. Basically, you have to cross-compile OMPI to run on the Phi.
I've be
You should add this to your cmd line:
--map-by core:pe=4
This will bind each process to 4 cores
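For example, for 16 ranks (the count and ./a.out are just placeholders):
mpirun -np 16 --map-by core:pe=4 --report-bindings ./a.out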
Sent from my iPhone
> On Jun 27, 2014, at 5:22 AM, Luigi Santangelo
> wrote:
>
> Hi all,
> My system is a 64 core, with Debian 3.2.57 64 bit, GNU gcc 4.7, kernel Linux
> 3.2.0 and OpenMPI 1.8.1.
Let me steer you on a different course. Can you run "ompi_info" and paste the
output here? It looks to me like someone installed a version that includes
uDAPL support, so you may have to disable some additional things to get it to
run.
On Jun 27, 2014, at 9:53 AM, Jeffrey A Cummings
wrote:
I don't recall ever seeing such an option in Open MPI - what makes you believe
it should exist?
On Jun 29, 2014, at 9:25 PM, Đỗ Mai Anh Tú wrote:
> Hi all,
>
> I am trying to run the checkpoint/restart enabled debugging code in Open
> MPI. This requires configuring this option at the setup step
0, API v2.0, Component v1.6.2)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.2)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.2)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.2)
> MCA notifier: command (MCA v2.0,
> jeffrey.a.cummi...@aero.org
>
>
>
> From: Ralph Castain
> To: Open MPI Users,
> Date: 06/30/2014 02:13 PM
> Subject: Re: [OMPI users] Problem moving from 1.4 to 1.6
> Sent by: "users"
>
>
>
> Yeah, this
OMPI started binding by default during the 1.7 series. You should add the
following to your cmd line:
--map-by :pe=$OMP_NUM_THREADS
This will give you a dedicated core for each thread. Alternatively, you could
instead add
--bind-to socket
OMPI 1.5.5 doesn't bind at all unless directed to do s
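As a concrete sketch (rank count and binary name are placeholders):
export OMP_NUM_THREADS=4
mpirun -np 8 --map-by :pe=$OMP_NUM_THREADS ./hybrid_app
# or, more coarsely:
mpirun -np 8 --bind-to socket ./hybrid_app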
I would suggest having him look at the core file with a debugger and see where
it fails. Sounds like he has a memory corruption problem.
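Roughly (the binary and core file names are placeholders):
gdb ./his_app core.12345
(gdb) bt    # the backtrace shows where it died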
On Jun 24, 2014, at 3:31 AM, Dr.Peer-Joachim Koch wrote:
> Hi,
>
> one of our cluster users reported a problem with openmpi.
> He created a short sample (ju
d 0.021 sec
> Do I have a utility similar to 'top' with sbatch?
>
> Also, every time, I got this message in ompi 1.9:
> mca: base: components_register: component sbgp / ibnet register function
> failed
> Is it bad?
>
> Regards,
> Timur
>
> Wed, 2 J
f the available mappers was able to perform the requested
> mapping operation. This can happen if you request a map type
> (e.g., loadbalance) and the corresponding mapper was not built.
> ...
>
>
>
> Wed, 2 Jul 2014 07:36:48 -0700 from Ralph Castain:
> Let's keep this on
Unfortunately, that has never been supported. The problem is that the embedded
mpirun picks up all those MCA params that were provided to the original
application process, and gets hopelessly confused. We have tried in the past to
figure out a solution, but it has proved difficult to separate th
component sbgp / ibnet
> register function failed
> Main 21.366504 secs total /1
> Computation 21.048671 secs total /1000
> [node1-128-29:21569] mca: base: close: unloading component lama
> [node1-128-29:21569] mca: base: close: component mindist closed
> [node1-128-29:21569] mca: bas
'{print $2" slots="$1}' > $HOSTFILE || { rm
> -f $HOSTFILE; exit 255; }
> LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so
> mpirun -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --map-by slot:pe=8 --mca
> rmaps_base_verbose 20 --hostfile $HOS
With that little info, no - you haven't told us anything. How are you running
this "rank test", how was OMPI configured, etc?
On Jul 7, 2014, at 6:21 AM, Alexander Frolov wrote:
> Hi!
>
> I am running MPI rank test using srun and all processes think that they are
> rank 0.
>
> * slurm 14.11
olo/local/openmpi-1.8.1-gcc-4.8.2 --with-openib
> --enable-mpi-thread-multiple CC=/local/usr/local/bin/gcc
> CXX=/local/usr/local/bin/g++
> ---
> slurm configured as follows:
>
> ./configure --prefix=/home/frolo/local/slurm
>
> (I'm running it as a user)
> ---
>
During the 1.7 series and for all follow-on series, OMPI changed to a mode
where it launches a daemon on all allocated nodes at the startup of mpirun.
This allows us to determine the hardware topology of the nodes and take that
into account when mapping. You can override that behavior by either
>
> and while
>
> /opt/openmpi/bin/mpirun -host host18 ompi_info
>
> works
>
> /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -host host18 ompi_info
>
> hangs. Is there some condition on the use of this parameter?
>
> Yours truly
>
> Ricardo
Hmmm...no, it worked just fine for me. It sounds like something else is going
on.
Try configuring OMPI with --enable-debug, and then add -mca plm_base_verbose 10
to get a better sense of what is going on.
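i.e., with the host from your example, something like:
/opt/openmpi/bin/mpirun --mca plm_base_verbose 10 --mca plm_rsh_no_tree_spawn 1 \
    -host host18 ompi_info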
On Jul 14, 2014, at 10:27 AM, Ralph Castain wrote:
> I confess I haven'
> [nexus10.nlroc:27438] mca:base:select:( plm) Skipping component [slurm].
> Query failed to return a module
> [nexus10.nlroc:27438] mca:base:select:( plm) Selected component [rsh]
> [nexus10.nlroc:27438] mca: base: close: component isolated closed
> [nexus10.nlroc:27438] mca: base:
Forgive me, but I am now fully confused - case 1 and case 3 appear identical to
me, except for the debug-daemons flag on case 3.
On Jul 15, 2014, at 7:56 AM, Ricardo Fernández-Perea
wrote:
> What I mean by "another mpi process":
> I have 4 nodes where there is a process that uses mpi and whe
Given that a number of Windows components and #if protections were removed in
the 1.7/1.8 series, I very much doubt this will build or work. Are you
intending to try and recreate that code?
Otherwise, there is a port to cygwin available from that community.
On Jul 16, 2014, at 8:52 AM, MM wrot
FWIW: now that I have access to the Intel compiler suite, including for
Windows, I've been toying with creating a more stable support solution for OMPI
on Windows. It's a low-priority task for me because it isn't clear that we have
very many Windows users in HPC land, and the cygwin port already
Guess we just haven't seen that much activity on the OMPI list, even when we
did support Windows. I'll see what I can do - I think this new method will
allow it to be stable and require far less maintenance than what we had before.
On Jul 17, 2014, at 10:42 AM, Jed Brown wrote:
> Rob Latham
Yeah, but I'm cheap and get the Intel compilers for free :-)
On Jul 17, 2014, at 11:38 AM, Rob Latham wrote:
>
>
> On 07/17/2014 01:19 PM, Jed Brown wrote:
>> Damien writes:
>>
>>> Is this something that could be funded by Microsoft, and is it time to
>>> approach them perhaps? MS MPI is b
this stage.
On Jul 17, 2014, at 11:52 AM, Jed Brown wrote:
> Ralph Castain writes:
>
>> Yeah, but I'm cheap and get the Intel compilers for free :-)
>
> Fine for you, but not for the people trying to integrate your library
> in a stack developed using MSVC.
I'm not exactly sure how to fix what you described. The semicolon is escaped
because otherwise the shell would treat it as a command separator - the orted
cmd line is ssh'd to the remote node and cannot include an unescaped
terminator. The space isn't a "special character" in that sense, and actu
That's a pretty old OMPI version, and we don't really support it any longer.
However, I can provide some advice:
* have you tried running the simple "hello_c" example we provide? This would at
least tell you if the problem is in your app, which is what I'd expect given
your description
* try u
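The hello_c check is quick - assuming you built from a tarball that still has the examples directory, roughly:
cd examples
make
mpirun -np 2 ./hello_c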
er
> values
> be necessary?
>
> Thank you for your help.
>
> -Bill Lane
>
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain
> [r...@open-mpi.org]
> Sent: Saturday, July 19, 2014 8:07 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Mpir
On Jul 20, 2014, at 7:11 AM, Diego Avesani wrote:
> Dear all,
> I have a question about mpi_finalize.
>
> After mpi_finalize the program returns to a single core - have I understood
> correctly?
No - we don't kill any processes. We just tear down the MPI system. All your
processes continue to execu