What kind of system was this on? ssh, slurm, ...?
> On Jul 28, 2016, at 1:55 PM, Blosch, Edwin L wrote:
>
> I am running cases that are starting just fine and running for a few hours,
> then they die with a message that seems like a startup type of failure.
>
This is all on one node, yes?
Try adding the following:
-mca odls_base_verbose 5 -mca state_base_verbose 5 -mca errmgr_base_verbose 5
Lots of garbage, but it should tell us what is going on.
On Aug 18, 2014, at 9:36 AM, Maxime Boissonneault
wrote:
> Here is the output of
> mpirun -np 4 --mca plm_base_verbose 10 -mca odls_base_verbose 5 -mca
> state_base_verbose 5 -mca errmgr_base_verbose 5 ring_c |& tee
> output_ringc_verbose.txt
>
>
> Maxime
>
> On 2014-08-18 12:48, Ralph Castain wrote:
>> This is all on one node,
, Maxime Boissonneault
<maxime.boissonnea...@calculquebec.ca> wrote:
> Indeed, that makes sense now.
>
> Why isn't OpenMPI attempting to connect over the local loopback for the same node?
> This used to work with 1.6.5.
>
> Maxime
>
> On 2014-08-18 13:11, Ralph Castain wrote:
oes see ib0 (despite not seeing
> lo), but does not attempt to use it.
>
>
> On the compute nodes, we have eth0 (management), ib0 and lo, and it works. I
> am unsure why it does work on the compute nodes and not on the login nodes.
> The only difference is the presence of a public interf
Just to clarify: OMPI will bind the process to *all* N cores, not just to one.
On Aug 20, 2014, at 4:26 AM, tmish...@jcity.maeda.co.jp wrote:
> Reuti,
>
> If you want to allocate 10 procs with N threads, the Torque
> script below should work for you:
>
> qsub -l nodes=10:ppn=N
> mpirun
On Aug 20, 2014, at 6:58 AM, Reuti wrote:
> Hi,
>
> On 20.08.2014 at 13:26, tmish...@jcity.maeda.co.jp wrote:
>
>> Reuti,
>>
>> If you want to allocate 10 procs with N threads, the Torque
>> script below should work for you:
>>
>> qsub -l nodes=10:ppn=N
>>
On Aug 20, 2014, at 9:04 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 20.08.2014 at 16:26, Ralph Castain wrote:
>
>> On Aug 20, 2014, at 6:58 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>
>>> Hi,
>>>
>>> On 20.08.2014 at
It was not yet fixed - but should be now.
On Aug 20, 2014, at 6:39 AM, Timur Ismagilov wrote:
> Hello!
>
> As I can see, the bug is fixed, but in Open MPI v1.9a1r32516 I still have
> the problem
>
> a)
> $ mpirun -np 1 ./hello_c
>
>
yes, i know - it is cmr'd
On Aug 20, 2014, at 10:26 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
> btw, we get same error in v1.8 branch as well.
>
>
> On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
> It was not yet fixed - but sh
On Aug 20, 2014, at 11:16 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 20.08.2014 at 19:05, Ralph Castain wrote:
>
>>>
>>> Aha, this is quite interesting - how do you do this: scanning the
>>> /proc//status or alike? What happens i
Or you can add
    -nolocal|--nolocal    Do not run any MPI applications on the local node
to your mpirun command line, and we won't run any application procs on the node
where mpirun is executing.
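A sketch of such a command line (the hostnames here are made up for illustration):

```shell
# Keep all application procs off the launch node; run them on node01/node02 instead
mpirun -np 4 --nolocal --host node01,node02 ./ring_c
```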
On Aug 20, 2014, at 4:28 PM, Joshua Ladd wrote:
> Hi, Filippo
>
> When
>
> Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain <r...@open-mpi.org>:
> yes, i know - it is cmr'd
>
> On Aug 20, 2014, at 10:26 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>
>> btw, we get same error in v1.8 branch as well.
>>
>>
>>
l burden. Of course
> login node and front-end node can be two separated hosts but I am looking for
> a way to keep our setup as-it-is without introducing structural changes.
>
>
> Hi Ralph,
>
> On Aug 21, 2014, at 12:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
On Aug 21, 2014, at 2:51 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 20.08.2014 at 23:16, Ralph Castain wrote:
>
>>
>> On Aug 20, 2014, at 11:16 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>
>>> On 20.08.2014 at 19:05, Ralph Castain wrote:
On Aug 21, 2014, at 6:54 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 21.08.2014 at 15:45, Ralph Castain wrote:
>
>> On Aug 21, 2014, at 2:51 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>
>>> On 20.08.2014 at 23:16, Ralph Castain wrote:
Should not be required (unless they are statically built), as we do strive to
maintain ABI within a series.
On Aug 21, 2014, at 9:39 AM, Maxime Boissonneault
wrote:
> Hi,
> Would you say that software compiled using OpenMPI 1.8.1 needs to be
> recompiled
On Aug 21, 2014, at 10:58 AM, Filippo Spiga <spiga.fili...@gmail.com> wrote:
> Dear Ralph
>
> On Aug 21, 2014, at 2:30 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> I'm afraid that none of the mapping or binding options would be available
>> under srun as th
FWIW: I just tried on my Mac with the Intel 14.0 compilers, and it configured
and built just fine. However, that was with the current state of the 1.8 branch
(the upcoming 1.8.2 release), so you might want to try that in case there is a
difference.
On Aug 21, 2014, at 12:59 PM, Gus Correa
MPI
> semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug
> 21, 2014 (nightly snapshot tarball), 146)
>
>
> Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain <r...@open-mpi.org>:
> Not sure I understand. The problem has been fixed in both the trunk and
Looks like there is something wrong with your gfortran install:
*** Fortran compiler
checking for gfortran... gfortran
checking whether we are using the GNU Fortran compiler... yes
checking whether gfortran accepts -g... yes
checking whether ln -s works... yes
checking if Fortran compiler
Add --enable-debug to your configure, then re-run the --host test and add
"--leave-session-attached -mca plm_base_verbose 5 -mca oob_base_verbose 5" and
let's see what's going on.
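For reference, the full sequence might look like this (install prefix and hostname are placeholders):

```shell
# Rebuild with debug support
./configure --enable-debug --prefix=$HOME/ompi-debug && make -j4 install

# Re-run the failing --host test with launcher/oob verbosity
mpirun --host remotenode --leave-session-attached \
       -mca plm_base_verbose 5 -mca oob_base_verbose 5 hostname
```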
On Aug 26, 2014, at 7:31 AM, Benjamin Giehle wrote:
> Hello,
>
> i have a problem with
all), 146)
>
> real 1m3.932s
> user 0m0.035s
> sys 0m0.072s
>
>
>
>
> Tue, 26 Aug 2014 07:03:58 -0700 from Ralph Castain <r...@open-mpi.org>:
> Hmmm... what is your allocation like? Do you have a large hostfile, for
> example?
>
> if you add a --hos
lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:31235107 errors:0 dropped:0 overruns:0 frame:0
> TX packets:31235107 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:1327509
On Aug 28, 2014, at 11:50 AM, McGrattan, Kevin B. Dr.
wrote:
> My institute recently purchased a linux cluster with 20 nodes; 2 sockets per
> node; 6 cores per socket. OpenMPI v 1.8.1 is installed. I want to run 15
> jobs. Each job requires 16 MPI processes. For
I'm unaware of any changes to the Slurm integration between rc4 and final
release. It sounds like this might be something else going on - try adding
"--leave-session-attached --debug-daemons" to your 1.8.2 command line and let's
see if any errors get reported.
On Aug 28, 2014, at 12:20 PM,
No, it isn't - but we aren't really maintaining the 1.6 series any more. You
might try updating to 1.6.5 and see if it remains there
On Aug 29, 2014, at 9:12 AM, Maxime Boissonneault
wrote:
> It looks like
> -npersocket 1
>
> cannot be used alone. If I
ED AT 2014-08-29T07:16:20 WITH
> SIGNAL 9 ***
> srun.slurm: error: borg01x144: task 1: Exited with exit code 213
> slurmd[borg01x144]: *** STEP 2332583.4 KILLED AT 2014-08-29T07:16:20 WITH
> SIGNAL 9 ***
> slurmd[borg01x144]: *** STEP 2332583.4 KILLED AT 2014-08-29T07:16:20 WITH
> S
wrote:
> It is still there in 1.6.5 (we also have it).
>
> I am just wondering if there is something wrong in our installation that
> makes MPI unabled to detect that there are two sockets per node if we do not
> include a npernode directive.
>
> Maxime
>
> Le 2014-0
un a maximum of 6 mpirun's at a time across a given set of
nodes. So you'd need to stage your allocations correctly to make it work.
>
> ------
>
> Date: Thu, 28 Aug 2014 13:27:12 -0700
> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI
B/B/B/B/B]
>
> [burn008:07256] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core
> 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core
> 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] and so on.
>
>
>
> *From:* users [mailto:
hwloc requires the numactl-devel package in addition to the numactl one
If I understand the email thread correctly, it sounds like you have at least
some nodes in your system that have fewer cores than others - is that correct?
>> Here are the definitions of the two parallel environments tested
closed[borg01w063:03815] mca: base: close: unloading component tcp
On Fri, Aug 29, 2014 at 3:18 PM, Ralph Castain <r...@open-mpi.org> wrote:
Rats - I also need "-mca plm_base_verbose 5" on there so I can see the cmd line being executed. Can you add it?
On Aug 29, 2014, at 11:16 A
You need to disconnect the parent/child from each other prior to finalizing -
see the attached example
simple_spawn.c
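The attachment body is not preserved in the archive; a minimal sketch of the pattern it demonstrates (disconnect the intercommunicator on both sides before MPI_Finalize) could look like the following. This is an illustration, not the original simple_spawn.c, and it requires an MPI install (compile with mpicc, launch with mpirun):

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, child;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* parent: spawn two copies of this same binary */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        /* ... communicate over the intercommunicator 'child' ... */
        MPI_Comm_disconnect(&child);   /* disconnect BEFORE finalize */
    } else {
        /* child: sever the link back to the parent */
        MPI_Comm_disconnect(&parent);  /* disconnect BEFORE finalize */
    }

    MPI_Finalize();
    return 0;
}
```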
On Aug 31, 2014, at 9:44 PM, Roy wrote:
> Hi all,
>
> I'm using MPI_Comm_spawn to start new child process.
> I found that
call
> MPI_Finalize. To be more precise, the disconnect has a single role to
> redivide the application in separated groups of connected processes in order
> to prevent error propagation (such as MPI_Abort).
>
> George.
>
>
>
> On Mon, Sep 1, 2014 at 12:58 AM
ess 6 of 8 is on borg01w218
>
> I'll ask the admin to apply the patch locally...and wait for 1.8.3, I suppose.
>
> Thanks,
> Matt
>
> On Sun, Aug 31, 2014 at 10:08 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Hmmm... I may see the problem. Would you be so kind
Would you please try r32662? I believe I finally found and fixed this problem.
On Sep 2, 2014, at 6:12 AM, Siegmar Gross
wrote:
> Hi,
>
> yesterday I installed openmpi-1.9a1r32657 on my machines (Solaris
> 10 Sparc (tyr), Solaris 10 x86_64 (sunpc0), and
I believe this was fixed in the trunk and is now scheduled to come across to
1.8.3
On Sep 2, 2014, at 4:21 AM, Siegmar Gross
wrote:
> Hi,
>
> yesterday I installed openmpi-1.8.2 on my machines (Solaris 10 Sparc
> (tyr), Solaris 10 x86_64 (sunpc0), and
Hi Siegmar
Could you please configure this OMPI install with --enable-debug so that gdb
will provide line numbers where the error is occurring? Otherwise, I'm having a
hard time chasing this problem down.
Thanks
Ralph
On Sep 2, 2014, at 6:01 AM, Siegmar Gross
I don't see any line numbers on the errors I flagged - all I see are the usual
memory offsets in bytes, which is of little help. I'm afraid I don't know what
you'd have to do under SunOS to get line numbers, but I can't do much without them.
On Sep 2, 2014, at 10:26 AM, Siegmar Gross
__________
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain
> [r...@open-mpi.org]
> Sent: Saturday, August 30, 2014 7:15 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
> (u
The difficulty here is that you have bundled several errors again into a single
message, making it hard to keep the conversation from getting terribly
confused. I was trying to address the segfault errors on cleanup, which have
nothing to do with the accept being rejected.
It looks like those
ce 8/28/14) all occurred after we upgraded our cluster
> to OpenMPI 1.8.2 on . Maybe I should've created a new thread rather
> than tacking on these issues to my existing thread.
>
> -Bill Lane
>
>
> From: users [users-boun...@open-mpi.o
> Thanks.
>
>
>
> On Sep 2, 2014, at 9:36 AM, Matt Thompson <fort...@gmail.com> wrote:
>
>> On that machine, it would be SLES 11 SP1. I think it's soon transitioning to
>> SLES 11 SP3.
>>
>> I also use Open MPI on an RHEL 6.5 box (possibly soon to be RH
Thanks Matt - that does indeed resolve the "how" question :-)
We'll talk internally about how best to resolve the issue. We could, of course,
add a flag to indicate "we are using a shellscript version of srun" so we know
to quote things, but it would mean another thing that the user would have
Exiting with a non-zero status is considered to indicate a failure that needs
reporting.
On Sep 3, 2014, at 1:48 PM, Nico Schlömer wrote:
> Hi all,
>
> with OpenMPI 1.6.5 (default Debian/Ubuntu), I'm running the program
>
> ```
> #include
> #include
>
> int
Just to help separate out the issues, you might try running the hello_c program
in the OMPI examples directory - this will verify whether the problem is in the
mpirun command or in your program
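For example (path relative to an Open MPI source tree):

```shell
# Build and run the stock hello-world test to isolate the problem
cd examples && make hello_c
mpirun -np 2 ./hello_c   # each rank prints a hello line
```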
On Sep 4, 2014, at 6:26 AM, Donato Pera wrote:
> Hi,
>
> the text was
Still begs the bigger question, though, as others have used script wrappers
before - and I'm not sure we (OMPI) want to be in the business of dictating the
scripting language they can use. :-)
Jeff and I will argue that one out
On Sep 4, 2014, at 7:38 AM, Jeff Squyres (jsquyres)
On Sep 5, 2014, at 3:34 PM, Allin Cottrell wrote:
> I suspect there is a new (to openmpi 1.8.N?) warning with respect to
> requesting a number of MPI processes greater than the number of "real" cores
> on a given machine. I can provide a good deal more information if that's
On Sep 5, 2014, at 10:44 AM, McGrattan, Kevin B. Dr.
wrote:
> I am testing a new cluster that we just bought, which is why I am loading
> things this way. I am deliberately increasing network traffic. But in
> general, we submit jobs intermittently with various
On Sep 6, 2014, at 7:52 AM, Allin Cottrell <cottr...@wfu.edu> wrote:
> On Fri, 5 Sep 2014, Ralph Castain wrote:
>
>> On Sep 5, 2014, at 3:34 PM, Allin Cottrell <cottr...@wfu.edu> wrote:
>>
>>> I suspect there is a new (to openmpi 1.8.N?) warning with res
On Sep 6, 2014, at 11:00 AM, Allin Cottrell <cottr...@wfu.edu> wrote:
> On Sat, 6 Sep 2014, Ralph Castain wrote:
>
>> On Sep 6, 2014, at 7:52 AM, Allin Cottrell <cottr...@wfu.edu> wrote:
>>
>>> On Fri, 5 Sep 2014, Ralph Castain wrote:
>>>
If you assign unique IP addresses to each container, you can then create a
hostfile that contains the IP addresses. Feed that to mpirun and it will work
just fine.
If you really want to do it under slurm, then slurm is going to need the list
of those IP addresses anyway. We read the slurm
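A hostfile along those lines (addresses invented for illustration):

```shell
# One line per container IP, with the slot count each should host
cat > containers.hosts <<'EOF'
10.1.0.11 slots=2
10.1.0.12 slots=2
EOF
mpirun -np 4 --hostfile containers.hosts ./hello_c
```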
What OMPI version?
On Sep 10, 2014, at 1:53 AM, Red Red wrote:
> Hi,
>
>
> after the installation of a Torque PBS when I start a simple program with
> mpirun I get this result (i have already installed again):
>
> [oxygen1:04280] [[INVALID],INVALID] ORTE_ERROR_LOG: Not
Working on the memory alignment issues in the trunk, and they are being
scheduled to come across as we go.
On Sep 10, 2014, at 9:08 AM, Siegmar Gross
wrote:
> Hi,
>
> today I installed openmpi-1.8.3a1r32692 on my machines (Solaris
> 10 Sparc (tyr),
Hmmm...well, the info is there. There is an envar OMPI_COMM_WORLD_LOCAL_SIZE
which tells you how many procs are on this node. If you tell your proc how many
cores (or hwthreads) to use, it would be a simple division to get what you want.
You could also detect the number of cores or hwthreads
registry.hub.docker.com/search?q=openmpi
>
> I haven’t tested it to see what's in there, but the guys on the ompi mailing
> list might want to check that out.
>
> -Ben
>
>
On Sep 9, 2014, at 3:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
> If you assign un
Not really "removed" - say rather "renamed". The PLPA system was replaced by
HWLOC starting with the 1.7 series. The binding directives were replaced with
--bind-to options as they became much more fine-grained than before - you
can bind all the way down to the hardware thread level.
If you don't
h private to hosts and
> dynamic. But it's a start.
>
> Thank you
>
>
>
> On Wed, Sep 10, 2014 at 12:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> If you assign unique IP addresses to each container, you can then create a
>> hostfile that contains the IP
You should check that your path would also hit /usr/bin/mpiexec and not some
other version of it
On Sep 17, 2014, at 4:01 PM, Nico Schlömer wrote:
> Hi all!
>
> Today, I observed a really funky behavior of my stock
> ```
> $ mpiexec --version
> mpiexec (OpenRTE)
uld also hit /usr/bin/mpiexec and not some
>> other version of it
>
> ```
> $ which mpiexec
> /usr/bin/mpiexec
> ```
> Is this what you mean?
>
> –Nico
>
> On Thu, Sep 18, 2014 at 1:04 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> You
Can you please tell us what version of OMPI you are using?
On Sep 21, 2014, at 6:08 AM, Lee-Ping Wang wrote:
> Hi there,
>
> I’m running into an issue where mpirun isn’t terminating when my executable
> has a nonzero exit status – instead it’s hanging indefinitely.
:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Sunday, September 21, 2014 6:56 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Process is hanging
>
> Can you please tell us what version of OMPI you are using?
>
>
> On Sep 21, 2014, at 6:08 AM, Lee
> existing.
>
> Thanks,
>
> - Lee-Ping
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Sunday, September 21, 2014 8:54 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Process is hanging
>
> Just to be clear
Lee-Ping
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Sunday, September 21, 2014 11:49 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Process is hanging
>
> Thanks - I asked because the output you sent shows a bunch of se
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Monday, September 22, 2014 8:09 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Process is hanging
>
> Could you try using the nightly 1.8 tarball? I know there was a problem
FWIW: that warning has been removed from the upcoming 1.8.3 release
On Sep 23, 2014, at 11:45 AM, Reuti wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 23.09.2014 at 19:53, Brock Palen wrote:
>
>> I found a fun head scratcher, with openmpi 1.8.2
>> On 2014-09-23 15:05, Brock Palen wrote:
>>> Yes the request to torque was procs=64,
>>>
>>> We are using cpusets.
>>>
>>> the mpirun without -np 64 creates 64 spawned hostnames.
>>>
>>> Brock Palen
No, it doesn't matter at all for OMPI - any order is fine. The issue I see is
that your mpiexec isn't the OMPI one, but is from someone else. I have no idea
whose mpiexec you are using
On Sep 24, 2014, at 6:38 PM, XingFENG wrote:
> I have found the solution. The
ocumentation claims
> that two mpi are installed, namely, OpenMPI and MPICH2.
>
> On Thu, Sep 25, 2014 at 11:45 AM, Ralph Castain <r...@open-mpi.org> wrote:
> No, it doesn't matter at all for OMPI - any order is fine. The issue I see is
> that your mpiexec isn't the OMPI on
CH version.
On Sep 25, 2014, at 4:33 AM, XingFENG <xingf...@cse.unsw.edu.au> wrote:
> It returns /usr/bin/mpiexec.
>
> On Thu, Sep 25, 2014 at 8:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Do "which mpiexec" and look at the path. The options you show are f
Can you pass us the actual mpirun command line being executed? Especially need
to see the argv being passed to your application.
On Sep 27, 2014, at 7:09 PM, Amos Anderson wrote:
> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. Also,
> I
I'm not seeing this with 1.8.3 - can you try with it?
On Sep 17, 2014, at 4:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Yeah, just wanted to make sure you were seeing the same mpiexec in both
> cases. There shouldn't be any issue with providing the complete path, though
led with OpenMPI > 1.6 ?
>
>
>
> On Sep 29, 2014, at 10:28 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Can you pass us the actual mpirun command line being executed? Especially
>> need to see the argv being passed to your application.
>>
tested your scenario.
On Sep 29, 2014, at 10:55 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Okay, so regression-test.py is calling MPI_Init as a singleton, correct? Just
> trying to fully understand the scenario
>
> Singletons are certainly allowed, if that's the scenario
st/regression/regression-jobs"
> (gdb) print argv[2]
> $13 = 0x20
> (gdb)
>
>
>
>
> On Sep 29, 2014, at 11:48 AM, Dave Goodell (dgoodell) <dgood...@cisco.com>
> wrote:
>
>> Looks like boost::mpi and/or your python "mpi" module might be
I don't know anything about your application, or what the functions in your
code are doing. I imagine it's possible that you are trying to open statically
defined ports, which means that running the job again too soon could leave the
OS thinking the socket is already busy. It takes a while for
ompi_info is just the first executable that gets built, and so it is always
the place where we find missing library issues. It looks like someone has
left incorrect configure logic in the system such that we always attempt to
build Infiniband-related code, but without linking against the
Don't know about the segfault itself, but I did find and fix the classpath
logic so the app is found. Might help you get a little further.
On Sep 29, 2014, at 10:58 PM, Siegmar Gross
wrote:
> Hi,
>
> yesterday I installed openmpi-1.9a1r32807 on my
to, and if that code knows how to handle arbitrary connections
You might check about those warnings - could be that QCLOCALSCR and QCREF need
to be set for the code to work.
>
> - Lee-Ping
>
> On Sep 29, 2014, at 8:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
four different
> clusters (where I don't set these environment variables either), it's only
> broken on the Blue Waters compute node. Also, the calculation runs without
> any problems the first time it's executed on the BW compute node - it's only
> subsequent executions that give the error
Hmmm... I would guess you should talk to the Hadoop folks, as the problem seems
to be a conflict between valgrind and HDFS. Does valgrind even support Java
programs? I honestly have never tried to do that before.
On Oct 2, 2014, at 4:40 AM, XingFENG wrote:
> Hi
We've talked about this a lot over the last few weeks, trying to come up with
some way to maintain the Solaris support - but have come up empty. None of us
have access to such a system, and it appears to be very difficult to avoid
regularly breaking it. I may, as time permits, try playing with
I've looked at your patch, and it isn't quite right as it only looks for
libpmi and not libpmi2. We need to look for each of them as we could have
either or both.
I'll poke a bit at this tonight and see if I can make this a little simpler
- the nesting is getting a little deep.
On Mon, Oct 6,
Sorry about the delay - I was on travel. Yes, that will avoid the issue.
On Oct 10, 2014, at 1:17 PM, Gary Jackson wrote:
>
> To answer my own question:
>
> Configure with --disable-getpwuid.
>
> On 10/10/14, 12:04 AM, Gary Jackson wrote:
>>
>> I'd like to run MPI on a node to
On Oct 14, 2014, at 5:32 PM, Gus Correa wrote:
> Dear Open MPI fans and experts
>
> This is just a note in case other people run into the same problem.
>
> I just built Open MPI 1.8.3.
> As usual I put my old settings on openmpi-mca-params.conf,
> with no further
If you only have one thread doing MPI calls, then single and funneled are
indeed the same. If this is only happening after long run times, I'd suspect
resource exhaustion. You might check your memory footprint to see if you are
running into leak issues (could be in our library as well as your
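If it helps to confirm which thread level you are actually getting, a tiny probe (generic MPI, not OMPI-specific; needs an MPI install to build and run):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Only the main thread will make MPI calls -> FUNNELED is sufficient */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    if (provided < MPI_THREAD_FUNNELED)
        fprintf(stderr, "warning: library only provides level %d\n", provided);

    MPI_Finalize();
    return 0;
}
```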
f Squyres (jsquyres) wrote:
>> We talked off-list -- fixed this on master and just filed
>> https://github.com/open-mpi/ompi-release/pull/33 to get this into the v1.8
>> branch.
>>
>>
>> On Oct 14, 2014, at 7:39 PM, Ralph Castain <r...@open-mpi.org> wr
Add --disable-getpwuid to configure
On Oct 16, 2014, at 12:36 AM, Aurélien Bouteiller wrote:
> I am building trunk on the Cray xc30.
> I get the following warning during link (static link)
> ../../../orte/.libs/libopen-rte.a(session_dir.o): In function
>
to need an update.
> That is probably the first place people look for information
> about runtime features.
> For instance, the process placement examples still
> use deprecated parameters and mpiexec options:
> -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.
On my
FWIW: vader is the default in 1.8
On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller wrote:
> Are you sure you are not using the vader BTL ?
>
> Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem
> initialization info.
>
> The CMA linux system
r the benefit of mere mortals like me
>> who don't share the dark or the bright side of the force,
>> and just need to keep their MPI applications running in production mode,
>> hopefully with Open MPI 1.8,
>> can somebody explain more clearly what "vader" is
parameters and mpiexec options:
> -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.
>
> Thank you,
> Gus Correa
>
> On 10/15/2014 11:10 PM, Ralph Castain wrote:
>>
>> On Oct 15, 2014, at 11:46 AM, Gus Correa <g...@ldeo.columbia.edu
>
point, an undefined symbol reference from another dynamic
>library. --no-as-needed restores the default behaviour.
>
>
>
> --
> Dipl.-Inform. Paul Kapinos - High Performance Computing,
> RWTH Aachen University, IT Center
> Seffenter Weg 23, D 52074 Aa
l model, along with its syntax
> and examples.
Yeah, I need to do that. LAMA was an alternative implementation of the current
map/rank/bind system. It hasn’t been fully maintained since it was introduced,
and so I’m not sure how much of it is functional. I need to create an
equivalent for the c
> On Oct 17, 2014, at 12:06 PM, Gus Correa wrote:
>
> Hi Jeff
>
> Many thanks for looking into this and filing a bug report at 11:16PM!
>
> Thanks to Aurelien, Ralph and Nathan for their help and clarifications
> also.
>
> **
>
> Related suggestion:
>
> Add a note
> On Oct 17, 2014, at 3:37 AM, Marshall Ward wrote:
>
> I currently have a numerical model that, for reasons unknown, requires
> preconnection to avoid hanging on an initial MPI_Allreduce call.
That is indeed odd - it might take a while for all the connections to form,
From your error message, I gather you are not running an MPI program, but
rather an OSHMEM one? Otherwise, I find the message strange as it only would be
emitted from an OSHMEM program.
What version of OMPI are you trying to use?
> On Oct 22, 2014, at 7:12 PM, Vinson Leung
I was able to build and run the trunk without problem on Yosemite with:
gcc (MacPorts gcc49 4.9.1_0) 4.9.1
GNU Fortran (MacPorts gcc49 4.9.1_0) 4.9.1
Will test 1.8 branch now, though I believe the fortran support in 1.8 is
up-to-date
> On Oct 24, 2014, at 6:46 AM, Guillaume Houzeaux
http://lists.gnu.org/archive/html/libtool-patches/2014-09/msg2.html
>
> On Fri, Oct 24, 2014 at 6:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> I was able to build and run the trunk without problem on Yosemite with:
>>
>> gcc (MacPorts gcc49 4.9.1_0)