It also depends on what part of migration interests you - are you wanting to
look at the MPI part of the problem (reconnecting MPI transports, ensuring
messages are not lost, etc.) or the RTE part of the problem (where to restart
processes, detecting failures, etc.)?
On Aug 24, 2011, at 7:04 AM,
>>>
>>> Thanks and regards
>>> Durga
>>>
>>>
>>> On Thu, Aug 25, 2011 at 11:08 AM, Rayson Ho wrote:
>>>> Srinivas,
>>>>
>>>> There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
OMPI has no way of knowing that you will turn the node on at some future point.
All it can do is try to launch the job on the provided node, which fails
because the node doesn't respond.
You'll have to come up with some scheme for telling the node to turn on in
anticipation of starting a job -
On Aug 27, 2011, at 8:28 AM, Rayson Ho wrote:
> On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote:
>> OMPI has no way of knowing that you will turn the node on at some future
>> point. All it can do is try to launch the job on the provided node, which
>> fails because the node doesn't respond.
> Documentation and examples at the link below:
> http://osl.iu.edu/research/ft/ompi-cr/examples.php#example-self
>
> -- Josh
>
> On Aug 26, 2011, at 6:17 PM, Ralph Castain wrote:
>
>> FWIW: I'm in the process of porting some code from a branch that allows apps
On Aug 29, 2011, at 5:40 AM, Reuti wrote:
> Am 27.08.2011 um 16:35 schrieb Ralph Castain:
>
>>
>> On Aug 27, 2011, at 8:28 AM, Rayson Ho wrote:
>>
>>> On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote:
>>>> OMPI has no way of knowing that you will turn the node on at some future point.
On Aug 30, 2011, at 9:26 AM, John Hearns wrote:
> On 30 August 2011 02:55, Ralph Castain wrote:
>> Instead, all used dynamic requests - i.e., the job that was doing a
>> comm_spawn would request resources at the time of the comm_spawn call. I
>> would pass the requ
Hi Simone
Just to clarify: is your application threaded? Could you please send the OMPI
configure cmd you used?
Adding the debug flags just changes the race condition. Interestingly, those
values only impact the behavior of mpirun, so it looks like the race condition
is occurring there.
On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote:
> On 09/06/2011 02:57 PM, Ralph Castain wrote:
>> Hi Simone
>>
>> Just to clarify: is your application threaded? Could you please send the
>> OMPI configure cmd you used?
>
> yes, it is threaded. There a
On Sep 6, 2011, at 1:20 PM, Simone Pellegrini wrote:
> On 09/06/2011 04:58 PM, Ralph Castain wrote:
>> On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote:
>>
>>> On 09/06/2011 02:57 PM, Ralph Castain wrote:
>>>> Hi Simone
>>>>
>>>> Just to clarify: is your application threaded? Could you please send the OMPI configure cmd you used?
On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:
> The mpirun command is invoked when the user’s group is ‘set group’ to group
> 650. When the rank 0 process creates files, they have group ownership 650.
> But the user’s login group is group 1040. The child processes that get
> started on other nodes
ce for various reasons).
>
> Ed
>
> From: Ralph Castain [mailto:r...@open-mpi.org]
> Sent: Wednesday, September 07, 2011 8:53 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Can you set the gid of the processes created by
> mpirun?
>
> On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:
Hi Kevin
Are you getting those messages from ompi_info? Or from an MPI app (and if so,
what are you doing to get them)?
On Sep 11, 2011, at 5:25 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
> I have recently seen some OpenIB time out errors and see the
> following reported:
>
> * btl_openib_ib_retry_count
I ask because those are set via MCA param. So ompi_info would show the
"default" if the param isn't set in the environment or param file, but the app
could see something different if you set the param on the mpirun cmd line.
Those are the default values, but it looks like the MCA param is being set somewhere.
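For example, here are the three common ways a param like btl_openib_ib_timeout can get set (the value 20 is purely illustrative):

    # 1. on the mpirun cmd line - the app sees this, but ompi_info won't:
    mpirun -mca btl_openib_ib_timeout 20 -np 4 ./app
    # 2. in the environment:
    export OMPI_MCA_btl_openib_ib_timeout=20
    # 3. in a param file ($HOME/.openmpi/mca-params.conf), one "name = value" per line:
    btl_openib_ib_timeout = 20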
The two are synonyms for each other - they resolve to the identical variable,
so there isn't anything different about them.
Not sure what the issue might be, but I would check for a typo - we don't check
that mca params are spelled correctly, nor do we check for params that don't
exist (e.g., b
We don't have anything similar in OMPI. There are fault tolerance modes, but
not like the one you describe.
On Sep 12, 2011, at 5:52 PM, Rob Stewart wrote:
> Hi,
>
> I have implemented a simple fault tolerant ping pong C program with MPI,
> here: http://pastebin.com/7mtmQH2q
>
> MPICH2 offers
They must not be wholly identical, somehow. This is OpenMPI 1.4.3.
>
> Ed
>
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Monday, September 12, 2011 7:43 PM
> To: Open MPI Users
> Subject: EXTERNAL: Re
I believe this is one of those strange cases that can catch us. The problem is
that we still try to use the qrsh launcher - we appear to ignore the
--without-sge configure option (it impacts our ability to read the allocation,
but not the launcher).
Try setting the following:
-mca plm_rsh_disable_qrsh 1
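Either on the cmd line or as an envar, e.g. (app name illustrative):

    mpirun -mca plm_rsh_disable_qrsh 1 -np 4 ./app
    # or:
    export OMPI_MCA_plm_rsh_disable_qrsh=1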
On Sep 13, 2011, at 4:15 PM, Reuti wrote:
> Am 14.09.2011 um 00:11 schrieb Ralph Castain:
>
>> I believe this is one of those strange cases that can catch us. The problem
>> is that we still try to use the qrsh launcher - we appear to ignore the
>> --without-sge configure option (it impacts our ability to read the allocation, but not the launcher).
On Sep 13, 2011, at 4:25 PM, Reuti wrote:
> Am 13.09.2011 um 23:54 schrieb Blosch, Edwin L:
>
>> This version of OpenMPI I am running was built without any guidance
>> regarding SGE in the configure command, but it was built on a system that
>> did not have SGE, so I would presume support is a
Just to clarify: you'll still need to set that variable regardless of
--without-sge or not. The launcher will still use qrsh if it is present and the
SGE envars are around.
On Sep 13, 2011, at 4:25 PM, Blosch, Edwin L wrote:
> Your comment guided me in the right direction, Reuti. And overlapped
>> Sent: Wednesday, September 07, 2011 12:24 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Can you set the gid of the processes created by
>> mpirun?
>>
>> Hi,
>>
>> you mean you change the group id of the user before you submit the job? In
>> GridEngine
te that would be the
case, but I figured it was worth a quick try. Sorry I can't be of help.
>
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Wednesday, September 14, 2011 8:15 AM
returned to your previous group ID.
>
>
>
>
> -----Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Wednesday, September 14, 2011 11:33 AM
> To: Open MPI Users
> Subject: Re: [OMPI
replacing themselves with a new instance of a shell
>> that newgrp creates. This doesn't happen with sg, so upon exit from a
>> sg command you are returned to your previous group ID.
>>
>>
>>
>>
>> -Original Message-
>> From: users-boun...@o
> > similar to newgrp but accepts a command. The
> > command will be executed with the /bin/sh shell. With most shells you
> > may run sg from, you need to enclose multi-word commands in quotes.
> > Another difference between newgrp and sg is that some shells treat
>
> the "username" is used, not the "uid" - right? So it could
> have a different uid/gid on the machines, but with the new feature they must
> be the same. Okay, in a cluster they are most likely unique across all
> machines anyway. But just to note as a side effect.
>
>> several fault tolerant modes, including the one you described in your email.
>> If you are interested please contact me directly.
>>
>> Thanks,
>> george.
>>
>>
>> On Sep 12, 2011, at 20:43 , Ralph Castain wrote:
>>
>>> We don't have anything similar in OMPI. There are fault tolerance modes, but not like the one you describe.
doesn't get
exposed to the entire range of environments we support, and so there are
usually problems that need to be ironed out. Using the code in a production
environment before that has occurred is a "use at your own risk" venture.
HTH
Ralph
On Sep 16, 2011, at 8:28 AM,
Nothing to do with us - you call a function "NSLog" that Objective C doesn't
recognize. That isn't an MPI function.
On Sep 18, 2011, at 8:20 PM, Scott Wilcox wrote:
> I have been asked to convert some C++ code using Open MPI to Objective C and
> I am having problems getting a simple Obj C program
Hmmm... perhaps you didn't notice the mpi_preconnect_all option? It does
precisely what you described - it pushes zero-byte messages around a ring to
force all the connections open at MPI_Init.
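For example (proc count and app name illustrative):

    mpirun -mca mpi_preconnect_all 1 -np 200 ./my_app
    # all connections are forced open during MPI_Init, so the first
    # send to each peer no longer pays the connection-setup cost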
On Sep 20, 2011, at 3:06 PM, Henderson, Brent wrote:
> I recently had access to a 200+ node Magny-Cours cluster
e. Anyway, if I get access to another large TCP
> cluster, I’ll give it a try.
>
> Thanks,
>
> brent
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Tuesday, September 20, 2011 4:15 PM
> To: Open MPI U
Ummm...yes, because you are getting the man page for the MPICH mpicc, not ours.
Try setting your manpage path to point to the OMPI install directory.
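Something along these lines, assuming OMPI was installed under /opt/openmpi-1.4.x (adjust to your prefix):

    export MANPATH=/opt/openmpi-1.4.x/share/man:$MANPATH
    man mpicc    # should now show the Open MPI wrapper page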
On Sep 22, 2011, at 1:55 PM, Uday Kumar Reddy B wrote:
> On Fri, Sep 23, 2011 at 1:21 AM, Jeff Squyres wrote:
>> Right: -cc is not an option to
On Sep 22, 2011, at 2:17 PM, Uday Kumar Reddy B wrote:
>
>
> On 09/23/2011 01:33 AM, Ralph Castain wrote:
>> Ummm...yes, because you are getting the man page for the MPICH mpicc, not
>> ours. Try setting your manpage path to point to the OMPI install directory.
>
What version of OMPI are you using? The job should terminate in either case -
what did you do to keep it running after node failure with tcp?
On Sep 23, 2011, at 12:34 PM, Guilherme V wrote:
> Hi,
> I want to know if anybody is having problems with fault tolerant job using
> infiniband. When I
On Sep 23, 2011, at 1:21 PM, Guilherme V wrote:
> I'm using version 1.4.3 and I forgot to tell that I have made a change in the
> orterun.c line 792:
>
> if (ORTE_JOB_STATE_TERMINATED != exit_state) {
> exit(0); /* patch*/
>
I don't see how that change can keep your job running.
Sigterm should work - what version are you using?
Ralph
Sent from my iPad
On Sep 28, 2011, at 1:40 PM, Xin Tong wrote:
> I am wondering what the proper way of stop a mpirun process and the child
> process it created. I tried to send SIGTERM, it does not respond to it ?
> What kind of signal
That means you have mismatched installations around - one configured as debug,
and one not. They have to match.
Sent from my iPad
On Oct 3, 2011, at 2:44 PM, Phillip Vassenkov
wrote:
> I went into the directory that I used to install 1.4.3, did the following:
> make clean
> ./configure --enable-debug
Looks like a bug - can address next week. Very unusual use of npernode...
Sent from my iPad
On Oct 4, 2011, at 4:55 AM, Andrew Senin wrote:
> Hi all,
>
> I noticed a strange behaviour in 1.5.4 which seems to me as a bug. I'm trying
> to launch 4 ranks on 2 nodes. If I add "-npernode 3 -bynode"
OMPI always tries to use the lowest numbered address first - just a natural
ordering. You need to tell it to use just the public ones for this topology.
Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info --param oob
tcp" and "ompi_info --param btl tcp" for the exact syntax.
Sent from my iPad
For one thing, you should check your path settings. The output you got cannot
possibly have come from OMPI 1.4.2. Looks more like an OMPI 1.2 output.
On Oct 10, 2011, at 6:01 PM, Jonathan Bishop wrote:
> Hi,
>
> New to MPI and decided to try OpenMPI out on hello.cpp, but I get the
> following message:
First, OMPI does -not- require you to use ipoib.
With that command line, both procs will be running on remotehostip. I don't
believe openib has a loopback interface, so you'll need the shared memory btl
so procs co-located on a node can talk to each other. In other words, you need
-mca btl sm,openib,self
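For example (host and app name illustrative):

    mpirun -np 2 -H remotehostip -mca btl sm,openib,self ./app
    # sm carries on-node traffic, openib carries off-node traffic,
    # and self handles a proc sending to itself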
Can't offer much about the qsub job. On the first one, what is your limit on
the number of file descriptors? Could be your sys admin has it too low.
On Oct 14, 2011, at 12:07 PM, Ashwani Kumar Mishra wrote:
> Hello,
> When i try to run the following command i receive the following error when i
Should be plenty for us - does your program consume a lot?
On Oct 14, 2011, at 12:25 PM, Ashwani Kumar Mishra wrote:
> Hi Ralph,
> fs.file-max = 10
> is this ok or less?
>
> Best Regards,
> Ashwani
>
>
> On Fri, Oct 14, 2011 at 11:45 PM, Ralph Castain wrote:
Sorry - been occupied. This is normal behavior. As has been discussed on this
list before, OMPI made a design decision to minimize latency. This means we
aggressively poll for connections. Only thing you can do is tell it to yield
the processor when idle so, if something else is trying to run, we'll yield to it.
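I.e. (app name illustrative):

    mpirun -mca mpi_yield_when_idle 1 -np 4 ./app
    # idle procs yield the processor so competing work can run;
    # latency suffers, which is why this is off by default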
> No idea how much this program consumes the numbers of file descriptors :(
>
> Best Regards,
> Ashwani
>
> On Sat, Oct 15, 2011 at 12:08 AM, Ralph Castain wrote:
> Should be plenty for us - does your program consume a lot?
>
>
> On Oct 14, 2011, at 12:25
On Oct 15, 2011, at 12:25 PM, dave fournier wrote:
> OK, I found that if I inovke the master process
>
> with mpirun as in
>
> mpirun ./orange -master
>
>
> Then the remote process is successful in the MPI_Init call.
> I would like to avoid using mpirun if possible. It seems to
> be
Indeed. However, let me provide this advice. Add --disable-vt to your configure
line, and drop all the rest of those flags. You don't need what you gave as
we'll automatically figure those out. VampirTrace isn't happy on Mac, so
disable it and you should be fine.
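I.e., something as simple as (prefix illustrative):

    ./configure --prefix=/opt/openmpi-1.4.x --disable-vt
    make all install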
On Oct 17, 2011, at 7:29 AM, J
Well, you asked for two processes, and your hostfile indicates that we can run
two procs on each machine. So we put those two procs on the first machine,
leaving nothing for the second machine to do.
If you want the procs on different machines, then add -bynode to the cmd line.
This will put one proc on each node.
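A sketch, with hypothetical host names:

    # hostfile:
    #   nodeA slots=2
    #   nodeB slots=2
    mpirun -np 2 -hostfile hostfile ./app           # default byslot: both procs on nodeA
    mpirun -np 2 -bynode -hostfile hostfile ./app   # one proc on nodeA, one on nodeB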
I've never seen that error output before - is it coming from your program? It
doesn't match anything from OMPI.
On Oct 19, 2011, at 6:04 AM, Mathieu Westphal wrote:
> Hello
>
> I'm extending a code currently working well on a server with some quadri-core.
>
> But for debugging purpose i want t
I don't think we handle this:
> -H 192.168.4.91 -H 192.168.4.92
You need to have only one -H option - use comma to separate the values
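E.g. (addresses from your cmd line, benchmark binary illustrative):

    mpirun -np 2 -H 192.168.4.91,192.168.4.92 ./osu_latency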
On Oct 19, 2011, at 12:48 PM, ramu wrote:
> Hi,
> I am trying to run osu mpi benchmark tests on Infiniband setup (connected
> back-to-back via Mellanox hw). I
If that is what you are trying to do, mpirun will do it just fine too - it
doesn't have to be an MPI program.
On Oct 19, 2011, at 3:37 PM, Gus Correa wrote:
> Jorge
>
> Besides what Reuti and Eugene said, in case what you're looking for
> is a mechanism to launch several copies of a
> serial [non-MPI] program
On Oct 20, 2011, at 10:33 AM, Jorge Jaramillo wrote:
> Thanks for all your suggestions.
>
> Yes, indeed what I'm trying to do is execute a serial program. All the
> documentation you mention was pretty useful.
> I have another question, if mpirun launches several copies of the program on
> the
Does the difference persist if you run the single process using mpirun? In
other words, does "mpirun -np 1 ./my_hybrid_app..." behave the same as "mpirun
-np 2 ./..."?
There is a slight difference in the way procs start when run as singletons. It
shouldn't make a difference here, but worth testing.
"mpirun -np 2 ./...".
>
> Run "mpirun -np 1 ./my_hybrid_app..." will increase the performance with more
> number of threads, but run "mpirun -np 2 ./..." decrease the performance.
>
> --
> Huiwei Lv
>
> On Tue, Oct 25, 2011 at 12:00 AM, wrote:
I still see it failing the test George provided on the trunk. I'm unaware of
anyone looking further into it, though, as the prior discussion seemed to just
end.
On Oct 25, 2011, at 7:01 AM, orel wrote:
> Dears,
>
> I try from several days to use advanced MPI2 features in the following
> scenario:
> CentOS 5.4 (kernel
> 2.6.18, gcc 4.1.2) which is similar to the first machine (Cent OS 5.3, kernel
> 2.6.18, gcc 4.1.2). Then the problem disappears. So the problem must lies
> somewhere in OS kernel or GCC version. Any suggestions? Thanks.
>
> --
> Huiwei Lv
>
> On Tue, O
Looks like you are crashing in wrf - have you asked them for help?
On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
> Hi again,
>
> This is exactly the error I have:
>
>
> taskid: 0 hostname: part034.u-bourgogne.fr
> [part034:21443] *** Process received signal ***
> [part034:21443
FWIW: I have tracked this problem down. The fix is a little more complicated
than I'd like, so I'm going to have to ping some other folks to ensure we
concur on the approach before doing something.
On Oct 25, 2011, at 8:20 AM, Ralph Castain wrote:
> I still see it failing the test George provided on the trunk.
Did the version you are running get installed in /usr? Sounds like you are
picking up a different version when running a command - i.e., that your PATH is
finding a different installation than the one in /usr.
On Oct 26, 2011, at 3:11 AM, Patrick Begou wrote:
> I need to change system wide how
I'm pretty sure cuda support was never moved to the 1.4 series. You will,
however, find it in the 1.5 series. I suggest you get the latest tarball from
there.
On Oct 27, 2011, at 12:38 PM, Peter Wells wrote:
>
> I am attempting to configure OpenMPI 1.4.3 with cuda support on a Redhat 5
> box.
You might want to send this to the MPICH mailing lists - this is for Open MPI
issues.
On Oct 27, 2011, at 4:59 PM, Jonathan Bishop wrote:
> I am using MPI_Comm_spawn to dynamically run workers. However, when the
> workers exit they get hung up on MPI_Finalize. Here is a short program which
> s
On Oct 28, 2011, at 11:16 AM, Saurabh T wrote:
>
> Hi,
>
> If I use "orterun -H <hostname>" and <hostname> does not belong in the default
> hostfile ("etc/openmpi-default-hostfile"), openmpi gives an error. Is there
> an easy way to get the aforementioned command to work without specifying a
> different hostfile?
How are you running the job without mpirun? Is this under slurm or some other
RM?
On Oct 31, 2011, at 9:46 AM, Weston, Stephen wrote:
> Hello,
>
> I'm seeing an error on one of our clusters when executing the
> MPI_Init function in a program that is _not_ invoked using the
> mpirun command. T
You can tell OMPI to ignore the psm
interface to those cards by either configuring it out (--without-psm) or at run
time by setting the envar OMPI_MCA_mtl=^psm
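I.e., either of (app name illustrative):

    # at build time:
    ./configure --without-psm
    # or per run, without rebuilding:
    export OMPI_MCA_mtl=^psm
    mpirun -np 2 ./app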
>
> - Steve
>
>
> ____
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
On Nov 3, 2011, at 8:54 AM, Blosch, Edwin L wrote:
> Can anyone guess what the problem is here? I was under the impression that
> OpenMPI (1.4.4) would look for /tmp and would create its shared-memory
> backing file there, i.e. if you don’t set orte_tmpdir_base to anything.
That is correct
>
I'm afraid this isn't correct. You definitely don't want the session directory
in /dev/shm as this will almost always cause problems.
We look thru a progression of envars to find where to put the session directory:
1. the MCA param orte_tmpdir_base
2. the envar OMPI_PREFIX_ENV
3. the envar TMP
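So, to relocate the session directory, either of these works (path illustrative):

    mpirun -mca orte_tmpdir_base /scratch/$USER -np 4 ./app
    # or via the environment:
    export TMP=/scratch/$USER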
Couple of things:
1. Check the configure cmd line you gave - OMPI thinks your local computer
has openib support, which appears to be incorrect.
2. did you recompile your app on your local computer, using the version of OMPI
built/installed there?
On Nov 3, 2011, at 10:10 AM, amine mrabet wrote:
On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:
> I might be missing something here. Is there a side-effect or performance loss
> if you don't use the sm btl? Why would it exist if there is a wholly
> equivalent alternative? What happens to traffic that is intended for another
> process on the same node?
follow-up-questions, maybe this starts to
>> go outside OpenMPI:
>>>
>>> What's wrong with using /dev/shm? I think you said earlier in this thread
>> that this was not a safe place.
>>>
>>> If the NFS-mount point is moved from /tmp to /work,
Just glancing at the output, it appears to be finding a different gcc that
isn't Lion compatible. I know people have been forgetting to clear out all
their old installed software, and so you can pick old things up.
Try setting your PATH and LD_LIBRARY_PATH variables to point at the Xcode gcc.
Where did you install OMPI? If you check "which mpirun", does it point to the
same installation where you edited the default hostfile?
On Nov 6, 2011, at 6:16 PM, Lukas Razik wrote:
> Hello together!
>
> I've built v1.4.3 (which was in OFED-1.5.3.2) and v1.4.4 (from you website).
> But in both
The problem is that the prefix you configured with doesn't match the prefix you
are providing:
configure: prefix = /opt/openmpi-1.4.4
running: prefix = /Network/opt/openmpi-1.4.4
The two have to match in order for the libraries to be found.
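I.e., with the paths from your output:

    # either rebuild so the configured prefix matches the path you run from:
    ./configure --prefix=/Network/opt/openmpi-1.4.4
    # or run against the prefix it was built with:
    mpirun --prefix /opt/openmpi-1.4.4 -np 2 ./app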
On Nov 8, 2011, at 6:01 AM, Christophe Peyret wrote:
I'm not sure where the FAQ got its information, but it has always been one
param per -x option.
I'm afraid there isn't any envar to support the setting of multiple -x options.
We didn't expect someone to forward very many, if any, so we didn't create that
capability. It wouldn't be too hard to add, though.
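A shell-side workaround is straightforward, though (variable list illustrative):

    fwd=""
    for v in FOO BAR FOBA BAFO RR ZZ; do fwd="$fwd -x $v"; done
    mpirun $fwd -np 4 ./app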
I'm not sure what you mean by "migrate". Are you talking about restarting a
failed process at a different location? Or arbitrarily moving a process to
another location upon command?
On Nov 10, 2011, at 5:18 AM, Mudassar Majeed wrote:
>
> Dear MPI community,
>
> migrate a process from one core to another
> or not. Then I will see how good my heuristic will be.
>
> thanks
> Mudassar
>
> From: Jeff Squyres
> To: Mudassar Majeed ; Open MPI Users
>
> Cc: Ralph Castain
> Sent: Thursday, November 10, 2011 2:19 PM
> Subject: Re: [O
Hmmm...it -should- work, but I've never tried it on Windows. I will verify it
under Linux, but will have to defer to Shiqing to see if there is something
particular about the Windows environment.
On Nov 13, 2011, at 8:13 PM, Naor Movshovitz wrote:
> I have open-mpi v1.5.4, installed from the b
On Nov 14, 2011, at 12:18 PM, Radomir Szewczyk wrote:
> So there is no solution? e.g. my 2 computers that are computing nodes
> and are placed in different room on different floors. And the target
> user wants to monitor the progress of computation independently which
> have to be printed on their screens.
the screen where mpirun resides and putting the output from that rank
there.
However, there is NO option for redirecting the output from your MPI processes
to anywhere other than the screen where mpirun is executing.
>
> 2011/11/14 Ralph Castain :
>>
>> On Nov 14, 2011, at
Yes, this is well documented - may be on the FAQ, but certainly has been in the
user list multiple times.
The problem is that one process falls behind, which causes it to begin
accumulating "unexpected messages" in its queue. This causes the matching logic
to run a little slower, thus making the process fall even further behind.
No real ideas, I'm afraid. We regularly launch much larger jobs than that using
ssh without problem, so it is likely something about the local setup of that
node that is causing the problem. Offhand, it sounds like either the mapper
isn't getting things right, or for some reason the daemon on 00
On Nov 22, 2011, at 10:10 AM, Paul Kapinos wrote:
> Hello Ralph, hello all.
>
>> No real ideas, I'm afraid. We regularly launch much larger jobs than that
>> using ssh without problem,
> I was also able to run a 288-node-job yesterday - the size alone is not the
> problem...
>
>
>
>> so it
Yes, that would indeed break things. The 1.5 series isn't correctly checking
connections across multiple interfaces until it finds one that works - it just
uses the first one it sees. :-(
The solution is to specify -mca oob_tcp_if_include ib0. This will direct the
run-time wireup across the IPoIB network.
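I.e. (app name illustrative):

    mpirun -mca oob_tcp_if_include ib0 -np 4 ./app
    # mpirun and the daemons now wire up over the IPoIB addresses only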
On Nov 24, 2011, at 11:49 AM, Paul Kapinos wrote:
> Hello Ralph, Terry, all!
>
> again, two news: the good one and the second one.
>
> Ralph Castain wrote:
>> Yes, that would indeed break things. The 1.5 series isn't correctly checking
>> connections across multiple interfaces until it finds one that works - it just uses the first one it sees.
Hi Markus
You have some major problems with confused installations of MPIs. First, you
cannot compile an application against MPICH and expect to run it with OMPI -
the two are not binary compatible. You need to compile against the MPI
installation you intend to run against.
Second, your errors
On Nov 24, 2011, at 2:00 AM, Reuti wrote:
> Hi,
>
> Am 24.11.2011 um 05:26 schrieb Jaison Paul:
>
>> I am trying to access OpenMPI processes over Internet using ssh and not
>> quite successful, yet. I believe that I should be able to do it.
>>
>> I have to run one process on my PC and the res
On Nov 25, 2011, at 3:42 AM, Reuti wrote:
> Hi Ralph,
>
> Am 25.11.2011 um 03:47 schrieb Ralph Castain:
>
>>
>> On Nov 24, 2011, at 2:00 AM, Reuti wrote:
>>
>>> Hi,
>>>
>>> Am 24.11.2011 um 05:26 schrieb Jaison Paul:
>>>
On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote:
> Hello again,
>
>>> Ralph Castain wrote:
>>>> Yes, that would indeed break things. The 1.5 series isn't correctly
>>>> checking connections across multiple interfaces until it finds one that
>
-------
> [linux-6wa6:05565] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> orterun.c at line 543
>
>
>
> What can i do with this?
> Thx,
> Markus
>
> On 11/25/2011 03:42 AM, Ralph Castain wrote:
> Additionally, 2 processes returned
> non-zero exit codes.. Further examination may be required.
> ---
> [14:38] svbu-mpi:~ %
>
> (I did not read this thread too carefully, so perhaps I missed an inference
> in here somewhere...)
>
>
>
>
>
> On Nov
On Nov 28, 2011, at 5:32 PM, Jeff Squyres wrote:
> On Nov 28, 2011, at 6:56 PM, Ralph Castain wrote:
> Right-o. Knew there was something I forgot...
>
>> So on rsh, we do not put envar mca params onto the orted cmd line. This has
>> been noted repeatedly on the user and devel lists.
> If you're interested, see
>
> http://norbl.com/ppe-ompi.html
>
>
> Barnet Wagman
On Nov 30, 2011, at 4:03 AM, Jaison Paul wrote:
>
> Ralph Castain open-mpi.org> writes:
>
>>
>>
>> On Nov 24, 2011, at 2:00 AM, Reuti wrote:
>>
>
the default value (1000), so our
>> application seems to be a pretty extreme case.
>>
>> T. Rosmond
>>
>>
>> On Mon, 2011-11-14 at 16:17 -0700, Ralph Castain wrote:
>>> Yes, this is well documented - may be on the FAQ, but certainly has been in
Oh - and another one at orte/test/mpi/reduce-hang.c
On Nov 30, 2011, at 11:50 AM, Ralph Castain wrote:
> FWIW: we already have a reproducer from prior work I did chasing this down a
> couple of years ago. See orte/test/mpi/bcast_loop.c
>
>
> On Nov 29, 2011, at 9:35 AM, Jeff Squyres wrote:
PM, Jaison Paul wrote:
> Ralph Castain open-mpi.org> writes:
>
>>
>> This has come up before - I would suggest doing a quick search of "ec2" on
>> our
> user list. Here is one solution:
>> On Jun 14, 2011, at 10:50 AM, Barnet Wagman wrote:
Hi Igor
As I recall, this eventually traced back to a change in slurm at some point. I
believe the latest interpretation is in line with your suggestion. I believe we
didn't change it because nobody seemed to care very much, but I have no
objection to including it in the next release.
Thanks
Ralph
On Dec 5, 2011, at 5:49 AM, arnaud Heritier wrote:
> Hello,
>
> I found the solution, thanks to Qlogic support.
>
> The "can't open /dev/ipath, network down (err=26)" message from the ipath
> driver is really misleading.
>
> Actually, this is a hardware context problem on the Qlogic PSM. PSM
That, or at least something like it, is quite doable - will put it on my list.
On Dec 6, 2011, at 1:28 PM, Paul Kapinos wrote:
> What I asked for is something which could replace
>
> mpiexec -x FOO -x BAR -x FOBA -x BAFO -x RR -x ZZ ..
>
> (which is quite tedious to type and error-prone for the user)
> case.
> See also http://www.open-mpi.org/community/lists/users/2011/11/17823.php
>
> Thus this issue is not about forwarding some or any OMPI_* envvars to the
> _processes_, but on someone step _before_ (the processes were not started
> correctly at all in my problem case), as Ralph