Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Ralph Castain
It also depends on what part of migration interests you - are you wanting to look at the MPI part of the problem (reconnecting MPI transports, ensuring messages are not lost, etc.) or the RTE part of the problem (where to restart processes, detecting failures, etc.)? On Aug 24, 2011, at 7:04 A

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-26 Thread Ralph Castain
>>>> >>> Thanks and regards >>> Durga >>> >>> >>> On Thu, Aug 25, 2011 at 11:08 AM, Rayson Ho wrote: >>>> Srinivas, >>>> >>>> There's also Kernel-Level Checkpointing vs. User-Level Checkpointing - >>>

Re: [OMPI users] How to add nodes while running job

2011-08-27 Thread Ralph Castain
OMPI has no way of knowing that you will turn the node on at some future point. All it can do is try to launch the job on the provided node, which fails because the node doesn't respond. You'll have to come up with some scheme for telling the node to turn on in anticipation of starting a job -

Re: [OMPI users] How to add nodes while running job

2011-08-27 Thread Ralph Castain
On Aug 27, 2011, at 8:28 AM, Rayson Ho wrote: > On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote: >> OMPI has no way of knowing that you will turn the node on at some future >> point. All it can do is try to launch the job on the provided node, which >> fails bec

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-27 Thread Ralph Castain
> Documentation and examples at the link below: > http://osl.iu.edu/research/ft/ompi-cr/examples.php#example-self > > -- Josh > > On Aug 26, 2011, at 6:17 PM, Ralph Castain wrote: > >> FWIW: I'm in the process of porting some code from a branch that allows apps >>

Re: [OMPI users] How to add nodes while running job

2011-08-29 Thread Ralph Castain
On Aug 29, 2011, at 5:40 AM, Reuti wrote: > Am 27.08.2011 um 16:35 schrieb Ralph Castain: > >> >> On Aug 27, 2011, at 8:28 AM, Rayson Ho wrote: >> >>> On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote: >>>> OMPI has no way of knowing

Re: [OMPI users] How to add nodes while running job

2011-08-30 Thread Ralph Castain
On Aug 30, 2011, at 9:26 AM, John Hearns wrote: > On 30 August 2011 02:55, Ralph Castain wrote: >> Instead, all used dynamic requests - i.e., the job that was doing a >> comm_spawn would request resources at the time of the comm_spawn call. I >> would pass the requ

Re: [OMPI users] MPI_Spawn error: Data unpack would read past end of buffer" (-26) instead of "Success"

2011-09-06 Thread Ralph Castain
Hi Simone Just to clarify: is your application threaded? Could you please send the OMPI configure cmd you used? Adding the debug flags just changes the race condition. Interestingly, those values only impact the behavior of mpirun, so it looks like the race condition is occurring there. On S

Re: [OMPI users] MPI_Spawn error: Data unpack would read past end of buffer" (-26) instead of "Success"

2011-09-06 Thread Ralph Castain
On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote: > On 09/06/2011 02:57 PM, Ralph Castain wrote: >> Hi Simone >> >> Just to clarify: is your application threaded? Could you please send the >> OMPI configure cmd you used? > > yes, it is threaded. There a

Re: [OMPI users] MPI_Spawn error: Data unpack would read past end of buffer" (-26) instead of "Success"

2011-09-06 Thread Ralph Castain
On Sep 6, 2011, at 1:20 PM, Simone Pellegrini wrote: > On 09/06/2011 04:58 PM, Ralph Castain wrote: >> On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote: >> >>> On 09/06/2011 02:57 PM, Ralph Castain wrote: >>>> Hi Simone >>>> >>>> Just to

Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Ralph Castain
On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote: > The mpirun command is invoked when the user’s group is ‘set group’ to group > 650. When the rank 0 process creates files, they have group ownership 650. > But the user’s login group is group 1040. The child processes that get > started on o

Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Ralph Castain
ce for various reasons). > > Ed > > From: Ralph Castain [mailto:r...@open-mpi.org] > Sent: Wednesday, September 07, 2011 8:53 AM > To: Open MPI Users > Subject: Re: [OMPI users] Can you set the gid of the processes created by > mpirun? > > On Sep 7, 2011, at 7:38

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-11 Thread Ralph Castain
Hi Kevin Are you getting those messages from ompi_info? Or from an MPI app (and if so, what are you doing to get them)? On Sep 11, 2011, at 5:25 PM, kevin.buck...@ecs.vuw.ac.nz wrote: > I have recently seen some OpenIB time out errors and see the > following reported: > > * btl_openib_ib_retr

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-12 Thread Ralph Castain
I ask because those are set via MCA param. So ompi_info would show the "default" if the param isn't set in the environment or param file, but the app could see something different if you set the param on the mpirun cmd line. Those are the default values, but it looks like the MCA param is being

Re: [OMPI users] Question on using rsh

2011-09-12 Thread Ralph Castain
The two are synonyms for each other - they resolve to the identical variable, so there isn't anything different about them. Not sure what the issue might be, but I would check for a typo - we don't check that mca params are spelled correctly, nor do we check for params that don't exist (e.g., b

Re: [OMPI users] mpiexec option for node failure

2011-09-12 Thread Ralph Castain
We don't have anything similar in OMPI. There are fault tolerance modes, but not like the one you describe. On Sep 12, 2011, at 5:52 PM, Rob Stewart wrote: > Hi, > > I have implemented a simple fault tolerant ping pong C program with MPI, > here: http://pastebin.com/7mtmQH2q > > MPICH2 offers

Re: [OMPI users] EXTERNAL: Re: Question on using rsh

2011-09-13 Thread Ralph Castain
They must not be wholly identical, somehow. This is OpenMPI 1.4.3. > > Ed > > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Ralph Castain > Sent: Monday, September 12, 2011 7:43 PM > To: Open MPI Users > Subject: EXTERNAL: Re

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
I believe this is one of those strange cases that can catch us. The problem is that we still try to use the qrsh launcher - we appear to ignore the --without-sge configure option (it impacts our ability to read the allocation, but not the launcher). Try setting the following: -mca plm_rsh_disa

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
On Sep 13, 2011, at 4:15 PM, Reuti wrote: > Am 14.09.2011 um 00:11 schrieb Ralph Castain: > >> I believe this is one of those strange cases that can catch us. The problem >> is that we still try to use the qrsh launcher - we appear to ignore the >> --without-sge conf

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
On Sep 13, 2011, at 4:25 PM, Reuti wrote: > Am 13.09.2011 um 23:54 schrieb Blosch, Edwin L: > >> This version of OpenMPI I am running was built without any guidance >> regarding SGE in the configure command, but it was built on a system that >> did not have SGE, so I would presume support is a

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
Just to clarify: you'll still need to set that variable regardless of --without-sge or not. The launcher will still use qrsh if it is present and the SGE envars are around. On Sep 13, 2011, at 4:25 PM, Blosch, Edwin L wrote: > Your comment guided me in the right direction, Reuti. And overlapped

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Ralph Castain
Wednesday, September 07, 2011 12:24 PM >> To: Open MPI Users >> Subject: Re: [OMPI users] Can you set the gid of the processes created by >> mpirun? >> >> Hi, >> >> you mean you change the group id of the user before you submit the job? In >> GridEng

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Ralph Castain
te that would be the case, but I figured it was worth a quick try. Sorry I can't be of help. > > > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Ralph Castain > Sent: Wednesday, September 14, 2011 8:15 AM

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Ralph Castain
returned to your previous group ID. > > > > > -----Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Ralph Castain > Sent: Wednesday, September 14, 2011 11:33 AM > To: Open MPI Users > Subject: Re: [OMPI

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Ralph Castain
lves with a new instance of a shell >> that newgrp creates. This doesn't happen with sg, so upon exit from a >> sg command you are returned to your previous group ID. >> >> >> >> >> -Original Message- >> From: users-boun...@o

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Ralph Castain
similar to newgrp but accepts a command. The > > command will be executed with the /bin/sh shell. With most shells you > > may run sg from, you need to enclose multi-word commands in quotes. > > Another difference between newgrp and sg is that some shells treat >

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-15 Thread Ralph Castain
"username" is used, not the "uid" - right? So it could > have a different uid/gid on the machines, but with the new feature they must > be the same. Okay, in a cluster they are most likely unique across all > machines anyway. But just to note as a side effect. >

Re: [OMPI users] mpiexec option for node failure

2011-09-16 Thread Ralph Castain
g >> several fault tolerant modes, including the one you described in your email. >> If you are interested please contact me directly. >> >> Thanks, >>george. >> >> >> On Sep 12, 2011, at 20:43 , Ralph Castain wrote: >> >>> We d

Re: [OMPI users] mpiexec option for node failure

2011-09-18 Thread Ralph Castain
doesn't get exposed to the entire range of environments we support, and so there are usually problems that need to be ironed out. Using the code in a production environment before that has occurred is a "use at your own risk" venture. HTH Ralph On Sep 16, 2011, at 8:28 AM,

Re: [OMPI users] Open MPI and Objective C

2011-09-19 Thread Ralph Castain
Nothing to do with us - you call a function "NSLog" that Objective C doesn't recognize. That isn't an MPI function. On Sep 18, 2011, at 8:20 PM, Scott Wilcox wrote: > I have been asked to convert some C++ code using Open MPI to Objective C and > I am having problems getting a simple Obj C progr

Re: [OMPI users] Large TCP cluster timeout issue

2011-09-20 Thread Ralph Castain
Hmmm...perhaps you didn't notice the mpi_preconnect_all option? It does precisely what you described - it pushes zero-byte messages around a ring to force all the connections open at MPI_Init. On Sep 20, 2011, at 3:06 PM, Henderson, Brent wrote: > I recently had access to a 200+ node Magny Co
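To illustrate the idea only: the sketch below simulates, in plain Python, how a ring-style exchange could touch every pair of ranks. It is not Open MPI's implementation; the function names and the "one ring pass per stride" schedule are assumptions made for this example, and no real messages are sent.

```python
def preconnect_schedule(nprocs):
    """Return (sender, receiver) pairs for a simulated preconnect.

    Assumption for illustration: one ring pass per stride, where each
    rank sends a zero-byte message to the rank `stride` places ahead.
    """
    schedule = []
    for stride in range(1, nprocs):
        for rank in range(nprocs):
            schedule.append((rank, (rank + stride) % nprocs))
    return schedule


def connected_pairs(nprocs):
    """Unordered rank pairs that exchanged at least one message."""
    return {frozenset(pair) for pair in preconnect_schedule(nprocs)}


if __name__ == "__main__":
    # With 4 ranks, every one of the 6 possible pairs is covered.
    print(len(connected_pairs(4)))
```

The point of paying this cost up front is that no connection has to be opened lazily later, which is the situation described in the quoted message.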

Re: [OMPI users] Large TCP cluster timeout issue

2011-09-20 Thread Ralph Castain
e. Anyway, if I get access to another large TCP > cluster, I’ll give it a try. > > Thanks, > > brent > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Ralph Castain > Sent: Tuesday, September 20, 2011 4:15 PM > To: Open MPI U

Re: [OMPI users] openmpi -cc= option

2011-09-22 Thread Ralph Castain
Ummm...yes, because you are getting the man page for the MPICH mpicc, not ours. Try setting your manpage path to point to the OMPI install directory. On Sep 22, 2011, at 1:55 PM, Uday Kumar Reddy B wrote: > On Fri, Sep 23, 2011 at 1:21 AM, Jeff Squyres wrote: >> Right: -cc is not an option to

Re: [OMPI users] openmpi -cc= option

2011-09-22 Thread Ralph Castain
On Sep 22, 2011, at 2:17 PM, Uday Kumar Reddy B wrote: > > > On 09/23/2011 01:33 AM, Ralph Castain wrote: >> Ummm...yes, because you are getting the man page for the MPICH mpicc, not >> ours. Try setting your manpage path to point to the OMPI install directory. >

Re: [OMPI users] Fault Tolerant with openib

2011-09-23 Thread Ralph Castain
What version of OMPI are you using? The job should terminate in either case - what did you do to keep it running after node failure with tcp? On Sep 23, 2011, at 12:34 PM, Guilherme V wrote: > Hi, > I want to know if anybody is having problems with fault tolerant job using > infiniband. When I

Re: [OMPI users] Fault Tolerant with openib

2011-09-23 Thread Ralph Castain
On Sep 23, 2011, at 1:21 PM, Guilherme V wrote: > I'm using version 1.4.3 and I forgot to tell that I have made a change in the > orterun.c line 792: > > if (ORTE_JOB_STATE_TERMINATED != exit_state) { > exit(0); /* patch*/ > I don't see how that change can keep your job

Re: [OMPI users] Proper way to stop MPI process

2011-09-30 Thread Ralph Castain
SIGTERM should work - what version are you using? Ralph Sent from my iPad On Sep 28, 2011, at 1:40 PM, Xin Tong wrote: > I am wondering what the proper way is to stop an mpirun process and the child > process it created. I tried to send SIGTERM, but it does not respond to it. > What kind of signal

Re: [OMPI users] Segfault on any MPI communication on head node

2011-10-03 Thread Ralph Castain
That means you have mismatched installations around - one configured as debug, and one not. They have to match. Sent from my iPad On Oct 3, 2011, at 2:44 PM, Phillip Vassenkov wrote: > I went into the directory that I used to install 1.4.3, did the following: > make clean > ./configure --enab

Re: [OMPI users] Non-continous ranks with "--np 4 -npernode 3 -bynode"

2011-10-04 Thread Ralph Castain
Looks like a bug - can address next week. Very unusual use of npernode... Sent from my iPad On Oct 4, 2011, at 4:55 AM, Andrew Senin wrote: > Hi all, > > I noticed a strange behaviour in 1.5.4 which seems to me as a bug. I'm trying > to launch 4 ranks on 2 nodes. If I add "-npernode 3 -byno

Re: [OMPI users] Private and public IP mixing.

2011-10-04 Thread Ralph Castain
OMPI always tries to use the lowest numbered address first - just a natural ordering. You need to tell it to use just the public ones for this topology. Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax. Sent

Re: [OMPI users] can not get hello.cpp to run...

2011-10-10 Thread Ralph Castain
For one thing, you should check your path settings. The output you got cannot possibly have come from OMPI 1.4.2. Looks more like an OMPI 1.2 output. On Oct 10, 2011, at 6:01 PM, Jonathan Bishop wrote: > Hi, > > New to MPI and decided to try OpenMPI out on hello.cpp, but I get the > following m

Re: [OMPI users] How to run open MPI without ipoib

2011-10-13 Thread Ralph Castain
First, OMPI does -not- require you to use ipoib. With that command line, both procs will be running on remotehostip. I don't believe openib has a loopback interface, so you'll need the shared memory btl so procs co-located on a node can talk to each other. In other words, you need -mca btl sm,o

Re: [OMPI users] Error when using more than 88 processors for a specific executable -Abyss

2011-10-14 Thread Ralph Castain
Can't offer much about the qsub job. On the first one, what is your limit on the number of file descriptors? Could be your sys admin has it too low. On Oct 14, 2011, at 12:07 PM, Ashwani Kumar Mishra wrote: > Hello, > When i try to run the following command i receive the following error when i

Re: [OMPI users] Error when using more than 88 processors for a specific executable -Abyss

2011-10-14 Thread Ralph Castain
Should be plenty for us - does your program consume a lot? On Oct 14, 2011, at 12:25 PM, Ashwani Kumar Mishra wrote: > Hi Ralph, > fs.file-max = 10 > is this ok or less? > > Best Regards, > Ashwani > > > On Fri, Oct 14, 2011 at 11:45 PM, Ralph Castain wrote: >

Re: [OMPI users] MPI_Comm_accept - Busy wait

2011-10-14 Thread Ralph Castain
Sorry - been occupied. This is normal behavior. As has been discussed on this list before, OMPI made a design decision to minimize latency. This means we aggressively poll for connections. Only thing you can do is tell it to yield the processor when idle so, if something else is trying to run, w

Re: [OMPI users] Error when using more than 88 processors for a specific executable -Abyss

2011-10-15 Thread Ralph Castain
, > No idea how much this program consumes the numbers of file descriptors :( > > Best Regards, > Ashwani > > On Sat, Oct 15, 2011 at 12:08 AM, Ralph Castain wrote: > Should be plenty for us - does your program consume a lot? > > > On Oct 14, 2011, at 12:25

Re: [OMPI users] remote spawned process hangs at MPI_Init

2011-10-15 Thread Ralph Castain
On Oct 15, 2011, at 12:25 PM, dave fournier wrote: > OK, I found that if I invoke the master process > > with mpirun as in > > mpirun ./orange -master > > > Then the remote process is successful in the MPI_Init call. > I would like to avoid using mpirun if possible. It seems to > be

Re: [OMPI users] [OMPI docs] Open MPI compilation Error

2011-10-17 Thread Ralph Castain
Indeed. However, let me provide this advice. Add --disable-vt to your configure line, and drop all the rest of those flags. You don't need what you gave as we'll automatically figure those out. VampirTrace isn't happy on Mac, so disable it and you should be fine. On Oct 17, 2011, at 7:29 AM, J

Re: [OMPI users] Running MPI program using dropbear

2011-10-19 Thread Ralph Castain
Well, you asked for two processes, and your hostfile indicates that we can run two procs on each machine. So we put those two procs on the first machine, leaving nothing for the second machine to do. If you want the procs on different machines, then add -bynode to the cmd line. This will put on
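A small simulation of the two placement policies described above may help. This is only a sketch of the behaviour as explained in the message, not the actual Open MPI mapper; the host names `node1` and `node2` and both function names are made up for the example, and oversubscription is ignored.

```python
def map_by_slot(hosts, nprocs):
    """Default-style placement: fill every slot on a host before moving on.

    hosts is a list of (hostname, slots) tuples, as a hostfile would give.
    """
    placement = []
    for name, slots in hosts:
        for _ in range(slots):
            if len(placement) == nprocs:
                return placement
            placement.append(name)
    return placement


def map_by_node(hosts, nprocs):
    """-bynode-style placement: hand out one rank per host, round-robin."""
    remaining = {name: slots for name, slots in hosts}
    placement = []
    while len(placement) < nprocs and any(remaining.values()):
        for name, _ in hosts:
            if len(placement) == nprocs:
                break
            if remaining[name] > 0:
                placement.append(name)
                remaining[name] -= 1
    return placement


if __name__ == "__main__":
    hosts = [("node1", 2), ("node2", 2)]  # hypothetical two-slot machines
    print(map_by_slot(hosts, 2))  # both ranks land on node1
    print(map_by_node(hosts, 2))  # one rank on each machine
```

With two slots per machine and two processes, the first policy leaves the second machine idle, which matches the situation reported in this thread.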

Re: [OMPI users] ERROR: too many MPI processes

2011-10-19 Thread Ralph Castain
I've never seen that error output before - is it coming from your program? It doesn't match anything from OMPI. On Oct 19, 2011, at 6:04 AM, Mathieu Westphal wrote: > Hello > > I'm extending a code currently working well on a server with some quadri-core. > > But for debugging purpose i want t

Re: [OMPI users] running osu mpi benchmark tests on Infiniband setup

2011-10-19 Thread Ralph Castain
I don't think we handle this: > -H 192.168.4.91 -H 192.168.4.92 You need to have only one -H option - use comma to separate the values On Oct 19, 2011, at 12:48 PM, ramu wrote: > Hi, > I am trying to run osu mpi benchmark tests on Infiniband setup (connected > back-to-back via Mellanox hw). I
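If the command line is being assembled by a script, the single comma-separated -H option can be built as in the sketch below. The helper name `host_option` is hypothetical; only the "one -H, values separated by commas" rule comes from the reply above.

```python
def host_option(hosts):
    """Build a single -H option with the hosts joined by commas."""
    return ["-H", ",".join(hosts)]


if __name__ == "__main__":
    # The two addresses from the quoted command line, merged into one option.
    print(" ".join(host_option(["192.168.4.91", "192.168.4.92"])))
```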

Re: [OMPI users] Application in a cluster

2011-10-19 Thread Ralph Castain
If that is what you are trying to do, mpirun will do it just fine too - it doesn't have to be an MPI program. On Oct 19, 2011, at 3:37 PM, Gus Correa wrote: > Jorge > > Besides what Reuti and Eugene said, in case what you're looking for > is a mechanism to launch several copies of a > serial [n

Re: [OMPI users] Application in a cluster

2011-10-20 Thread Ralph Castain
On Oct 20, 2011, at 10:33 AM, Jorge Jaramillo wrote: > Thanks for all your suggestions. > > Yes, indeed what I'm trying to do is execute a serial program. All the > documentation you mention was pretty useful. > I have another question, if mpirun launches several copies of the program on > the

Re: [OMPI users] Hybrid MPI/Pthreads program behaves differently on two different machines with same hardware

2011-10-24 Thread Ralph Castain
Does the difference persist if you run the single process using mpirun? In other words, does "mpirun -np 1 ./my_hybrid_app..." behave the same as "mpirun -np 2 ./..."? There is a slight difference in the way procs start when run as singletons. It shouldn't make a difference here, but worth test

Re: [OMPI users] Hybrid MPI/Pthreads program behaves differently on two different machines with same hardware

2011-10-25 Thread Ralph Castain
"mpirun -np 2 ./...". > > Running "mpirun -np 1 ./my_hybrid_app..." increases the performance with a larger > number of threads, but running "mpirun -np 2 ./..." decreases the performance. > > -- > Huiwei Lv > > On Tue, Oct 25, 2011 at 12:00 AM, wrote:

Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()

2011-10-25 Thread Ralph Castain
I still see it failing the test George provided on the trunk. I'm unaware of anyone looking further into it, though, as the prior discussion seemed to just end. On Oct 25, 2011, at 7:01 AM, orel wrote: > Dears, > > I try from several days to use advanced MPI2 features in the following > scena

Re: [OMPI users] Hybrid MPI/Pthreads program behaves differently on two different machines with same hardware

2011-10-25 Thread Ralph Castain
.4 (kernel > 2.6.18, gcc 4.1.2) which is similar to the first machine (Cent OS 5.3, kernel > 2.6.18, gcc 4.1.2). Then the problem disappears. So the problem must lie > somewhere in OS kernel or GCC version. Any suggestions? Thanks. > > -- > Huiwei Lv > > On Tue, O

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Ralph Castain
Looks like you are crashing in wrf - have you asked them for help? On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote: > Hi again, > > This is exactly the error I have: > > > taskid: 0 hostname: part034.u-bourgogne.fr > [part034:21443] *** Process received signal *** > [part034:21443

Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()

2011-10-25 Thread Ralph Castain
FWIW: I have tracked this problem down. The fix is a little more complicated than I'd like, so I'm going to have to ping some other folks to ensure we concur on the approach before doing something. On Oct 25, 2011, at 8:20 AM, Ralph Castain wrote: > I still see it failing th

Re: [OMPI users] Changing plm_rsh_agent system wide

2011-10-26 Thread Ralph Castain
Did the version you are running get installed in /usr? Sounds like you are picking up a different version when running a command - i.e., that your PATH is finding a different installation than the one in /usr. On Oct 26, 2011, at 3:11 AM, Patrick Begou wrote: > I need to change system wide how

Re: [OMPI users] configure with cuda

2011-10-27 Thread Ralph Castain
I'm pretty sure cuda support was never moved to the 1.4 series. You will, however, find it in the 1.5 series. I suggest you get the latest tarball from there. On Oct 27, 2011, at 12:38 PM, Peter Wells wrote: > > I am attempting to configure OpenMPI 1.4.3 with cuda support on a Redhat 5 > bo

Re: [OMPI users] Spawned process do not shut down...

2011-10-27 Thread Ralph Castain
You might want to send this to the MPICH mailing lists - this is for Open MPI issues. On Oct 27, 2011, at 4:59 PM, Jonathan Bishop wrote: > I am using MPI_Comm_spawn to dynamically run workers. However, when the > workers exit they get hung up on MPI_Finalize. Here is a short program which > s

Re: [OMPI users] How to override default hostfile to specify host

2011-10-28 Thread Ralph Castain
On Oct 28, 2011, at 11:16 AM, Saurabh T wrote: > > Hi, > > If I use "orterun -H " and does not belong in the default > hostfile ("etc/openmpi-default-hostfile"), openmpi gives an error. Is there > an easy way to get the aforementioned command to work without specifying a > different hostfile

Re: [OMPI users] Error when calling MPI_Init

2011-10-31 Thread Ralph Castain
How are you running the job without mpirun? Is this under slurm or some other RM? On Oct 31, 2011, at 9:46 AM, Weston, Stephen wrote: > Hello, > > I'm seeing an error on one of our clusters when executing the > MPI_Init function in a program that is _not_ invoked using the > mpirun command. T

Re: [OMPI users] Error when calling MPI_Init

2011-10-31 Thread Ralph Castain
to ignore the psm interface to those cards by either configuring it out (--without-psm) or at run time by setting the envar OMPI_MCA_mtl=^psm > > - Steve > > > ____ > From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf o

Re: [OMPI users] Shared-memory problems

2011-11-03 Thread Ralph Castain
On Nov 3, 2011, at 8:54 AM, Blosch, Edwin L wrote: > Can anyone guess what the problem is here? I was under the impression that > OpenMPI (1.4.4) would look for /tmp and would create its shared-memory > backing file there, i.e. if you don’t set orte_tmpdir_base to anything. That is correct >

Re: [OMPI users] Shared-memory problems

2011-11-03 Thread Ralph Castain
I'm afraid this isn't correct. You definitely don't want the session directory in /dev/shm as this will almost always cause problems. We look thru a progression of envars to find where to put the session directory: 1. the MCA param orte_tmpdir_base 2. the envar OMPI_PREFIX_ENV 3. the envar TMP
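The "first one that is set wins" lookup described above can be sketched as follows. This is an illustration, not Open MPI code: the function name is hypothetical, the MCA parameter is assumed to be visible through the OMPI_MCA_ environment prefix, the /tmp fallback follows the earlier message in this thread, and the entries after the second one are left out because the list is cut off in this archive snippet.

```python
import os


def pick_session_base(candidates, environ=None, fallback="/tmp"):
    """Return the value of the first candidate variable that is set.

    candidates is an ordered list of environment variable names;
    fallback is used when none of them has a non-empty value.
    """
    env = os.environ if environ is None else environ
    for name in candidates:
        value = env.get(name)
        if value:
            return value
    return fallback


# Only the first two entries of the progression are visible in the
# snippet above; the remaining ones are deliberately not guessed here.
ORDER = ["OMPI_MCA_orte_tmpdir_base", "OMPI_PREFIX_ENV"]

if __name__ == "__main__":
    print(pick_session_base(ORDER))
```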

Re: [OMPI users] problem with mpirun

2011-11-03 Thread Ralph Castain
Couple of things: 1. Check the configure cmd line you gave - OMPI thinks your local computer should have an openib support that isn't correct. 2. did you recompile your app on your local computer, using the version of OMPI built/installed there? On Nov 3, 2011, at 10:10 AM, amine mrabet wrote

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Ralph Castain
On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote: > I might be missing something here. Is there a side-effect or performance loss > if you don't use the sm btl? Why would it exist if there is a wholly > equivalent alternative? What happens to traffic that is intended for another > process o

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Ralph Castain
ollow-up-questions, maybe this starts to >>> go outside OpenMPI: >>>> >>>> What's wrong with using /dev/shm? I think you said earlier in this thread >>> that this was not a safe place. >>>> >>>> If the NFS-mount point is moved

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Ralph Castain
follow-up-questions, maybe this starts to >> go outside OpenMPI: >>> >>> What's wrong with using /dev/shm? I think you said earlier in this thread >> that this was not a safe place. >>> >>> If the NFS-mount point is moved from /tmp to /work,

Re: [OMPI users] MPI on MacOS Lion help

2011-11-04 Thread Ralph Castain
Just glancing at the output, it appears to be finding a different gcc that isn't Lion compatible. I know people have been forgetting to clear out all their old installed software, and so you can pick old things up. Try setting your path and ld_library_path variables to point at the Xcode gcc.

Re: [OMPI users] Problem with openmpi-default-hostfile

2011-11-06 Thread Ralph Castain
Where did you install OMPI? If you check "which mpirun", does it point to the same installation where you edited the default hostfile? On Nov 6, 2011, at 6:16 PM, Lukas Razik wrote: > Hello together! > > I've built v1.4.3 (which was in OFED-1.5.3.2) and v1.4.4 (from your website). > But in both

Re: [OMPI users] OSX: dyld: Symbol not found: _orte_daemon

2011-11-08 Thread Ralph Castain
The problem is that the prefix you configured with doesn't match the prefix you are providing: configure: prefix = /opt/openmpi-1.4.4 running: prefix = /Network/opt/openmpi-1.4.4 The two have to match in order for the libraries to be found. On Nov 8, 2011, at 6:01 AM, Christophe Peyret wrot

Re: [OMPI users] wiki and "man mpirun" odds, and a question

2011-11-10 Thread Ralph Castain
I'm not sure where the FAQ got its information, but it has always been one param per -x option. I'm afraid there isn't any envar to support the setting of multiple -x options. We didn't expect someone to forward very many, if any, so we didn't create that capability. It wouldn't be too hard to
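Since each -x takes exactly one variable, a wrapper script has to repeat the option once per name. A minimal sketch, assuming a hypothetical helper called `forward_env_args` and example variable names:

```python
def forward_env_args(names):
    """Emit one '-x NAME' pair per environment variable to forward."""
    args = []
    for name in names:
        args += ["-x", name]
    return args


if __name__ == "__main__":
    # Example variable names only; pick whatever your job needs forwarded.
    print(" ".join(["mpirun"] + forward_env_args(["PATH", "LD_LIBRARY_PATH"])))
```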

Re: [OMPI users] Process Migration

2011-11-10 Thread Ralph Castain
I'm not sure what you mean by "migrate". Are you talking about restarting a failed process at a different location? Or arbitrarily moving a process to another location upon command? On Nov 10, 2011, at 5:18 AM, Mudassar Majeed wrote: > > Dear MPI community, >

Re: [OMPI users] Process Migration

2011-11-10 Thread Ralph Castain
ate a process from one core to another > or not. Then I will see how good my heuristic will be. > > thanks > Mudassar > > From: Jeff Squyres > To: Mudassar Majeed ; Open MPI Users > > Cc: Ralph Castain > Sent: Thursday, November 10, 2011 2:19 PM > Subject: Re: [O

Re: [OMPI users] Is it not possible to run a program with MPI code without mpirun/mpiexec?

2011-11-14 Thread Ralph Castain
Hmmm...it -should- work, but I've never tried it on Windows. I will verify it under Linux, but will have to defer to Shiqing to see if there is something particular about the Windows environment. On Nov 13, 2011, at 8:13 PM, Naor Movshovitz wrote: > I have open-mpi v1.5.4, installed from the b

Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Ralph Castain
On Nov 14, 2011, at 12:18 PM, Radomir Szewczyk wrote: > So there is no solution? e.g. my 2 computers that are computing nodes > and are placed in different room on different floors. And the target > user wants to monitor the progress of computation independently which > have to be printed on thei

Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Ralph Castain
the screen where mpirun resides and putting the output from that rank there. However, there is NO option for redirecting the output from your MPI processes to anywhere other than the screen where mpirun is executing. > > 2011/11/14 Ralph Castain : >> >> On Nov 14, 2011, at

Re: [OMPI users] Program hangs in mpi_bcast

2011-11-14 Thread Ralph Castain
Yes, this is well documented - may be on the FAQ, but certainly has been in the user list multiple times. The problem is that one process falls behind, which causes it to begin accumulating "unexpected messages" in its queue. This causes the matching logic to run a little slower, thus making th

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-21 Thread Ralph Castain
No real ideas, I'm afraid. We regularly launch much larger jobs than that using ssh without problem, so it is likely something about the local setup of that node that is causing the problem. Offhand, it sounds like either the mapper isn't getting things right, or for some reason the daemon on 00

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-22 Thread Ralph Castain
On Nov 22, 2011, at 10:10 AM, Paul Kapinos wrote: > Hello Ralph, hello all. > >> No real ideas, I'm afraid. We regularly launch much larger jobs than that >> using ssh without problem, > I was also able to run a 288-node-job yesterday - the size alone is not the > problem... > > > >> so it

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-23 Thread Ralph Castain
Yes, that would indeed break things. The 1.5 series isn't correctly checking connections across multiple interfaces until it finds one that works - it just uses the first one it sees. :-( The solution is to specify -mca oob_tcp_if_include ib0. This will direct the run-time wireup across the IP
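For scripted launches, the same MCA parameter can be expressed either on the mpirun command line or as an OMPI_MCA_-prefixed environment variable (the form used for OMPI_MCA_mtl elsewhere in this archive). The helper names below are hypothetical, and this is only a sketch of the two spellings; per a later message in this thread, the environment form may not reach the remote daemons when launching over rsh, so the command-line form is the safer choice there.

```python
def mca_cmdline(params):
    """Render MCA parameters as '-mca name value' command-line arguments."""
    args = []
    for name, value in params.items():
        args += ["-mca", name, str(value)]
    return args


def mca_environ(params):
    """Render the same parameters as OMPI_MCA_<name> environment variables."""
    return {"OMPI_MCA_" + name: str(value) for name, value in params.items()}


if __name__ == "__main__":
    params = {"oob_tcp_if_include": "ib0"}
    print(" ".join(["mpirun"] + mca_cmdline(params)))
    print(mca_environ(params))
```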

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-24 Thread Ralph Castain
On Nov 24, 2011, at 11:49 AM, Paul Kapinos wrote: > Hello Ralph, Terry, all! > > again, two news: the good one and the second one. > > Ralph Castain wrote: >> Yes, that would indeed break things. The 1.5 series isn't correctly checking >> connections across mu

Re: [OMPI users] open-mpi error

2011-11-24 Thread Ralph Castain
Hi Markus You have some major problems with confused installations of MPIs. First, you cannot compile an application against MPICH and expect to run it with OMPI - the two are not binary compatible. You need to compile against the MPI installation you intend to run against. Second, your errors

Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh

2011-11-24 Thread Ralph Castain
On Nov 24, 2011, at 2:00 AM, Reuti wrote: > Hi, > > Am 24.11.2011 um 05:26 schrieb Jaison Paul: > >> I am trying to access OpenMPI processes over Internet using ssh and not >> quite successful, yet. I believe that I should be able to do it. >> >> I have to run one process on my PC and the res

Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh

2011-11-25 Thread Ralph Castain
On Nov 25, 2011, at 3:42 AM, Reuti wrote: > Hi Ralph, > > Am 25.11.2011 um 03:47 schrieb Ralph Castain: > >> >> On Nov 24, 2011, at 2:00 AM, Reuti wrote: >> >>> Hi, >>> >>> Am 24.11.2011 um 05:26 schrieb Jaison Paul: >>> &g

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-25 Thread Ralph Castain
On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote: > Hello again, > >>> Ralph Castain wrote: >>>> Yes, that would indeed break things. The 1.5 series isn't correctly >>>> checking connections across multiple interfaces until it finds one that >

Re: [OMPI users] open-mpi error

2011-11-26 Thread Ralph Castain
------- > [linux-6wa6:05565] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file > orterun.c at line 543 > > > > What can i do with this? > Thx, > Markus > > On 11/25/2011 03:42 AM, Ralph Castain wrote: >&

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Ralph Castain
ally, 2 processes returned > non-zero exit codes.. Further examination may be required. > --- > [14:38] svbu-mpi:~ % > > (I did not read this thread too carefully, so perhaps I missed an inference > in here somewhere...) > > > > > > On Nov

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Ralph Castain
On Nov 28, 2011, at 5:32 PM, Jeff Squyres wrote: > On Nov 28, 2011, at 6:56 PM, Ralph Castain wrote: > Right-o. Knew there was something I forgot... > >> So on rsh, we do not put envar mca params onto the orted cmd line. This has >> been noted repeatedly on the user a

Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh

2011-11-30 Thread Ralph Castain
If you're interested, see > > http://norbl.com/ppe-ompi.html > > > Barnet Wagman On Nov 30, 2011, at 4:03 AM, Jaison Paul wrote: > > Ralph Castain open-mpi.org> writes: > >> >> >> On Nov 24, 2011, at 2:00 AM, Reuti wrote: >> >

Re: [OMPI users] Program hangs in mpi_bcast

2011-11-30 Thread Ralph Castain
the default value (1000), so our >> application seems to be a pretty extreme case. >> >> T. Rosmond >> >> >> On Mon, 2011-11-14 at 16:17 -0700, Ralph Castain wrote: >>> Yes, this is well documented - may be on the FAQ, but certainly has been in

Re: [OMPI users] Program hangs in mpi_bcast

2011-11-30 Thread Ralph Castain
Oh - and another one at orte/test/mpi/reduce-hang.c On Nov 30, 2011, at 11:50 AM, Ralph Castain wrote: > FWIW: we already have a reproducer from prior work I did chasing this down a > couple of years ago. See orte/test/mpi/bcast_loop.c > > > On Nov 29, 2011, at 9:35 AM, Jef

Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh

2011-11-30 Thread Ralph Castain
PM, Jaison Paul wrote: > Ralph Castain open-mpi.org> writes: > >> >> This has come up before - I would suggest doing a quick search of "ec2" on >> our > user list. Here is one solution: >> On Jun 14, 2011, at 10:50 AM, Barnet Wagman wrote: I

Re: [OMPI users] Open MPI and SLURM_CPUS_PER_TASK

2011-11-30 Thread Ralph Castain
Hi Igor As I recall, this eventually traced back to a change in slurm at some point. I believe the latest interpretation is in line with your suggestion. I believe we didn't change it because nobody seemed to care very much, but I have no objection to including it in the next release. Thanks r

Re: [OMPI users] Qlogic & openmpi

2011-12-05 Thread Ralph Castain
On Dec 5, 2011, at 5:49 AM, arnaud Heritier wrote: > Hello, > > I found the solution, thanks to Qlogic support. > > The "can't open /dev/ipath, network down (err=26)" message from the ipath > driver is really misleading. > > Actually, this is a hardware context problem on the Qlogic PSM. PSM

Re: [OMPI users] wiki and "man mpirun" odds, and a question

2011-12-06 Thread Ralph Castain
That, or at least something like it, is quite doable - will put it on my list. On Dec 6, 2011, at 1:28 PM, Paul Kapinos wrote: > What I asked for is something which could replace > > mpiexec -x FOO -x BAR -x FOBA -x BAFO -x RR -x ZZ .. > > (which is quite tedious to type and error-prone for th
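Until something like that exists, one possible workaround is to build the repeated -x flags in a small wrapper script. The sketch below is not an Open MPI feature, just plain shell; the variable names are the ones from the quoted example, and ./my_app is a placeholder program name. The command is printed rather than run.

```shell
#!/bin/sh
# Names of the environment variables to forward, from the quoted example.
VARS="FOO BAR FOBA BAFO RR ZZ"

# Build one "-x NAME" pair per variable, separated by single spaces.
XFLAGS=""
for name in $VARS; do
    XFLAGS="${XFLAGS:+$XFLAGS }-x $name"
done

# Hypothetical application name; print the command instead of executing it.
CMD="mpiexec $XFLAGS ./my_app"
echo "$CMD"
```

Keeping the list in one variable means a name is typed once, which avoids the typing errors the message complains about.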

Re: [OMPI users] How are the Open MPI processes spawned?

2011-12-06 Thread Ralph Castain
> case. > See also http://www.open-mpi.org/community/lists/users/2011/11/17823.php > > Thus this issue is not about forwarding some or any OMPI_* envvars to the > _processes_, but on some step _before_ (the processes were not started > correctly at all in my problem case), as Ralp
