Re: [OMPI users] Question on run-time error "ORTE was unable to reliably start"

2016-07-28 Thread Ralph Castain
What kind of system was this on? ssh, slurm, ...? > On Jul 28, 2016, at 1:55 PM, Blosch, Edwin L wrote: > > I am running cases that are starting just fine and running for a few hours, > then they die with a message that seems like a startup type of failure. > Message shown below. The messag

Re: [OMPI users] Rank specific argument to mpirun

2016-07-29 Thread Ralph Castain
Actually, what Saliya describes sounds like a bug - those procs must all be assigned to the same comm_world. Saliya: are you sure they are not? What ranks are you seeing? > On Jul 29, 2016, at 12:12 PM, Udayanga Wickramasinghe > wrote: > > Hi, > I think orte/ompi-mca forward number of environ

Re: [OMPI users] Rank specific argument to mpirun

2016-07-29 Thread Ralph Castain
at 12:18 PM, Ralph Castain wrote: > > Actually, what Saliya describes sounds like a bug - those procs must all be > assigned to the same comm_world. > > Saliya: are you sure they are not? What ranks are you seeing? > > >> On Jul 29, 2016, at 12:12 PM, Udayanga

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Ralph Castain
Typical practice would be to put a ./myprogram in there to avoid any possible confusion with a “myprogram” sitting in your $PATH. We should search the PATH to find your executable, but the issue might be that it isn’t your PATH on a remote node. So the question is: are you launching strictly lo

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Ralph Castain
> David Schneider > SLAC/LCLS > > From: users [users-boun...@lists.open-mpi.org] on behalf of Ralph Castain > [r...@open-mpi.org] > Sent: Friday, July 29, 2016 5:19 PM > To: Open MPI Users > Subject: Re: [OMPI users] mpirun won't find programs from the PATH > environ

Re: [OMPI users] EXTERNAL: Re: Question on run-time error "ORTE was unable to reliably start"

2016-07-29 Thread Ralph Castain
rs-boun...@lists.open-mpi.org] On Behalf Of Ralph > Castain > Sent: Thursday, July 28, 2016 4:07 PM > To: Open MPI Users > Subject: EXTERNAL: Re: [OMPI users] Question on run-time error "ORTE was > unable to reliably start" > > What kind of system was this on? s

Re: [OMPI users] usNIC point-to-point messaging module

2014-03-23 Thread Ralph Castain
Hmmm...we'll have to check the configure logic as I don't think you should be getting that message. Regardless, it isn't something of concern - you can turn it "off" by adding -mca btl ^usnic on your command line, or configuring OMPI --enable-mca-no-build=btl-usnic On Mar 22, 2014, at 10:00 P
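A minimal sketch of the two workarounds named above (the process count, the ./a.out executable, and the install prefix are placeholders, not from the original message):
   mpirun -np 4 -mca btl ^usnic ./a.out
   ./configure --enable-mca-no-build=btl-usnic --prefix=$HOME/ompi
The caret (^) excludes the listed component at run time; the configure option prevents it from being built at all.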

Re: [OMPI users] delays in Isend

2014-03-24 Thread Ralph Castain
I suspect the root cause of the problem here lies in how MPI messages are progressed. OMPI doesn't have an async progress method (yet), and so messaging on both send and recv ends is only progressed when the app calls the MPI library. It sounds like your app issues an isend or recv, and then spe

Re: [OMPI users] another corner case hangup in openmpi-1.7.5rc3

2014-03-24 Thread Ralph Castain
The "updated"field in the orte_job_t structure is only used to help reduce the size of the launch message sent to all the daemons. Basically, we only include info on jobs that have been changed - thus, it only gets used when the app calls comm_spawn. After every launch, we automatically change i

Re: [OMPI users] cleanup of round robin mappers

2014-03-24 Thread Ralph Castain
Looks good - thanks! On Mar 24, 2014, at 4:55 AM, tmish...@jcity.maeda.co.jp wrote: > > Hi Ralph, > > I tried to improve checking for mapping-too-low and fixed a minor > problem in rmaps_rr.c file. Please see attached patch file. > > 1) Regarding mapping-too-low, in future we'll have a lager s

Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)

2014-03-27 Thread Ralph Castain
Or use --display-map to see the process to node assignments Sent from my iPhone > On Mar 27, 2014, at 11:47 AM, Gus Correa wrote: > > PS - The (OMPI 1.6.5) mpiexec default is -bind-to-none, > in which case -report-bindings won't report anything. > > So, if you are using the default, > you can
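A hedged example of the options mentioned above, with a placeholder executable:
   mpirun -np 4 --display-map --report-bindings ./a.out
--display-map prints the process-to-node assignments at launch; --report-bindings shows the resulting core bindings, but (as noted above) reports nothing under 1.6.5's default of -bind-to-none.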

Re: [OMPI users] How to replace --cpus-per-proc by --map-by

2014-03-27 Thread Ralph Castain
Agreed - Jeff and I discussed this just this morning. I will be updating FAQ soon Sent from my iPhone > On Mar 27, 2014, at 9:24 AM, Gus Correa wrote: > > <\begin hijacking this thread> > > I second Saliya's thanks to Tetsuya. > I've been following this thread, to learn a bit more about > how

Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)

2014-03-27 Thread Ralph Castain
Oooh...it's Jeff's fault! Fwiw you can get even more detailed mapping info with --display-devel-map Sent from my iPhone > On Mar 27, 2014, at 2:58 PM, "Jeff Squyres (jsquyres)" > wrote: > >> On Mar 27, 2014, at 4:06 PM, "Sasso, John (GE Power & Water, Non-GE)" >> wrote: >> >> Yes, I no

Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)

2014-03-27 Thread Ralph Castain
Yes, that is correct Ralph On Thu, Mar 27, 2014 at 4:15 PM, Gus Correa wrote: > On 03/27/2014 05:58 PM, Jeff Squyres (jsquyres) wrote: > >> On Mar 27, 2014, at 4:06 PM, "Sasso, John (GE Power & Water, Non-GE)" >> > wrote: > >> >> Yes, I noticed that I could not find --display-map in any of t

Re: [OMPI users] How to replace --cpus-per-proc by --map-by

2014-03-28 Thread Ralph Castain
You make a good point, Gus - let me throw the thread open for suggestions on how to resolve that problem. We've heard similar concerns raised about other features we've added to OMPI over the years, but I'm not sure of the best way to communicate such information. Do we need a better web page,

Re: [OMPI users] Fortran MPI module and gfortran

2014-03-30 Thread Ralph Castain
Unfortunately, Jeff just went on vacation for a week, so we won't be able to address this right away. I know he spent a bunch of time making sure everything worked okay with gfortran, so I expect there is something odd in the setup - but I'm afraid I don't know all the details On Mar 30, 2014,

Re: [OMPI users] Problem building OpenMPI 1.8 on RHEL6

2014-04-01 Thread Ralph Castain
Hmmm...indeed, it looks like the default versions may be out-of-date. Here is a table showing the required rev levels: http://www.open-mpi.org/svn/building.php On Apr 1, 2014, at 8:26 AM, Blosch, Edwin L wrote: > I am getting some errors building 1.8 on RHEL6. I tried autoreconf as > sugges

Re: [OMPI users] usNIC point-to-point messaging module

2014-04-02 Thread Ralph Castain
Yeah, it's a change we added to resolve a problem when Slurm is configured with TaskAffinity set. It's harmless, but annoying - I'm trying to figure out a solution. On Wed, Apr 2, 2014 at 11:35 AM, Dave Goodell (dgoodell) wrote: > On Apr 2, 2014, at 12:57 PM, Filippo Spiga > wrote: > > > I st

Re: [OMPI users] openmpi query

2014-04-03 Thread Ralph Castain
I'm having trouble understanding your note, so perhaps I am getting this wrong. Let's see if I can figure out what you said: * your perl command fails with "no route to host" - but I don't see any host in your cmd. Maybe I'm just missing something. * you tried running a couple of "mpirun", but

Re: [OMPI users] question

2014-04-03 Thread Ralph Castain
You haven't provided enough information to even guess at the issue - please see my response to your last post on this question On Apr 3, 2014, at 9:53 AM, Nisha Dhankher -M.Tech(CSE) wrote: > how btl_tcp_endpoint.c error 113 can be solved while executing mpiblast on > rocks 6.0 which uses ope

Re: [OMPI users] Contributing Examples for Java Binding

2014-04-03 Thread Ralph Castain
We'd be happy to add them both to our examples section, and to our regression test area, if okay with you. Feel free to send them to me offlist. Thanks! Ralph On Apr 3, 2014, at 1:44 PM, Saliya Ekanayake wrote: > Hi, > > I've been working on some applications in our group where I've been usin

Re: [OMPI users] openmpi query

2014-04-03 Thread Ralph Castain
2 hours lapsed.on rocks 6.0 cluster with 12 > virtual nodes on pc's ...2 on each using virt-manger , 1 gb ram to each > > > > On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain wrote: > I'm having trouble understanding your note, so perhaps I am getting this >

Re: [OMPI users] openmpi query

2014-04-03 Thread Ralph Castain
rent > compute nodes after partitioning of database. And sir have you done mpiblast ? Nope - but that isn't the issue, is it? The issue is with the MPI setup. > > > On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain wrote: > What is "mpiformatdb"? We don't have

Re: [OMPI users] usNIC point-to-point messaging module

2014-04-04 Thread Ralph Castain
Fixed in r31308 and scheduled for inclusion in 1.8.1 Thanks Ralph On Apr 2, 2014, at 12:17 PM, Ralph Castain wrote: > Yeah, it's a change we added to resolve a problem when Slurm is configured > with TaskAffinity set. It's harmless, but annoying - I'm trying to fig

Re: [OMPI users] openmpi query

2014-04-04 Thread Ralph Castain
cks itself installed,configured openmpi and mpich on it > own through hpc roll. > > > On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain wrote: > > On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) > wrote: > >> thankyou Ralph. >> Yes cluster is heterogenous..

Re: [OMPI users] openmpi query

2014-04-04 Thread Ralph Castain
On Apr 4, 2014, at 7:39 AM, Reuti wrote: > Am 04.04.2014 um 05:55 schrieb Ralph Castain: > >> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) >> wrote: >> >>> thankyou Ralph. >>> Yes cluster is heterogenous... >> >> And did

Re: [OMPI users] openmpi query

2014-04-04 Thread Ralph Castain
Okay, so if you run mpiBlast on all the non-name nodes, everything is okay? What do you mean by "names nodes"? On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) wrote: > no it does not happen on names nodes > > > On Fri, Apr 4, 2014 at 7:51 PM, Ralph Cast

Re: [OMPI users] Call stack upon MPI routine error

2014-04-04 Thread Ralph Castain
Running out of file descriptors sounds likely here - if you have 20 procs/node, and fully connect, each node will see 20*220 connections (you don't use tcp between procs on the same node), with each connection requiring a file descriptor. On Apr 4, 2014, at 11:26 AM, Vince Grimes wrote: > De
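To make the arithmetic above concrete (the total rank count is an assumption, not stated in the quoted message): with P ranks per node and N ranks in total, a fully connected TCP job needs roughly P * (N - P) socket file descriptors on each node, since on-node pairs skip TCP. For P = 20 and N = 240 that is 20 * 220 = 4,400 descriptors per node, easily exceeding a default ulimit of 1,024.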

Re: [OMPI users] Waitall never returns

2014-04-04 Thread Ralph Castain
It sounds like you don't have a balance between sends and recvs somewhere - i.e., some apps send messages, but the intended recipient isn't issuing a recv and waiting until the message has been received before exiting. If the recipient leaves before the isend completes, then the isend will never

Re: [OMPI users] Open Mpi execution error

2014-04-07 Thread Ralph Castain
What version of OMPI are you attempting to install? Also, using /usr/local as your prefix is a VERY, VERY BAD idea. Most OS distributions come with a (typically old) version of OMPI installed in the system area. Overlaying that with another version can easily lead to the errors you show. You s

Re: [OMPI users] Openmpi 1.8 "rmaps seq" doesn't work

2014-04-07 Thread Ralph Castain
Looks like bit-rot has struck the sequential mapper support - I'll revive it for 1.8.1 On Apr 6, 2014, at 7:17 PM, Chen Bill wrote: > Hi , > > I just tried the openmpi 1.8, but I found the feature --mca rmaps seq doesn't > work. > > for example, > > >mpirun -np 4 -hostfile hostsfle --mca rm

Re: [OMPI users] Open Mpi execution error

2014-04-07 Thread Ralph Castain
Nope - make uninstall will not clean everything out, which is one reason we don't recommend putting things in a system directory On Apr 6, 2014, at 8:44 AM, Kamal wrote: > Hi Hamid, > > So I can uninstall just by typing > > ' make uninstall ' right ? > > what does ' make -j2 ' do ? > > T

Re: [OMPI users] Open Mpi execution error

2014-04-07 Thread Ralph Castain
tion/lib:$LD_LIBRARY_PATH > > best of luck. > > > On Sun, Apr 6, 2014 at 5:45 PM, Kamal wrote: > Hi Ralph, > > I use OMPI 1.8 for Macbook OS X mavericks. > > As you said I will create a new directory to install my MPI files. > > Thanks for your reply, >

Re: [OMPI users] Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Ralph Castain
Looks to me like the problem is here: /bin/.: Permission denied. Appears you don't have permission to exec bash?? On Apr 7, 2014, at 1:04 PM, Blosch, Edwin L wrote: > I am submitting a job for execution under SGE. My default shell is /bin/csh. > The script that is submitted has #!/bin/bash

Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Ralph Castain
I doubt that the rsh launcher is getting confused by the cmd you show below. However, if that command is embedded in a script that changes the shell away from your default shell, then yes - it might get confused. When the rsh launcher spawns your remote orted, it attempts to set some envars to e

Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Ralph Castain
hell as bash. > > But telling it to check the remote shell did the trick. > > Thanks > > > -Original Message- > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Monday, April 07, 2014 4:12 PM > To: Open MPI Users > Subject: Re: [OMPI

Re: [OMPI users] Simple Question regarding MPI Scatterv

2014-04-08 Thread Ralph Castain
I suspect it all depends on when you start the clock. If the data is sitting in the file at time=0, then the file I/O method will likely be faster as every proc just reads its data in parallel - no comm required as it is all handled by the parallel file system. I confess I don't quite understan

Re: [OMPI users] Simple Question regarding MPI Scatterv

2014-04-08 Thread Ralph Castain
that can be heavily optimized with pre-fetch and memory caching. > > > On Tue, Apr 8, 2014 at 4:45 PM, Ralph Castain wrote: > I suspect it all depends on when you start the clock. If the data is sitting > in the file at time=0, then the file I/O method will likely be faster as

Re: [OMPI users] How to set a process on a host but not bound to any core

2014-04-09 Thread Ralph Castain
What version of OMPI are you using? We have a "seq" mapper that does what you want, but the precise cmd line option for directing to use it depends a bit on the version. On Apr 9, 2014, at 9:22 AM, Gan, Qi PW wrote: > Hi, > > I have a problem when setting the processes of a parallel job wi

Re: [OMPI users] Extent of Distributed Array Type?

2014-04-10 Thread Ralph Castain
Wow - that's an ancient one. I'll see if it can be applied to 1.8.1. These things don't automatically go across - it requires that someone file a request to move it - and I think this commit came into the trunk after we branched for the 1.7 series. On Apr 9, 2014, at 12:05 PM, Richard Shaw wr

Re: [OMPI users] Performance issue of mpirun/mpi_init

2014-04-10 Thread Ralph Castain
Just to ensure I understand what you are saying: it appears that 1.8 is much faster than 1.6.5 with the default settings, but slower when you set btl=tcp,self? This seems rather strange. I note that the 1.8 value is identical in the two cases, but somehow 1.6.5 went much faster in the latter ca

Re: [OMPI users] Performance issue of mpirun/mpi_init

2014-04-10 Thread Ralph Castain
. What you can do to compensate is add the --novm option to mpirun (or use the "state_novm_select=1" MCA param) which reverts back to the 1.6.5 behavior. On Apr 10, 2014, at 7:00 AM, Ralph Castain wrote: > Just to ensure I understand what you are saying: it appears that 1.8 is much
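A hedged sketch of the two equivalent forms named above (process count and executable are placeholders):
   mpirun --novm -np 16 ./a.out
   mpirun -mca state_novm_select 1 -np 16 ./a.out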

Re: [OMPI users] How to set a process on a host but not bound to any core

2014-04-10 Thread Ralph Castain
Just add "-mca rmaps seq" to your command line, then. The mapper will take your hostfile (no rankfile) and map each proc sequentially to the listed nodes. You need to list each node once for each proc - something like this: nodeA nodeB nodeB nodeC nodeA nodeC ... would produce your described pa

Re: [OMPI users] Extent of Distributed Array Type?

2014-04-10 Thread Ralph Castain
10, 2014, at 8:02 AM, Richard Shaw wrote: > Okay. Thanks for having a look Ralph! > > For future reference, is there a better process I can go through if I find > bugs like this that makes sure they don't get forgotten? > > Thanks, > Richard > > > On 10 April 2014

Re: [OMPI users] Performance issue of mpirun/mpi_init

2014-04-10 Thread Ralph Castain
On Apr 10, 2014, at 7:58 AM, Victor Vysotskiy wrote: > Dear Ralph, > >> it appears that 1.8 is much faster than 1.6.5 with the default settings, but >> slower when you set btl=tcp,self? > > Precisely. However, with the default settings both versions are much slower > compared to other MPI

Re: [OMPI users] Performance issue of mpirun/mpi_init

2014-04-11 Thread Ralph Castain
I shaved about 30% off the time - the patch is waiting for 1.8.1, but you can try it now (see the ticket for the changeset): https://svn.open-mpi.org/trac/ompi/ticket/4510#comment:1 I've added you to the ticket so you can follow what I'm doing. Getting any further improvement will take a little

Re: [OMPI users] Optimal mapping/binding when threads are used?

2014-04-11 Thread Ralph Castain
Interesting data. Couple of quick points that might help: option B is equivalent to --map-by node --bind-to none. When you bind to every core on the node, we don't bind you at all since "bind to all" is exactly equivalent to "bind to none". So it will definitely run slower as the threads run ac
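A hedged illustration of the explicit form mentioned above (placeholder executable):
   mpirun -np 8 --map-by node --bind-to none ./a.out
Ranks are placed round-robin across nodes and left unbound, so each rank's threads can float across all cores of its node.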

Re: [OMPI users] Troubleshooting mpirun with tree spawn hang

2014-04-11 Thread Ralph Castain
I'm a little confused - the "no_tree_spawn=true" option means that we are *not* using tree spawn, and so mpirun is directly launching each daemon onto its node. Thus, this requires that the host mpirun is on be able to ssh to every other host in the allocation. You can debug the rsh launcher by

Re: [OMPI users] can't run mpi-jobs on remote host

2014-04-11 Thread Ralph Castain
Please see: http://www.open-mpi.org/faq/?category=rsh#ssh-keys short answer: you need to be able to ssh to the remote hosts without a password On Apr 11, 2014, at 1:09 AM, Lubrano Francesco wrote: > Dear MPI users, > I have a problem with open-mpi (version 1.8). > I'm just beginning to undes
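A minimal sketch of the passwordless-ssh setup the FAQ describes (user and host names are placeholders):
   ssh-keygen -t rsa            # accept the default location, empty passphrase
   ssh-copy-id user@remotehost
   ssh remotehost hostname      # should return without prompting for a password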

Re: [OMPI users] mpirun problem when running on more than three hosts with OpenMPI 1.8

2014-04-11 Thread Ralph Castain
The problem is with the tree-spawn nature of the rsh/ssh launcher. For scalability, mpirun only launches a first "layer" of daemons. Each of those daemons then launches another layer in a tree-like fanout. The default pattern is such that you first notice it when you have four nodes in your allo

Re: [OMPI users] Question on suspending/resuming MPI processes with SIGSTOP

2014-04-11 Thread Ralph Castain
I'm afraid our suspend/resume support only allows the signal to be applied to *all* procs, not selectively to some. For that matter, I'm unaware of any MPI-level API for hitting a proc with a signal - so I'm not sure how you would programmatically have rank0 suspend some other ranks. On Apr 11,

Re: [OMPI users] can't run mpi-jobs on remote host

2014-04-13 Thread Ralph Castain
Hmmm...well, first ensure you configured --enable-debug, and then add "-mca plm_base_verbose 10 --debug-daemons" to your mpirun cmd line. This will tell you what is happening during the launch. On Apr 13, 2014, at 12:31 PM, Lubrano Francesco wrote: > Sorry for my late reply > I tried previou
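A hedged example of the debug invocation described above (the host name and test program are placeholders):
   mpirun -mca plm_base_verbose 10 --debug-daemons -np 2 -host remotehost hostname
The verbose output traces each step of the daemon launch, which usually pinpoints where the remote orted fails to start.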

Re: [OMPI users] can't run mpi-jobs on remote host

2014-04-14 Thread Ralph Castain
I'm confused - how are you building OMPI?? You normally have to do: 1. ./configure --prefix= This is where you would add --enable-debug 2. make clean all install You then run your mpirun command as you've done. On Apr 14, 2014, at 12:52 AM, Lubrano Francesco wrote: > I can't set --en
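A minimal sketch of that sequence with a hypothetical install prefix (the prefix path is an assumption, not from the original message):
   ./configure --prefix=$HOME/openmpi-1.8-debug --enable-debug
   make clean all install
Building into a private prefix also avoids the /usr/local pitfalls discussed earlier in this archive.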

Re: [OMPI users] mpirun problem when running on more than three hosts with OpenMPI 1.8

2014-04-14 Thread Ralph Castain
On Apr 13, 2014, at 11:42 AM, Allan Wu wrote: > Thanks, Ralph! > > Adding MAC parameter 'plm_rsh_no_tree_spawn' solves the problem. > > If I understand correctly, the first layer of daemons are three nodes, and > when there are more than three nodes the second layer of daemons are spawn. >
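A hedged example of the parameter referenced above (process count, hostfile, and executable are placeholders):
   mpirun -mca plm_rsh_no_tree_spawn 1 -np 8 -hostfile hosts ./a.out
This disables the tree fan-out, so the node running mpirun must be able to ssh directly to every host in the allocation.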

Re: [OMPI users] Performance issue of mpirun/mpi_init

2014-04-14 Thread Ralph Castain
I'm still poking around, but would appreciate a little more info to ensure I'm looking in the right places. How many nodes are you running your application across for your verification suite? I suspect it isn't just one :-) On Apr 10, 2014, at 9:19 PM, Ralph Castain wrote: &

Re: [OMPI users] Problem in Open MPI (v1.8) Performance on 10G Ethernet

2014-04-15 Thread Ralph Castain
Have you tried a typical benchmark (e.g., NetPipe or OMB) to ensure the problem isn't in your program? Outside of that, you might want to explicitly tell it to --bind-to core just to be sure it does so - it's supposed to do that by default, but might as well be sure. You can check by adding --re

Re: [OMPI users] Where is the error? (MPI program in fortran)

2014-04-15 Thread Ralph Castain
Have you tried using a debugger to look at the resulting core file? It will probably point you right at the problem. Most likely a case of overrunning some array when #temps > 5 On Tue, Apr 15, 2014 at 10:46 AM, Oscar Mojica wrote: > Hello everybody > > I implemented a parallel simulated annea

Re: [OMPI users] Problem in Open MPI (v1.8) Performance on 10G Ethernet

2014-04-16 Thread Ralph Castain
st3:07134] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.] > [host4:10282] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././.] > > > > > On Tue, Apr 15, 2014 at 8:39 PM, Ralph Castain wrote: > >> Have you tried a typical benchmark (e.g., NetPipe or OMB) to ensure

Re: [OMPI users] FW: Performance issue of mpirun/mpi_init

2014-04-16 Thread Ralph Castain
Thanks Victor! Sorry for the problem, but appreciate you bringing it to our attention. Ralph On Wed, Apr 16, 2014 at 5:16 AM, Victor Vysotskiy < victor.vysots...@teokem.lu.se> wrote: > Hi, > > I just will confirm that the issue has been fixed. Specifically, with the > latest OpenMPI v1.8.1a1r3

Re: [OMPI users] OpenMPI with Gemini Interconnect

2014-04-16 Thread Ralph Castain
The Java bindings are written on top of the C bindings, so you'll be able to use those networks just fine from Java :-) On Wed, Apr 16, 2014 at 2:27 PM, Saliya Ekanayake wrote: > Thank you Nathan, this is what I was looking for. I'll try to build > OpenMPI 1.8 and get back to this thread if I

Re: [OMPI users] Conflicts between jobs running on the same node

2014-04-17 Thread Ralph Castain
Unfortunately, each execution of mpirun has no knowledge of where the procs have been placed and bound by another execution of mpirun. So what is happening is that the procs of the two jobs are being bound to the same cores, thus causing contention. If you truly want to run two jobs at the same ti

Re: [OMPI users] Connection timed out on TCP and notify question

2014-04-24 Thread Ralph Castain
Sounds like either a routing problem or a firewall. Are there multiple NICs on these nodes? Looking at the quoted NIC in your error message, is that the correct subnet we should be using? Have you checked to ensure no firewalls exist on that subnet between the nodes? On Apr 24, 2014, at 8:41 A

Re: [OMPI users] OpenMPI 1.8 and PGI compilers

2014-04-25 Thread Ralph Castain
Hmmm...we haven't heard a problem like that, but if you don't have Xeon Phi devices on your machine, one simple workaround would be to add --enable-mca-no-build=btl-scif to your configure line On Apr 25, 2014, at 10:22 AM, Andrus, Brian Contractor wrote: > All, > > I have been unable to com

Re: [OMPI users] Deadlocks and warnings from libevent when using MPI_THREAD_MULTIPLE

2014-04-25 Thread Ralph Castain
We don't fully support THREAD_MULTIPLE, and most definitely not when using IB. We are planning on extending that coverage in the 1.9 series On Apr 25, 2014, at 2:22 PM, Markus Wittmann wrote: > Hi everyone, > > I'm using the current Open MPI 1.8.1 release and observe > non-deterministic deadl

Re: [OMPI users] Deadlocks and warnings from libevent when using MPI_THREAD_MULTIPLE

2014-04-28 Thread Ralph Castain
mmings > Engineering Specialist > Performance Modeling and Analysis Department > Systems Analysis and Simulation Subdivision > Systems Engineering Division > Engineering and Technology Group > The Aerospace Corporation > 571-307-4220 > jeffrey.a.cummi...@aero.org > &g

Re: [OMPI users] Error in openmpi-1.9a1r31561 using gcc or cc

2014-04-30 Thread Ralph Castain
My bad - forgot to remove a stale line of code exposed by the --enable-heterogeneous option. Fixed in r31567 Sorry about that... On Apr 30, 2014, at 8:11 AM, Siegmar Gross wrote: > Hi, > > I tried to install openmpi-1.9a1r31561 on my machines (openSUSE > Linux 12.1 x86_64, Solaris 10 x86_64,

Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)

2014-05-02 Thread Ralph Castain
have the CPU bindings shown as >> well >> >> * If using "--report-bindings --bind-to-core" with OpenMPI 1.4.1 then the >> bindings on just the head node are shown. In 1.6.1, full bindings across >> all hosts are shown. (I'd have to read release notes on this...)

Re: [OMPI users] MPI_Barrier hangs on second attempt but only when multiple hosts used.

2014-05-03 Thread Ralph Castain
Hmmm...just testing on my little cluster here on two nodes, it works just fine with 1.8.2: [rhc@bend001 v1.8]$ mpirun -n 2 --map-by node ./a.out In rank 0 and host= bend001 Do Barrier call 1. In rank 0 and host= bend001 Do Barrier call 2. In rank 0 and host= bend001 Do Barrier call 3. In r

Re: [OMPI users] users Digest, Vol 2881, Issue 1

2014-05-06 Thread Ralph Castain
> > Message: 9 > Date: Tue, 6 May 2014 14:50:34 + > From: "Jeff Squyres (jsquyres)" > To: Open MPI Users > Subject: Re: [OMPI users] users Digest, Vol 2879, Issue 1 > Message-ID: > Content-Type: text/plain; charset=&

Re: [OMPI users] users Digest, Vol 2881, Issue 2

2014-05-06 Thread Ralph Castain
.com > > > For corporate legal information go to: > > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > > ___ > > > users mailing list > > > us...@open-mpi.org > > > http://www.open-

Re: [OMPI users] users Digest, Vol 2881, Issue 4

2014-05-06 Thread Ralph Castain
send a message with subject or body 'help' to > users-requ...@open-mpi.org > > You can reach the person managing the list at > users-ow...@open-mpi.org > > When replying, please edit your

Re: [OMPI users] No output when adding host to hostfile

2014-05-09 Thread Ralph Castain
There is a known bug in the 1.8.1 release whereby daemons failing to start on a remote node will cause a silent failure. This has been fixed for the upcoming 1.8.2 release, but you might want to use one of the nightly 1.8.2 snapshots in the interim. Most likely causes: * not finding the requir

Re: [OMPI users] Issue running mpi program

2014-05-10 Thread Ralph Castain
Hmmm...that is indeed odd. What version of OMPI are you using? You might try adding "-mca plm rsh" to your cmd line - this will ensure the launcher isn't trying to use srun under the covers. However, it shouldn't have built the slurm support if you specifically asked us not to do so. On May 7
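A hedged example of forcing the rsh launcher as suggested above (the host list and executable are placeholders):
   mpirun -mca plm rsh -np 4 -host node1,node2 ./a.out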

Re: [OMPI users] No output when adding host to hostfile

2014-05-10 Thread Ralph Castain
ssee(s), and may not be passed on to, or made available for use by any > person other than the addressee(s). Any and every liability resulting from > any electronic transmission is ruled out. > If you are not the intended recipient, please contact the sender by reply > email and dest

Re: [OMPI users] errors while making openmpi

2014-05-14 Thread Ralph Castain
What version are you talking about? On May 13, 2014, at 11:13 PM, Hamed Mortazavi wrote: > Hi all, > > in make check for openmpi on a mac I see following error message, has anybody > ever run to this error? any solutions? > > Best, > > Hamed, > raw extraction in 1 microsec > > Example 3.

Re: [OMPI users] bug in MPI_File_set_view?

2014-05-14 Thread Ralph Castain
You might give it a try with 1.8.1 or the nightly snapshot from 1.8.2 - we updated ROMIO since the 1.6 series, and whatever fix is required may be in the newer version On May 14, 2014, at 6:52 AM, CANELA-XANDRI Oriol wrote: > Hello, > > I am using MPI IO for writing/reading a block cyclic

Re: [OMPI users] unknown interface on openmpi-1.8.2a1r31742

2014-05-14 Thread Ralph Castain
What are the interfaces on these machines? On May 14, 2014, at 7:45 AM, Siegmar Gross wrote: > Hi, > > I just installed openmpi-1.8.2a1r31742 on my machines (Solaris 10 > Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with > Sun C5.12 and still have the following problem. > > tyr

Re: [OMPI users] bug in MPI_File_set_view?

2014-05-14 Thread Ralph Castain
auber wrote: > Is there an ETA for 1.8.2 general release instead of snapshot? > > Thanks, -- bennet > > On Wed, May 14, 2014 at 10:17 AM, Ralph Castain wrote: >> You might give it a try with 1.8.1 or the nightly snapshot from 1.8.2 - we >> updated ROMIO since the

Re: [OMPI users] unknown interface on openmpi-1.8.2a1r31742

2014-05-14 Thread Ralph Castain
Hmmm...well, that's an interesting naming scheme :-) Try adding "-mca oob_base_verbose 10 --report-uri -" on your cmd line and let's see what it thinks is happening On May 14, 2014, at 9:06 AM, Siegmar Gross wrote: > Hi Ralph, > >> What are the interfaces on these machines? > > tyr fd1026

Re: [OMPI users] unknown interface on openmpi-1.8.2a1r31742

2014-05-14 Thread Ralph Castain
Just committed a potential fix to the trunk - please let me know if it worked for you On May 14, 2014, at 11:44 AM, Siegmar Gross wrote: > Hi Ralph, > >> Hmmm...well, that's an interesting naming scheme :-) >> >> Try adding "-mca oob_base_verbose 10 --report-uri -" on your cmd line >> and le

Re: [OMPI users] Question about scheduler support

2014-05-14 Thread Ralph Castain
FWIW: I believe we no longer build the slurm support by default, though I'd have to check to be sure. The intent is definitely not to do so. The plan we adjusted to a while back was to *only* build support for schedulers upon request. Can't swear that they are all correctly updated, but that was

Re: [OMPI users] Question about scheduler support

2014-05-14 Thread Ralph Castain
t for various schedulers, and so just finding the required headers isn't enough to know that the scheduler is intended for use. So we wind up building a bunch of useless modules. On May 14, 2014, at 3:09 PM, Ralph Castain wrote: > FWIW: I believe we no longer build the slurm suppo

Re: [OMPI users] Question about scheduler support

2014-05-14 Thread Ralph Castain
On May 14, 2014, at 3:21 PM, Jeff Squyres (jsquyres) wrote: > On May 14, 2014, at 6:09 PM, Ralph Castain wrote: > >> FWIW: I believe we no longer build the slurm support by default, though I'd >> have to check to be sure. The intent is definitely not to do so. >

Re: [OMPI users] Question about scheduler support

2014-05-14 Thread Ralph Castain
you don't tell me to not-build" Tough set of compromises as it depends on the target audience. Sys admins prefer the "build only what I say", while users (who frequently aren't that familiar with the inners of a system) prefer the "build all" mentality. On Ma

Re: [OMPI users] Build failure on scientific linux 5.4

2014-05-14 Thread Ralph Castain
Just sniffing around the web, I found that this is a problem caused by newer versions of gcc. One reporter stated that they resolved the problem by adding "-fgnu89-inline" to their configuration: "add the compiler flag "-fgnu89-inline" (because of an issue where old glibc libraries aren't compa
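One way to pass that flag, sketched with placeholder values (the exact configure line is an assumption; only the -fgnu89-inline flag comes from the report above):
   ./configure CFLAGS="-fgnu89-inline" --prefix=$HOME/ompi
   make all install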

Re: [OMPI users] Question about scheduler support

2014-05-14 Thread Ralph Castain
more than one scheduler. > > Maxime > > Le 2014-05-14 19:09, Ralph Castain a écrit : >> Jeff and I have talked about this and are approaching a compromise. Still >> more thinking to do, perhaps providing new configure options to "only build >> what I ask

Re: [OMPI users] unknown interface on openmpi-1.8.2a1r31742

2014-05-15 Thread Ralph Castain
It is an unrelated bug introduced by a different commit - causing mpirun to segfault upon termination. The fact that you got the hostname to run indicates that this original fix works, so at least we know the connection logic is now okay. Thanks Ralph On May 15, 2014, at 3:40 AM, Siegmar Gros

Re: [OMPI users] Question about scheduler support

2014-05-15 Thread Ralph Castain
Hi Gus The issue is that you have to work thru all the various components (leafing thru the code base) to construct a list of all the things you *don't* want to build. By default, we build *everything*, so there is no current method to simply "build only what I want". For those building static

Re: [OMPI users] Using PMI as RTE component

2014-05-15 Thread Ralph Castain
What do you mean "goes through orte component"? It will still call into the orte code base, but will use PMI to do the modex. On May 15, 2014, at 12:54 PM, Hadi Montakhabi wrote: > Hello, > > I am trying to utilize pmi instead of orte, but I come across the following > problem. > I do configu

Re: [OMPI users] How to replace --cpus-per-proc by --map-by

2014-05-15 Thread Ralph Castain
I'm not sure of the issue, but so far as I'm aware the cpus-per-proc functionality continued to work thru all those releases and into today. Yes, the syntax changed during the 1.7 series to reflect a broader desire to consolidate options into something that could be contained in a minimum number

Re: [OMPI users] Using PMI as RTE component

2014-05-15 Thread Ralph Castain
e rte framework, namely orte and pmi. > The question is whether pmi could be used independent from orte? Or it needs > orte to function? > > Peace, > Hadi > > > On Thu, May 15, 2014 at 2:59 PM, Ralph Castain wrote: > What do you mean "goes through orte component&q

Re: [OMPI users] openMPI in 64 bit

2014-05-15 Thread Ralph Castain
This is on a Windows box? If so, I don't know if anyone built/posted a 64-bit release version for Windows (you might check the OMPI site and see if there is something specific for 64-bit), and we don't support Windows directly any more. You might also look at the cygwin site for a downloadable v

Re: [OMPI users] Question about scheduler support

2014-05-15 Thread Ralph Castain
On May 15, 2014, at 2:34 PM, Fabricio Cannini wrote: > Em 15-05-2014 07:29, Jeff Squyres (jsquyres) escreveu: >> I think Ralph's email summed it up pretty well -- we unfortunately have (at >> least) two distinct groups of people who install OMPI: >> >> a) those who know exactly what they want

Re: [OMPI users] Question about scheduler support

2014-05-15 Thread Ralph Castain
On May 15, 2014, at 4:15 PM, Maxime Boissonneault wrote: > Le 2014-05-15 18:27, Jeff Squyres (jsquyres) a écrit : >> On May 15, 2014, at 6:14 PM, Fabricio Cannini wrote: >> >>> Alright, but now I'm curious as to why you decided against it. >>> Could please elaborate on it a bit ? >> OMPI has

Re: [OMPI users] Question about scheduler support

2014-05-15 Thread Ralph Castain
Nobody is disagreeing that one could find a way to make CMake work - all we are saying is that (a) CMake has issues too, just like autotools, and (b) we have yet to see a compelling reason to undertake the transition...which would have to be a *very* compelling one. On May 15, 2014, at 4:45 PM

Re: [OMPI users] Using PMI as RTE component

2014-05-15 Thread Ralph Castain
. > > Josh > > > On Thu, May 15, 2014 at 4:13 PM, Ralph Castain wrote: > I wouldn't trust that PMI component in the RTE framework - it was only > created as a test example for that framework. It is routinely broken and not > maintained, and can only be used i

Re: [OMPI users] ierr vs ierror in F90 mpi module

2014-05-15 Thread Ralph Castain
you might try the nightly 1.8.2 build - there were some additional patches to fix the darned tkr support. I'm afraid getting all the various compilers to work correctly with it has been a major pain. On May 15, 2014, at 5:01 PM, W Spector wrote: > Hi Jeff and the list, > > A year ago, we had

Re: [OMPI users] Using PMI as RTE component

2014-05-15 Thread Ralph Castain
wrote: > Ralph is right. > I used 1.8, and after digging into it, I noticed it doesn't even compile the > pmi component. When I tried to configure without orte, I could see the errors > while compiling. > It looks like it is well broken! > > Peace, > Hadi > >

Re: [OMPI users] unknown interface on openmpi-1.8.2a1r31742

2014-05-16 Thread Ralph Castain
Done - will be in nightly 1.8.2 tarball generated later today. On May 16, 2014, at 2:57 AM, Siegmar Gross wrote: > Hi, > >> This bug should be fixed in tonight's tarball, BTW. > ... >>> It is an unrelated bug introduced by a different commit - >>> causing mpirun to segfault upon termination.

Re: [OMPI users] Question about scheduler support

2014-05-16 Thread Ralph Castain
On May 16, 2014, at 1:03 PM, Fabricio Cannini wrote: > Em 16-05-2014 10:06, Jeff Squyres (jsquyres) escreveu: >> On May 15, 2014, at 8:00 PM, Fabricio Cannini >> wrote: >> Nobody is disagreeing that one could find a way to make CMake work - all we are saying is that (a) CMake has iss
