Re: [OMPI users] Running program on a cluster

2014-09-24 Thread Ralph Castain
No, it doesn't matter at all for OMPI - any order is fine. The issue I see is that your mpiexec isn't the OMPI one, but is from someone else. I have no idea whose mpiexec you are using On Sep 24, 2014, at 6:38 PM, XingFENG wrote: > I have found the solution. The command mpirun -machinefile ./
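
A quick way to check which MPI stack a given mpiexec belongs to is sketched below (the install path and machine-file name are placeholders, not taken from the thread):

    # Which mpiexec is first in PATH, and which MPI it identifies as
    which mpiexec
    mpiexec --version        # Open MPI identifies itself in the version string; MPICH/Hydra prints something different
    # Call the Open MPI launcher by full path to avoid picking up the wrong one
    /opt/openmpi/bin/mpirun -machinefile ./machinefile -np 4 ./a.out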

Re: [OMPI users] Running program on a cluster

2014-09-25 Thread Ralph Castain
wo mpi are installed, namely, OpenMPI and MPICH2. > > On Thu, Sep 25, 2014 at 11:45 AM, Ralph Castain wrote: > No, it doesn't matter at all for OMPI - any order is fine. The issue I see is > that your mpiexec isn't the OMPI one, but is from someone else. I have no > ide

Re: [OMPI users] Running program on a cluster

2014-09-25 Thread Ralph Castain
ng the MPICH version. On Sep 25, 2014, at 4:33 AM, XingFENG wrote: > It returns /usr/bin/mpiexec. > > On Thu, Sep 25, 2014 at 8:57 PM, Ralph Castain wrote: > Do "which mpiexec" and look at the path. The options you show are from MPICH, > not OMPI. > > On Sep 25,

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
Can you pass us the actual mpirun command line being executed? Especially need to see the argv being passed to your application. On Sep 27, 2014, at 7:09 PM, Amos Anderson wrote: > FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. Also, > I have some gdb output (from 1.7

Re: [OMPI users] --prefix, segfaulting

2014-09-29 Thread Ralph Castain
I'm not seeing this with 1.8.3 - can you try with it? On Sep 17, 2014, at 4:38 PM, Ralph Castain wrote: > Yeah, just wanted to make sure you were seeing the same mpiexec in both > cases. There shouldn't be any issue with providing the complete path, though > I can take a

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
s command is executed in the context > of a souped up LD_LIBRARY_PATH. You can see the variable argv in > opal_argv_join is ending up with the last argument on my command line. > > I suppose your question implies that mpirun is mandatory for executing > anything compiled with Ope

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
;ve really tested your scenario. On Sep 29, 2014, at 10:55 AM, Ralph Castain wrote: > Okay, so regression-test.py is calling MPI_Init as a singleton, correct? Just > trying to fully understand the scenario > > Singletons are certainly allowed, if that's the scenario > > O

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
t; > (gdb) print argv[1] > $12 = 0x9caa40 "test/regression/regression-jobs" > (gdb) print argv[2] > $13 = 0x20 > (gdb) > > > > > On Sep 29, 2014, at 11:48 AM, Dave Goodell (dgoodell) > wrote: > >> Looks like boost::mpi and/or your

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Ralph Castain
I don't know anything about your application, or what the functions in your code are doing. I imagine it's possible that you are trying to open statically defined ports, which means that running the job again too soon could leave the OS thinking the socket is already busy. It takes awhile for th
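
If the application does bind fixed port numbers, sockets left over from the previous run can be inspected before relaunching (a generic sketch, not specific to this application):

    # Connections still in TIME_WAIT block an immediate re-bind of the same port
    netstat -tan | grep TIME_WAIT
    # equivalent with iproute2
    ss -tan state time-wait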

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Ralph Castain
ompi_info is just the first time when an executable is built, and so it always is the place where we find missing library issues. It looks like someone has left incorrect configure logic in the system such that we always attempt to build Infiniband-related code, but without linking against the l

Re: [OMPI users] still SIGSEGV for Java in openmpi-1.9a1r32807 on Solaris

2014-09-30 Thread Ralph Castain
Don't know about the segfault itself, but I did find and fix the classpath logic so the app is found. Might help you get a little further. On Sep 29, 2014, at 10:58 PM, Siegmar Gross wrote: > Hi, > > yesterday I installed openmpi-1.9a1r32807 on my machines with Sun C > 5.12 and gcc-4.9.1. Unf

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Ralph Castain
code knows how to handle arbitrary connections You might check about those warnings - could be that QCLOCALSCR and QCREF need to be set for the code to work. > > - Lee-Ping > > On Sep 29, 2014, at 8:45 PM, Ralph Castain wrote: > >> I don't know anything about your application,

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Ralph Castain
n four different > clusters (where I don't set these environment variables either), it's only > broken on the Blue Waters compute node. Also, the calculation runs without > any problems the first time it's executed on the BW compute node - it's only > subsequent ex

Re: [OMPI users] About valgrind and OpenMPI

2014-10-02 Thread Ralph Castain
Hmmm… I would guess you should talk to the Hadoop folks as the problem seems to be a conflict between valgrind and HDFS. Does valgrind even support Java programs? I honestly have never tried to do that before. On Oct 2, 2014, at 4:40 AM, XingFENG wrote: > Hi there, > > I am using valgrind

Re: [OMPI users] still SIGSEGV with Java in openmpi-1.9.0a1git99c3999 on Solaris

2014-10-05 Thread Ralph Castain
We've talked about this a lot over the last few weeks, trying to come up with some way to maintain the Solaris support - but have come up empty. None of us have access to such a system, and it appears to be very difficult to avoid regularly breaking it. I may, as time permits, try playing with

Re: [OMPI users] Update/patch to check/opal_check_pmi.m4

2014-10-06 Thread Ralph Castain
I've looked at your patch, and it isn't quite right as it only looks for libpmi and not libpmi2. We need to look for each of them as we could have either or both. I'll poke a bit at this tonight and see if I can make this a little simpler - the nesting is getting a little deep. On Mon, Oct 6, 20

Re: [OMPI users] Update/patch to check/opal_check_pmi.m4

2014-10-07 Thread Ralph Castain
I've poked at this a bit and think I have all the combinations covered - can you try the attached patch? I don't have a way to test it right now, so I don't want to put it in the trunk. Thanks Ralph On Mon, Oct 6, 2014 at 6:02 PM, Ralph Castain wrote: > I've looked at y

Re: [OMPI users] Open MPI was unable to obtain the username

2014-10-10 Thread Ralph Castain
Sorry about delay - was on travel. Yes, that will avoid the issue. On Oct 10, 2014, at 1:17 PM, Gary Jackson wrote: > > To answer my own question: > > Configure with --disable-getpwuid. > > On 10/10/14, 12:04 AM, Gary Jackson wrote: >> >> I'd like to run MPI on a node to which I have access
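
For reference, the workaround is a configure-time switch; a minimal build line might look like this (the prefix is a placeholder):

    ./configure --prefix=$HOME/openmpi-1.8.3 --disable-getpwuid
    make -j4 all install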

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-14 Thread Ralph Castain
On Oct 14, 2014, at 5:32 PM, Gus Correa wrote: > Dear Open MPI fans and experts > > This is just a note in case other people run into the same problem. > > I just built Open MPI 1.8.3. > As usual I put my old settings on openmpi-mca-params.conf, > with no further thinking. > Then I compiled th

Re: [OMPI users] Hybrid OpenMPI/OpenMP leading to deadlocks?

2014-10-15 Thread Ralph Castain
If you only have one thread doing MPI calls, then single and funneled are indeed the same. If this is only happening after long run times, I'd suspect resource exhaustion. You might check your memory footprint to see if you are running into leak issues (could be in our library as well as your ap

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-15 Thread Ralph Castain
sire. It was developed in response to requests from researchers who wanted to explore application performance versus placement strategies - but we provided some simplified options to support more common usage patterns. > > > Many thanks, > Gus Correa > > > On 10/15/201

Re: [OMPI users] Open MPI on Cray xc30 and getpwuid

2014-10-16 Thread Ralph Castain
Add --disable-getpwuid to configure On Oct 16, 2014, at 12:36 AM, Aurélien Bouteiller wrote: > I am building trunk on the Cray xc30. > I get the following warning during link (static link) > ../../../orte/.libs/libopen-rte.a(session_dir.o): In function > `orte_session_dir_get_name': > session

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-16 Thread Ralph Castain
That is probably the first place people look for information > about runtime features. > For instance, the process placement examples still > use deprecated parameters and mpiexec options: > -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc. On my to-do list > >
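
A rough mapping from the deprecated names mentioned above to their 1.8-era replacements, as I understand them (my own summary, not from the thread; ompi_info is the authoritative source):

    # deprecated (1.6-era)                   1.8-era equivalent
    #   mpirun -bind-to-core                 mpirun --bind-to core
    #   -mca rmaps_base_schedule_policy X    --map-by X    (rmaps_base_mapping_policy)
    #   -mca orte_process_binding Y          --bind-to Y   (hwloc_base_binding_policy)
    # Listing every parameter at the highest level shows the current names
    ompi_info --param all all --level 9 | less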

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Ralph Castain
FWIW: vader is the default in 1.8 On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller wrote: > Are you sure you are not using the vader BTL ? > > Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem > initialization info. > > The CMA linux system (that ships with most 3.1x
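
To see which shared-memory transport actually gets picked at run time, the BTL verbosity knob helps (a sketch; the binary name is a placeholder):

    # List the BTL components built into this install
    ompi_info | grep "MCA btl"
    # Have the BTL framework report its selection for a 2-rank run
    mpirun -np 2 --mca btl_base_verbose 100 ./a.out 2>&1 | grep -i -e vader -e knem -e sm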

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Ralph Castain
the benefit of mere mortals like me >> who don't share the dark or the bright side of the force, >> and just need to keep their MPI applications running in production mode, >> hopefully with Open MPI 1.8, >> can somebody explain more clearly what "vader" is a

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-17 Thread Ralph Castain
xec options: > -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc. > > Thank you, > Gus Correa > > On 10/15/2014 11:10 PM, Ralph Castain wrote: >> >> On Oct 15, 2014, at 11:46 AM, Gus Correa > <mailto:g...@ldeo.columbia.edu>> wrote: >&g

Re: [OMPI users] Open MPI 1.8: link problem when Fortran+C+Platform LSF

2014-10-17 Thread Ralph Castain
that point, an undefined symbol reference from another dynamic >library. --no-as-needed restores the default behaviour. > > > > -- > Dipl.-Inform. Paul Kapinos - High Performance Computing, > RWTH Aachen University, IT Center > Seffenter Weg

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-17 Thread Ralph Castain
syntax > and examples. Yeah, I need to do that. LAMA was an alternative implementation of the current map/rank/bind system. It hasn’t been fully maintained since it was introduced, and so I’m not sure how much of it is functional. I need to create an equivalent for the current implementation.

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-17 Thread Ralph Castain
> On Oct 17, 2014, at 12:06 PM, Gus Correa wrote: > > Hi Jeff > > Many thanks for looking into this and filing a bug report at 11:16PM! > > Thanks to Aurelien, Ralph and Nathan for their help and clarifications > also. > > ** > > Related suggestion: > > Add a note to the FAQ explaining that

Re: [OMPI users] large memory usage and hangs when preconnecting beyond 1000 cpus

2014-10-18 Thread Ralph Castain
> On Oct 17, 2014, at 3:37 AM, Marshall Ward wrote: > > I currently have a numerical model that, for reasons unknown, requires > preconnection to avoid hanging on an initial MPI_Allreduce call. That is indeed odd - it might take a while for all the connections to form, but it shouldn’t hang >
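
For context, preconnection is switched on through an MCA parameter, roughly like this (process count and binary are placeholders):

    # Wire up all connections during MPI_Init instead of on first use
    mpirun -np 1024 --mca mpi_preconnect_mpi 1 ./model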

Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-23 Thread Ralph Castain
From your error message, I gather you are not running an MPI program, but rather an OSHMEM one? Otherwise, I find the message strange as it only would be emitted from an OSHMEM program. What version of OMPI are you trying to use? > On Oct 22, 2014, at 7:12 PM, Vinson Leung wrote: > > Thanks f

Re: [OMPI users] Problem with Yosemite

2014-10-24 Thread Ralph Castain
I was able to build and run the trunk without problem on Yosemite with: gcc (MacPorts gcc49 4.9.1_0) 4.9.1 GNU Fortran (MacPorts gcc49 4.9.1_0) 4.9.1 Will test 1.8 branch now, though I believe the fortran support in 1.8 is up-to-date > On Oct 24, 2014, at 6:46 AM, Guillaume Houzeaux > wrote:

Re: [OMPI users] Problem with Yosemite

2014-10-24 Thread Ralph Castain
libtool-patches/2014-09/msg2.html > > On Fri, Oct 24, 2014 at 6:09 PM, Ralph Castain wrote: >> I was able to build and run the trunk without problem on Yosemite with: >> >> gcc (MacPorts gcc49 4.9.1_0) 4.9.1 >> GNU Fortran (MacPorts gcc49 4.9.1_0) 4.9.1 >>

Re: [OMPI users] Problem with Yosemite

2014-10-24 Thread Ralph Castain
> > Can you try a 1.8 nightly tarball build on Y? > > > > On Oct 24, 2014, at 12:32 PM, Ralph Castain wrote: > >> Could well be - I’m using the libtool from Apple >> >> Apple Inc. version cctools-855 >> >> Just verified that 1.8 is working

Re: [OMPI users] Problem with Yosemite

2014-10-24 Thread Ralph Castain
Found that you do have to use the Apple version of libtool, however, to build - the Darwin ports “glibtool” version will fail. Tested the 1.8.3 tarball and it again worked fine. > On Oct 24, 2014, at 10:46 AM, Ralph Castain wrote: > > Will do - just taking forever to update my Dar

Re: [OMPI users] OMPI users] which info is needed for SIGSEGV inJava foropenmpi-dev-124-g91e9686on Solaris

2014-10-26 Thread Ralph Castain
Afraid this must be something about the Sparc - just ran on a Solaris 11 x86 box and everything works fine. > On Oct 26, 2014, at 8:22 AM, Siegmar Gross > wrote: > > Hi Gilles, > > I wanted to explore which function is called, when I call MPI_Init > in a C program, because this function shou

Re: [OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV inJava foropenmpi-dev-124-g91e9686on Solaris

2014-10-26 Thread Ralph Castain
me_t is 32 bits > aligned. If you run an alignment sensitive cpu such as sparc and you are not > lucky (so to speak) you can run into this issue. > i will make a patch for this shortly > > Ralph Castain wrote: >> Afraid this must be something about the Sparc - just ran on a Solaris

Re: [OMPI users] which info is needed for SIGSEGV in Java foropenmpi-dev-124-g91e9686on Solaris

2014-10-26 Thread Ralph Castain
Oh yeah - that would indeed be very bad :-( > On Oct 26, 2014, at 6:06 PM, Kawashima, Takahiro > wrote: > > Siegmar, Oscar, > > I suspect that the problem is calling mca_base_var_register > without initializing OPAL in JNI_OnLoad. > > ompi/mpi/java/c/mpi_MPI.c: >

Re: [OMPI users] OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV inJava foropenmpi-dev-124-g91e9686on Solaris

2014-10-26 Thread Ralph Castain
eight but error prone imho) is to change variable declaration > only. > Any thought ? > > Ralph Castain wrote: >> Will PR#249 solve it? If so, we should just go with it as I suspect that is >> the long-term solution. >> >>> On Oct 26, 2014, at 4:25 P

Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Ralph Castain
> On Oct 26, 2014, at 9:56 PM, Brock Palen wrote: > > We are starting to look at supporting MPI on our Hadoop/Spark YARN based > cluster. You poor soul… > I found a bunch of references to Hamster, but what I don't find is if it was > ever merged into regular OpenMPI, and if so is it just an

Re: [OMPI users] OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV inJava foropenmpi-dev-124-g91e9686on Solaris

2014-10-27 Thread Ralph Castain
on it (just to keep things clean), push it to my area and give you write access? We can then collaborate on the changes and create a PR from there. This way, you don’t need to give me write access to your entire repo. Make sense? Ralph > > Cheers, > > Gilles > > On 20

Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Ralph Castain
> Thanks > > Brock Palen > www.umich.edu/~brockp > CAEN Advanced Computing > XSEDE Campus Champion > bro...@umich.edu > (734)936-1985 > > > >> On Oct 27, 2014, at 11:25 AM, Ralph Castain wrote: >> >> >>> On Oct 26, 2014, at 9:56

Re: [OMPI users] OpenMPI 1.8.3 configure fails, Mac OS X 10.9.5, Intel Compilers

2014-10-27 Thread Ralph Castain
FWIW: I just tested with the Intel 15 compilers on Mac 10.10 and it works fine, so apparently the problem has been fixed. You should be able to upgrade to the 15 versions, so that might be the best solution > On Oct 27, 2014, at 11:06 AM, Bosler, Peter Andrew wrote: > > Good morning, > > I’m

Re: [OMPI users] OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV inJava foropenmpi-dev-124-g91e9686on Solaris

2014-10-28 Thread Ralph Castain
> On Oct 27, 2014, at 7:21 PM, Gilles Gouaillardet > wrote: > > Ralph, > > On 2014/10/28 0:46, Ralph Castain wrote: >> Actually, I propose to also remove that issue. Simple enough to use a >> hash_table_32 to handle the jobids, and let that point to a >&

Re: [OMPI users] OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Ralph Castain
Gilles: will you be committing this to trunk and PR to 1.8? > On Oct 28, 2014, at 11:05 AM, Marco Atzeri wrote: > > On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote: >> Thanks Marco, >> >> pthread_mutex_init calls calloc under cygwin but does not allocate memory >> under linux, so not invokin

Re: [OMPI users] java.lang.ArrayIndexOutOfBoundsException in openmpi-dev-178-ga16c1e4

2014-10-29 Thread Ralph Castain
Looks to me like a buffer isn’t getting initialized to NULL - the message is correct (as is the length), but the rest of the array is random garbage. However, note that MPI messages don’t initialize their buffers for performance reasons. So your program should only be checking the first 6 bytes

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-30 Thread Ralph Castain
>> More about the vader btl can be found here: >> http://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/ >> >> -Nathan Hjelm >> HPC-5, LANL >> >> On Fri, Oct 17, 2014 at 01:02:23PM -0700,

Re: [OMPI users] orte-ps and orte-top behavior

2014-10-31 Thread Ralph Castain
> On Oct 30, 2014, at 3:15 PM, Brock Palen wrote: > > If i'm on the node hosting mpirun for a job, and run: > > orte-ps > > It finds the job and shows the pids and info for all ranks. > > If I use orte-top though it does no such default, I have to find the mpirun > pid and then use it. > >

Re: [OMPI users] change in behaviour 1.6 -> 1.8 under sge

2014-11-03 Thread Ralph Castain
> On Nov 3, 2014, at 4:54 AM, Mark Dixon wrote: > > Hi there, > > We've started looking at moving to the openmpi 1.8 branch from 1.6 on our > CentOS6/Son of Grid Engine cluster and noticed an unexpected difference when > binding multiple cores to each rank. > > Has openmpi's definition 'slot

Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of MPI_Ibcast

2014-11-03 Thread Ralph Castain
Which version of OMPI were you testing? > On Nov 3, 2014, at 9:14 AM, Steven Eliuk wrote: > > Hello, > > We were using OpenMPI for some testing, everything works fine but randomly, > MPI_Ibcast() > takes long time to finish. We have a standalone program just to test it. The > following > is

Re: [OMPI users] change in behaviour 1.6 -> 1.8 under sge

2014-11-04 Thread Ralph Castain
the 1.8 series has corrected the situation. > On Nov 3, 2014, at 8:23 AM, Ralph Castain wrote: > > >> On Nov 3, 2014, at 4:54 AM, Mark Dixon wrote: >> >> Hi there, >> >> We've started looking at moving to the openmpi 1.8 branch from 1.6 on ou

Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of MPI_Ibcast

2014-11-06 Thread Ralph Castain
ics, > 1732 North First Street, > San Jose, CA 95112, > Work: +1 408-652-1976, > Work: +1 408-544-5781 Wednesdays, > Cell: +1 408-819-4407. > > > From: Ralph Castain mailto:rhc.open...@gmail.com>> > Reply-To: Open MPI Users mailto:us...@open-mpi.org>> &g

Re: [OMPI users] Question on mapping processes to hosts file

2014-11-07 Thread Ralph Castain
Ah, yes - so here is what is happening. When no slot info is provided, we use the number of detected cores on each node as the #slots. So if you want to loadbalance across the nodes, you need to set —map-by node Or add slots=1 to each line of your host file to override the default behavior > On
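
The two alternatives described above would look roughly as follows (hostnames and binary are placeholders):

    # Option 1: round-robin the ranks across nodes instead of filling each node first
    mpirun --map-by node -hostfile ./hosts -np 4 ./a.out

    # Option 2: override the detected core count in the hostfile itself, e.g. ./hosts containing
    #   node01 slots=1
    #   node02 slots=1
    mpirun -hostfile ./hosts -np 4 ./a.out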

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Ralph Castain
OMPI discovers all active interfaces and automatically considers them available for its use unless instructed otherwise via the params. I’d have to look at the TCP BTL code to see the loadbalancing algo - I thought we didn’t have that “on” by default across BTLs, but I don’t know if the TCP one

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Ralph Castain
FWIW: during MPI_Init, each process “publishes” all of its interfaces. Each process receives a complete map of that info for every process in the job. So when the TCP btl sets itself up, it attempts to connect across -all- the interfaces published by the other end. So it doesn’t matter what hos
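
To restrict which interfaces the TCP BTL (and the runtime wire-up) may use, the usual knobs are the if_include/if_exclude parameters (interface names below are placeholders):

    mpirun -np 8 \
      --mca btl_tcp_if_include eth0 \
      --mca oob_tcp_if_include eth0 \
      ./a.out
    # or exclude instead, e.g. --mca btl_tcp_if_exclude lo,virbr0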

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-10 Thread Ralph Castain
You might also add the —display-allocation flag to mpirun so we can see what it thinks the allocation looks like. If there are only 16 slots on the node, it seems odd that OMPI would assign 32 procs to it unless it thinks there is only 1 node in the job, and oversubscription is allowed (which it
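
For reference, the flag is simply added to the normal launch line, e.g. (the binary is a placeholder):

    mpirun --display-allocation --display-map -np 32 ./myprog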

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Ralph Castain
o World from Node 0. > Hello World from Node 2. > Hello World from Node 3. > Mon Nov 10 13:49:53 CET 2014 > > > -- Reuti > >> FWIW: the use-all-IP interfaces approach has been in OMPI forever. >> >> Sent from my phone. No type good. >>

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Ralph Castain
e the delay issue would be to strace -ttt > both orted and mpi task that are launched on the compute node and see where > the time is lost. > /* at this stage, i would suspect orted ... */ > > Cheers, > > Gilles > > On Mon, Nov 10, 2014 at 5:56 PM, Reuti <ma

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Ralph Castain
4000 > user_lists NONE > xuser_lists NONE > start_proc_args /bin/true > stop_proc_args /bin/true > allocation_rule $fill_up > control_slaves TRUE > job_is_first_task FALSE > urgency_slots min > > Many thanks > > Henk > >

Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to hosts file

2014-11-11 Thread Ralph Castain
. > > Thanks again, > > Ed > > > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Friday, November 07, 2014 11:51 AM > To: Open MPI Users > Subject: EXTERNAL: Re: [OMPI users] Question on mapping processes to hosts > file >

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Ralph Castain
> On Nov 11, 2014, at 7:57 AM, Reuti wrote: > > Am 11.11.2014 um 16:13 schrieb Ralph Castain: > >> This clearly displays the problem - if you look at the reported “allocated >> nodes”, you see that we only got one node (cn6050). This is why we mapped >>

Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to hosts file

2014-11-11 Thread Ralph Castain
t 8:10 AM, Blosch, Edwin L wrote: > > Thanks Ralph. I’ll experiment with these options. Much appreciated. > > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Tuesday, November 11, 2014 10:00 AM > To: Open MPI Users > Subject: Re: [

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Ralph Castain
> On Nov 11, 2014, at 10:06 AM, Reuti wrote: > > Am 11.11.2014 um 17:52 schrieb Ralph Castain: > >> >>> On Nov 11, 2014, at 7:57 AM, Reuti wrote: >>> >>> Am 11.11.2014 um 16:13 schrieb Ralph Castain: >>> >>>> This cle

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Ralph Castain
> On Nov 12, 2014, at 7:15 AM, Dave Love wrote: > > Ralph Castain writes: > >> You might also add the —display-allocation flag to mpirun so we can >> see what it thinks the allocation looks like. If there are only 16 >> slots on the node, it seems odd that OM

Re: [OMPI users] 1.8.4

2014-11-12 Thread Ralph Castain
I was going to send something out to the list today anyway - will do so now. > On Nov 12, 2014, at 6:58 AM, Jeff Squyres (jsquyres) > wrote: > > On Nov 12, 2014, at 9:53 AM, Ray Sheppard wrote: > >> Thanks, and sorry to blast my little note out to the list. I guess your >> mail address is

[OMPI users] 1.8.4 release delayed

2014-11-12 Thread Ralph Castain
Hi folks Those of you following the mailing lists probably know that we had hoped to release 1.8.4 last Friday, but were unable to do so. We currently have a couple of issues pending resolution, and our developers are badly “crunched” by final prep for Supercomputing. We then will hit the US Th

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Ralph Castain
> On Nov 12, 2014, at 2:45 PM, Reuti wrote: > > Am 12.11.2014 um 17:27 schrieb Reuti: > >> Am 11.11.2014 um 02:25 schrieb Ralph Castain: >> >>> Another thing you can do is (a) ensure you built with —enable-debug, and >>> then (b) run it wi

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Ralph Castain
4 schrieb Ralph Castain: > >>> On Nov 12, 2014, at 2:45 PM, Reuti wrote: >>> >>> Am 12.11.2014 um 17:27 schrieb Reuti: >>> >>>> Am 11.11.2014 um 02:25 schrieb Ralph Castain: >>>> >>>>> Another thing you

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Ralph Castain
> On Nov 13, 2014, at 9:20 AM, Reuti wrote: > > Am 13.11.2014 um 17:14 schrieb Ralph Castain: > >> Hmmm…I’m beginning to grok the issue. It is a tad unusual for people to >> assign different hostnames to their interfaces - I’ve seen it in the Hadoop >> world,

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-14 Thread Ralph Castain
> On Nov 13, 2014, at 3:36 PM, Dave Love wrote: > > Ralph Castain writes: > >>>>> cn6050 16 par6.q@cn6050 >>>>> cn6045 16 par6.q@cn6045 >>> >>> The above looks like the PE_HOSTFILE. So it should be 16 slots per node. >>

Re: [OMPI users] error building openmpi-dev-274-g2177f9e with Sun C 5.12

2014-11-14 Thread Ralph Castain
FWIW: I just committed the fix to master > On Nov 14, 2014, at 9:24 AM, Jeff Squyres (jsquyres) > wrote: > > Todd K. just reported the same thing: > https://github.com/open-mpi/ompi/issues/272 > > Siegmar: do you have a github ID? If so, we can effectively "CC" you on > these kinds of tick

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-17 Thread Ralph Castain
FWIW: I don't have access to a Linux box right now, but I built the OMPI devel master on my Mac using Intel 2015 compilers and was able to build/run all of the Fortran examples in our "examples" directory. I suspect the problem here is your use of the --enable-mpi-thread-multiple option. The 1.8 s

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-17 Thread Ralph Castain
Just checked the head of the 1.8 branch (soon to be released as 1.8.4), and confirmed the same results. I know the thread-multiple option is still broken there, but will test that once we get the final fix committed. On Mon, Nov 17, 2014 at 7:29 PM, Ralph Castain wrote: > FWIW: I don

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 Thread Ralph Castain
nMPI, but there remains a mystery as > to why I only get the segfault error messages on lower node counts. > > mpif90 -O0 -fopenmp ./fred.f90 > > mpiexec -n 6 ./a.out > ------ > mpiexec noticed that process rank

Re: [OMPI users] job running out of memory

2014-11-18 Thread Ralph Castain
Unfortunately, there is no way to share memory across nodes. Running out of memory as you describe can be due to several factors, including most typically: * a memory leak in the application, or the application simply growing too big for the environment * one rank running slow, causing it to buil

Re: [OMPI users] job running out of memory

2014-11-20 Thread Ralph Castain
no other way." > —John Holt > > > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Tuesday, November 18, 2014 5:56 PM > To: Open MPI Users > Subject: Re: [OMPI users] job running out of memory > > Unfortunately, there is no

Re: [OMPI users] OpenMPI 1.6.5 & 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-11-25 Thread Ralph Castain
Might be worth trying 1.8.3 to see if it works - there is an updated version of ROMIO in it. > On Nov 25, 2014, at 12:13 PM, Eric Chamberland > wrote: > > Hi, > > I have random segmentation violations (signal 11) in the mentioned function > when testing MPI I/O calls with 2 processes on a si

Re: [OMPI users] How to find MPI ranks located in remote nodes?

2014-11-25 Thread Ralph Castain
Every process has a complete map of where every process in the job is located - not sure if there is an MPI API for accessing it, though. > On Nov 25, 2014, at 2:32 PM, Teranishi, Keita wrote: > > Hi, > > I am trying to figure out a way for each local MPI rank to identify the > ranks locate

Re: [OMPI users] [EXTERNAL] Re: How to find MPI ranks located in remote nodes?

2014-11-25 Thread Ralph Castain
hat I could send you without all of the SCR related > code. > -Adam > > > From: users [users-boun...@open-mpi.org <mailto:users-boun...@open-mpi.org>] > on behalf of Ralph Castain [r...@open-mpi.org <mailto:r...@open-mpi.org>] > Sent: Tuesday, November 25,

Re: [OMPI users] "default-only MCA variable"?

2014-12-01 Thread Ralph Castain
> On Nov 28, 2014, at 10:08 AM, Dave Love wrote: > > Gilles Gouaillardet writes: > >> It could be because configure did not find the knem headers and hence knem >> is not supported and hence this mca parameter is read-only > > Yes, in that case (though knem was meant to be used and it's anno

Re: [OMPI users] OpenMPI with blcr problem

2014-12-01 Thread Ralph Castain
u...@open-mpi.org> > > You can reach the person managing the list at > users-ow...@open-mpi.org <mailto:users-ow...@open-mpi.org> > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of users digest..." > > &

Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-05 Thread Ralph Castain
I’m trying to grok the problem, so bear with me a bit. It sounds like you have a machine with 12 physical cores, each with two hyperthreads, and you have HT turned on - correct? If that is true, then the problem is that you are attempting to bind-to core (of which you have 12), but asking for 2

Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-05 Thread Ralph Castain
We may be getting hung up on terminology, but my guess is that the problem is one of accurately understanding how many cores you have vs ht’s. Can you run lstopo and see what it thinks is there? If you haven’t installed that, you can just run “mpirun -mca ess_base_verbose 10 -n 1 hostname” to ge

Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-08 Thread Ralph Castain
Thanks for sending that lstopo output - helped clarify things for me. I think I now understand the issue. Mostly a problem of my being rather dense when reading your earlier note. Try using —map-by node:PE=N to your cmd line. I think the problem is that we default to —map-by numa if you just gi
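
A hybrid launch along the suggested lines might look like this (2 ranks with 6 cores per rank is only an example):

    export OMP_NUM_THREADS=6
    # PE=6 gives each rank six processing elements (cores) and binds it to them
    mpirun --map-by node:PE=6 -np 2 ./hybrid_app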

Re: [OMPI users] open mpi and MLX

2014-12-09 Thread Ralph Castain
Hi Daniel Yeah, this is a known problem traced to updating ofed to 3.12 - see this thread: http://www.open-mpi.org/community/lists/users/2014/12/25924.php > On Dec 9, 2014, at 7:16 AM, Faraj, Daniel A wrote: > > I am having a

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Ralph Castain
Can you provide an example cmd line you use to launch one of these tests using 1.8.3? Some of the options changed between the 1.6 and 1.8 series, and we bind by default in 1.8 - the combination may be causing you a problem. > On Dec 9, 2014, at 9:14 AM, Eric Chamberland > wrote: > > Hi, > >

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Ralph Castain
Not for that many procs - we default to binding to socket for anything more than 2 procs > On Dec 9, 2014, at 9:24 AM, Nathan Hjelm wrote: > > > One thing that changed between 1.6 and 1.8 is the default binding > policy. Open MPI 1.6 did not bind by default but 1.8 binds to core. You > can uns
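
When oversubscribing deliberately under 1.8, it usually helps to drop the default binding and ask idle ranks to yield the CPU (a sketch; note that a later message in this thread reports the yield logic itself had a bug being fixed at the time):

    mpirun -np 32 --bind-to none --mca mpi_yield_when_idle 1 ./myprog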

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Ralph Castain
> mpirun --output-filename output -mca mpi_show_mca_params all > --report-bindings -np 32 myprog > > between a launch with 165 vs 183. > > The diff may be interesting but I can't interpret everything that is > written... > > The files are attached... > > T

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Ralph Castain
ric > > > On 12/09/2014 04:19 PM, Nathan Hjelm wrote: >> >> yield when idle is broken on 1.8. Fixing now. >> >> -Nathan >> >> On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote: >>> Hmmm….well, it looks like we are doing t

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Ralph Castain
FWIW: I just committed that patch to the 1.8 repo, so it will be in tomorrow’s nightly 1.8 tarball: http://www.open-mpi.org/nightly/v1.8/ > On Dec 10, 2014, at 7:40 AM, Ralph Castain wrote: > You should be able to apply the patch - I d

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Ralph Castain
I’ll run the tarball generator now so you can try the nightly tarball. > On Dec 10, 2014, at 9:20 AM, Eric Chamberland > wrote: > > On 12/10/2014 10:40 AM, Ralph Castain wrote: >> You should be able to apply the patch - I don’t think that section of >> code differs from

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Ralph Castain
Tarball now available on web site http://www.open-mpi.org/nightly/v1.8/ > On Dec 10, 2014, at 9:40 AM, Ralph Castain wrote: > I’ll run the tarball generator now so you can try the nightly tarball. > >> On Dec 10, 2014, at 9

Re: [OMPI users] MPI_THREAD_MULTIPLE hang

2014-12-10 Thread Ralph Castain
Yes - it is being fixed for 1.8.4 > On Dec 10, 2014, at 2:00 PM, Christopher O'Grady > wrote: > > > Hi, > > I see what looks like a bug in openmpi involving the > MPI_THREAD_MULTIPLE. When we call MPI_Init_thread with this argument, > this 61-line example hangs: > > http://www.slac.stanford

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-11 Thread Ralph Castain
we know this will otherwise kill performance. So you can use the MCA param to force us to “not yield” in that scenario - otherwise, we will always protect you. HTH Ralph > On Dec 10, 2014, at 11:18 AM, Eric Chamberland > wrote: > >> On 12/10/2014 12:55 PM, Ralph Castain wro

Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.

2014-12-15 Thread Ralph Castain
Sorry, I should have been clearer - that was indeed what I was expecting to see. I guess it begs the question - should we just update to something like 1.9 so Brice doesn't have to worry about back porting future fixes this far back? On Mon, Dec 15, 2014 at 7:22 AM, Jeff Squyres (jsquyres) wrot

Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.

2014-12-15 Thread Ralph Castain
upgrading from hwloc 1.7 to 1.9. > > Brice > > > > > > > > > > On Dec 15, 2014, at 10:35 AM, Ralph Castain wrote: > > > >> Sorry, I should have been clearer - that was indeed what I was > expecting to see. I guess it begs the question - should

Re: [OMPI users] OMPI users] OMPI users] OMPI users] MPI inside MPI (still)

2014-12-15 Thread Ralph Castain
idt wrote: >>>>>>>>> >>>>>>>>> Gilles, >>>>>>>>> >>>>>>>>> Ok, very nice! >>>>>>>>> >>>>>>>>> When I excute >>>>>>>>&

Re: [OMPI users] OMPI users] OMPI users] OMPI users] MPI inside MPI (still)

2014-12-17 Thread Ralph Castain
> > > > > 2014-12-15 17:26 GMT-02:00 Alex A. Schmidt : > >> Ralph, >> >> I guess you mean "call mpi_comm_spawn( 'siesta', '< infile' , 2 ,...)" >> >> to execute 'mpirun -n 2 siesta < infile' on the spawn

Re: [OMPI users] OMPI users] OMPI users] OMPI users] OMPI users] MPI inside MPI (still)

2014-12-17 Thread Ralph Castain
>>> something interpreted by the shell, and when Open MPI "fork-exec" a process >>>> it does not behave as the shell. >>>> >>>> Thus a potentially non-portable solution would be to instead of >>>> launching the mpirun directly to launch it t
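
One way to approximate the shell redirection being discussed is to spawn a small wrapper script instead of the solver itself (entirely a sketch; 'siesta' and 'infile' come from the thread, the wrapper is hypothetical):

    #!/bin/sh
    # run_siesta.sh - spawned in place of 'siesta' so its stdin comes from the input file
    exec siesta < infile

The wrapper's name is then what gets passed as the command argument of MPI_Comm_spawn.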

Re: [OMPI users] Question on Mapping and Binding

2014-12-22 Thread Ralph Castain
They will be bound to whatever level you specified - I believe by default we bind to socket when mapping by socket. If you want them bound to core, you might need to add —bind-to core. I can take a look at it - I *thought* we had reset that to bind-to core when PE=N was specified, but maybe tha
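
In other words, when both behaviours are wanted they have to be stated explicitly, and --report-bindings confirms where each rank lands (a sketch):

    mpirun --map-by socket --bind-to core --report-bindings -np 8 ./a.out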
