[OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Lee-Ping Wang
Hi there, I'm building OpenMPI 1.8.3 on a system where I explicitly don't want any of the BTL components (they tend to break my single node jobs). ./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran --prefix=$QC_EXT_LIBS/openmpi --enable-static --enable-mca-no-build=btl Building gives me thi

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Lee-Ping Wang
Hmm, the build doesn't finish - it breaks when trying to create the man page. I guess I'll disable only a few specific BTL components that have given me issues in the past. Creating ompi_info.1 man page... CCLD ompi_info ../../../ompi/.libs/libmpi.so: undefined reference to `ibv_free_dev

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Gustavo Correa
Hi Lee-Ping, Did you clean up the old build, to start fresh? make distclean configure --disable-vt ... ... I hope this helps, Gus Correa On Sep 29, 2014, at 8:47 AM, Lee-Ping Wang wrote: > Hmm, the build doesn't finish - it breaks when trying to create the man page. > I guess I'll disable on

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
Can you pass us the actual mpirun command line being executed? Especially need to see the argv being passed to your application. On Sep 27, 2014, at 7:09 PM, Amos Anderson wrote: > FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. Also, > I have some gdb output (from 1.7

Re: [OMPI users] --prefix, segfaulting

2014-09-29 Thread Ralph Castain
I'm not seeing this with 1.8.3 - can you try with it? On Sep 17, 2014, at 4:38 PM, Ralph Castain wrote: > Yeah, just wanted to make sure you were seeing the same mpiexec in both > cases. There shouldn't be any issue with providing the complete path, though > I can take a look > > > On Sep 1

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Amos Anderson
I'm not calling mpirun in this case because this particular calculation doesn't use more than one processor. What I'm doing on my command line is this: /home/user/myapp/tools/python/bin/python test/regression/regression-test.py test/regression/regression-jobs and internally I check for rank/siz

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
Okay, so regression-test.py is calling MPI_Init as a singleton, correct? Just trying to fully understand the scenario. Singletons are certainly allowed, if that's the scenario. On Sep 29, 2014, at 10:51 AM, Amos Anderson wrote: > I'm not calling mpirun in this case because this particular calcul

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
Afraid I cannot replicate a problem with singleton behavior in the 1.8 series: 11:31:52 /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0-23 OMPI_MCA_orte_default_hostfile=/home/common/hosts OMPI_COMMAND=./hello OMPI_ARGV=f

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Dave Goodell (dgoodell)
Looks like boost::mpi and/or your python "mpi" module might be creating a bogus argv array and passing it to OMPI's MPI_Init routine. Note that argv is required by C99 to be terminated with a NULL pointer (that is, (argv[argc]==NULL) must hold). See http://stackoverflow.com/a/3772826/158513.
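
The NULL-termination requirement can be illustrated with a minimal sketch. It assumes the argv array is assembled on the Python side with ctypes; `make_c_argv` is a hypothetical helper written for this illustration and is not part of boost::mpi, Open MPI, or any Python "mpi" module. The point is only that the array needs argc + 1 slots, with the last one left as a NULL pointer.

```python
import ctypes


def make_c_argv(args):
    """Build a C-style argv array from a list of Python strings.

    Hypothetical helper for illustration only. The array is given
    len(args) + 1 slots, and the final slot is a NULL pointer, so that
    argv[argc] == NULL holds as the C standard requires.
    """
    encoded = [a.encode("utf-8") for a in args]
    argv_type = ctypes.c_char_p * (len(encoded) + 1)
    # The trailing None is stored as a NULL char* in the last slot.
    argv = argv_type(*encoded, None)
    return len(encoded), argv


argc, argv = make_c_argv(["myapp", "foo", "bar"])
# ctypes reports a NULL char* as None when the slot is read back.
print(argc, argv[0], argv[argc])
```

An array sized with only argc slots would leave argv[argc] pointing at whatever memory follows it, which is the kind of bogus value a library walking argv could trip over.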

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Amos Anderson
Hi Dave -- It looks like my argv[argc] is not NULL (see below), so are we getting that this problem is boost::python's fault? Thanks! Amos. Looking in the boost code, I see this is how MPI_Init is called: environment::environment(int& argc, char** &argv, bool abort_on_exception) : i_initi

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
On Sep 29, 2014, at 12:05 PM, Amos Anderson wrote: > Hi Dave -- > > It looks like my argv[argc] is not NULL (see below), so are we getting that > this problem is boost::python's fault? Yep - they are violating the C99 standard > > Thanks! > Amos. > > > > Looking in the boost code, I see

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Lee-Ping Wang
Hi Gus, Thank you. I did start from a completely clean directory tree every time (I deleted the whole folder and re-extracted the tarball). I noticed that disabling any of the BTL components resulted in the same error, so my solution was to build everything and disable certain components at r
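
A rough sketch of excluding components through the environment rather than at build time follows. It relies on the `OMPI_MCA_<framework>` environment variable convention and the `^` exclusion prefix that appear elsewhere in this thread (for example `OMPI_MCA_oob=^tcp`). The helper name `mca_exclude_env` is hypothetical, and `openib` is only an assumed example; the message does not say which components were actually disabled.

```python
import os


def mca_exclude_env(framework, components, base_env=None):
    """Return a copy of the environment with an MCA exclusion list set.

    Hypothetical helper for illustration. A value such as "^openib"
    tells Open MPI to skip the listed components of that framework.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_" + framework] = "^" + ",".join(components)
    return env


# "openib" is an assumed example component, not taken from the thread.
env = mca_exclude_env("btl", ["openib"])
print(env["OMPI_MCA_btl"])
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when launching the job.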

[OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Hi there, My application uses MPI to run parallel jobs on a single node, so I have no need of any support for communication between nodes. However, when I use mpirun to launch my application I see strange errors such as: -

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Sorry for my last email - I think I spoke too soon. I realized after reading some more documentation that OpenMPI always uses TCP sockets for out-of-band communication, so it doesn't make sense for me to set OMPI_MCA_oob=^tcp. That said, I am still running into a strange problem in my applica

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Here's another data point that might be useful: The error message is much more rare if I run my application on 4 cores instead of 8. Thanks, - Lee-Ping On Sep 29, 2014, at 5:38 PM, Lee-Ping Wang wrote: > Sorry for my last email - I think I spoke too quick. I realized after > reading some mo

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Ralph Castain
I don't know anything about your application, or what the functions in your code are doing. I imagine it's possible that you are trying to open statically defined ports, which means that running the job again too soon could leave the OS thinking the socket is already busy. It takes a while for th
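
If statically defined ports do turn out to be the cause, two common workarounds can be sketched as below. This is only an illustration of the general socket behaviour; nothing in the thread shows how the application opens its sockets, and `open_listener` is a hypothetical helper. Setting `SO_REUSEADDR` lets a fixed port be rebound while an earlier connection is still lingering, and binding to port 0 asks the OS to pick a free port instead.

```python
import socket


def open_listener(port=0, reuse=True):
    """Open a TCP listening socket on the loopback interface.

    Hypothetical helper for illustration. With port=0 the OS picks a
    free port. With reuse=True, SO_REUSEADDR is set so that a fixed
    port can be rebound soon after a previous run released it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(1)
    return sock


listener = open_listener()
print("listening on port", listener.getsockname()[1])
listener.close()
```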

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Ralph Castain
ompi_info is just the first executable that gets built, so it is always the place where we find missing library issues. It looks like someone has left incorrect configure logic in the system such that we always attempt to build InfiniBand-related code, but without linking against the l