[OMPI devel] problems with openib finalize

2007-07-18 Thread Jeff Squyres
Background: Pasha added a call in the openib BTL finalize function that will only succeed if all registered memory has been released (ibv_dealloc_pd()). Since the test app didn't call MPI_FREE_MEM, there was some memory that was still registered, and therefore the call in finalize failed.
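The failure mode described above can be illustrated with a toy model. This is only a sketch, not the openib BTL code or the libibverbs API: the `ProtectionDomain` class, its methods, and the `finalize` helper are hypothetical names invented for illustration, and returning `EBUSY` is an assumption about how a busy protection domain would report the failure. It shows why a deallocation step in finalize fails while any registration is still outstanding, and succeeds once the application has released it.

```python
import errno


class ProtectionDomain:
    """Toy stand-in for a verbs protection domain (hypothetical, not libibverbs)."""

    def __init__(self):
        self._regions = set()   # handles of memory regions still registered
        self._next_handle = 0
        self.allocated = True

    def register(self, nbytes):
        """Register a region of nbytes and return an opaque handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._regions.add(handle)
        return handle

    def deregister(self, handle):
        """Release a registration (what freeing the memory would trigger)."""
        self._regions.discard(handle)

    def dealloc(self):
        """Return 0 on success, or an errno value if regions are outstanding."""
        if self._regions:
            return errno.EBUSY  # assumed error code for a busy domain
        self.allocated = False
        return 0


def finalize(pd):
    """Hypothetical finalize step: succeeds only if the domain can be freed."""
    return pd.dealloc() == 0


pd = ProtectionDomain()
leaked = pd.register(4096)     # the application allocates but never frees

ok_before = finalize(pd)       # registration still outstanding
pd.deregister(leaked)          # the application releases the memory
ok_after = finalize(pd)        # nothing left registered

print(ok_before, ok_after)
```

In this model the first `finalize` call fails and the second succeeds, which mirrors the report: the test application skipped MPI_FREE_MEM, so memory was still registered when finalize ran.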

[OMPI devel] pathscale compilers and TLS

2007-07-18 Thread Jeff Squyres
Crud. The Pathscale 3.0 compilers do not support thread-local data. This is what we've been fighting with https://svn.open-mpi.org/trac/ompi/ticket/1025; QLogic just told us last week that their compiler does not support TLS (even though OMPI was not currently using it, glibc does, and s

Re: [OMPI devel] Fwd: lsf support / farm use models

2007-07-18 Thread Matthew Moskewicz
hi, first of all, thanks for the info bill! i think i'm really starting to piece things together now. you are right in that i'm working with a 6.x (6.2 with 6.1 devel libs ;) install here at cadence, without the HPC extensions AFAIK. also, i think that our customers are mostly in the same positio

Re: [OMPI devel] devel Digest, Vol 802, Issue 1

2007-07-18 Thread Neil Ludban
Good suggestion, increasing the timeout to somewhere around 12 allowed the job to finish. Initial experimentation showed that I could get a factor of 3-4x improvement in performance using even larger timeouts, matching the times for 64 processes and 1/4 the data set. The cluster is presently havi

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote: > But this will lockup: > > pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep > LD > > The reason is that the hostname in this last command doesn't match the > hostname I get when I query my interfaces, so m

Re: [OMPI devel] MPI_BOTTOM fixes: 1.2.4?

2007-07-18 Thread Rainer Keller
Hi Jeff, just checking the mails with Daniel/George back then. Yes, both would be required as stated in r15129; Should apply cleanly (except for NEWS). Thanks, Rainer On Wednesday 18 July 2007 17:48, Jeff Squyres wrote: > Rainer / George -- > > You guys made some fixes for MPI_BOTTOM et al. rec

Re: [OMPI devel] optional fortran datatype fixes: 1.2.4?

2007-07-18 Thread Rainer Keller
Hi Jeff, r14818 yes --- but there have otherwise not been any requests for this patch... r15137 no, we agreed to put it into 1.3. Nevertheless, I posted a CMR for r14818, it does apply cleanly in 1.2-svn. Thanks, Rainer On Wednesday 18 July 2007 17:46, Jeff Squyres wrote: > Rainer -- > > Did you

Re: [OMPI devel] optional fortran datatype fixes: 1.2.4?

2007-07-18 Thread Jeff Squyres
Sorry, I should have included links to the commits in question: https://svn.open-mpi.org/trac/ompi/changeset/14818 https://svn.open-mpi.org/trac/ompi/changeset/15137 On Jul 18, 2007, at 11:46 AM, Jeff Squyres wrote: Rainer -- Did you want to get r14818 and r15137 into 1.2.4? There's no CMR

[OMPI devel] MPI_BOTTOM fixes: 1.2.4?

2007-07-18 Thread Jeff Squyres
Rainer / George -- You guys made some fixes for MPI_BOTTOM et al. recently; did you want them in v1.2.4? There's no CMR. I *think* the changes span the following commits: https://svn.open-mpi.org/trac/ompi/changeset/15129 https://svn.open-mpi.org/trac/ompi/changeset/15030 -- Jeff Squyres

[OMPI devel] optional fortran datatype fixes: 1.2.4?

2007-07-18 Thread Jeff Squyres
Rainer -- Did you want to get r14818 and r15137 into 1.2.4? There's no CMR for them. Here's your commit messages: r14818: - The optional Fortran datatypes may not be available Do not initialize them, if not. If initializing them, check for the correct C-equivalent type to copy fro

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Ralph H Castain
It works for me in both cases, provided I give the fully qualified host name for your first example. In other words, these work: pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host localhost printenv | grep LD [pn1180961.lanl.gov:22021] [0.0] test of print_name OLDPWD=/Users/rhc/openmpi LD_LIBRARY_PA

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2007 at 07:48:17AM -0600, Ralph H Castain wrote: > I believe that was fixed in r15405 - are you at that rev level? I am on the latest revision. > > > On 7/18/07 7:27 AM, "Gleb Natapov" wrote: > > > Hi, > > > > With current trunk LD_LIBRARY_PATH is not set for ranks that are

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Ralph H Castain
I believe that was fixed in r15405 - are you at that rev level? On 7/18/07 7:27 AM, "Gleb Natapov" wrote: > Hi, > > With current trunk LD_LIBRARY_PATH is not set for ranks that are > launched on the head node. This worked previously. > > -- > Gleb. >

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2007 at 04:27:15PM +0300, Gleb Natapov wrote: > Hi, > > With current trunk LD_LIBRARY_PATH is not set for ranks that are > launched on the head node. This worked previously. > Some more info. I use rsh pls. elfit1# /home/glebn/openmpi/bin/mpirun -np 1 -H elfit1 env | grep LD_LI

[OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
Hi, With current trunk LD_LIBRARY_PATH is not set for ranks that are launched on the head node. This worked previously. -- Gleb.

Re: [OMPI devel] iof / oob issues

2007-07-18 Thread Ralph H Castain
Just to further clarify the clarification... ;-) This condition has existed for the last several months. The root problem dates at least back into the 1.1 series. We chased the problem down to the iof_flush call in the odls when a process terminates in something like Jan or Feb this year, at which

Re: [OMPI devel] iof / oob issues

2007-07-18 Thread Jeff Squyres
BTW, the fix didn't occur over the weekend because of some merging issues. I also didn't explain the problem well; you may see some clipped output from your program or the orted may hang while everything is shutting down. This is especially likely to occur for very short applications.