[OMPI devel] 1.7.5 and trunk failures

2014-03-18 Thread Ralph Castain
Just to be safe, I blew away my existing installations and got completely fresh checkouts. I am doing a vanilla configure, with the only configure options besides prefix being --enable-orterun-prefix-by-default and --enable-mpi-java (so I can test the Java bindings). For 1.7.5, running the IBM t

Re: [OMPI devel] MPIEXEC_TIMEOUT broken in v1.7 branch @ r31103

2014-03-18 Thread Jeff Squyres (jsquyres)
This seems to be working, but I think we now have a pid group problem -- I think we need to setpgid() right after the fork. Otherwise, when we kill the group, we might end up killing much more than just the one MPI process (including the orted and/or orted's parent!). Ping me on IM -- I'm test
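For context, a minimal sketch of the pattern being described here: put the forked child into its own process group before exec, so a later killpg() reaches only that child and whatever it spawned, never the orted or the orted's parent. This is an illustration only, not the actual launch code; the helper names are made up.

    #include <signal.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* hypothetical helper: fork a child and give it its own process group */
    pid_t launch_child(char *const argv[])
    {
        pid_t pid = fork();
        if (0 == pid) {
            /* child: leave the parent's process group right after fork,
               before exec, so a signal aimed at this child's group cannot
               propagate up to the daemon that launched it */
            setpgid(0, 0);
            execvp(argv[0], argv);
            _exit(127);                     /* exec failed */
        } else if (pid > 0) {
            /* parent: set it here as well to close the fork/exec race */
            setpgid(pid, pid);
        }
        return pid;
    }

    /* later, e.g. when a timeout fires: signal only that child's group */
    void kill_child_group(pid_t pid)
    {
        killpg(pid, SIGTERM);
    }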

Re: [OMPI devel] MPIEXEC_TIMEOUT broken in v1.7 branch @ r31103

2014-03-18 Thread Ralph Castain
Okay, fixed and cmr'd to you. On Mar 18, 2014, at 11:00 AM, Ralph Castain wrote: > On Mar 18, 2014, at 10:54 AM, Dave Goodell (dgoodell) wrote: >> Ralph, >> I'm seeing problems with MPIEXEC_TIMEOUT in v1.7 @ r31103 (fairly close to HEAD): >> 8< >> MPIEXEC_TIMEOUT=8

Re: [OMPI devel] MPIEXEC_TIMEOUT broken in v1.7 branch @ r31103

2014-03-18 Thread Ralph Castain
On Mar 18, 2014, at 10:54 AM, Dave Goodell (dgoodell) wrote: > Ralph, > I'm seeing problems with MPIEXEC_TIMEOUT in v1.7 @ r31103 (fairly close to HEAD): > 8< > MPIEXEC_TIMEOUT=8 mpirun --mca btl usnic,sm,self -np 4 ./sleeper > --

[OMPI devel] DNS migration of open-mpi.org

2014-03-18 Thread Jeff Squyres (jsquyres)
Tomorrow at 9am US Eastern, IU will be changing the IP address of open-mpi.org (and all of its associated services: email, web, etc.). They're hoping it causes no downtime -- there should be proxies in place to relay traffic from the old IP addresses for the next week or two, so that no one sho

[OMPI devel] MPIEXEC_TIMEOUT broken in v1.7 branch @ r31103

2014-03-18 Thread Dave Goodell (dgoodell)
Ralph, I'm seeing problems with MPIEXEC_TIMEOUT in v1.7 @ r31103 (fairly close to HEAD): 8< MPIEXEC_TIMEOUT=8 mpirun --mca btl usnic,sm,self -np 4 ./sleeper -- The user-provided time limit for job execution has been
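The ./sleeper test program itself is not shown in the thread; a minimal stand-in, assuming it only needs to outlive the 8-second MPIEXEC_TIMEOUT to exercise the same code path, could look like this:

    #include <mpi.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        sleep(60);          /* stay alive well past MPIEXEC_TIMEOUT=8 */
        MPI_Finalize();
        return 0;
    }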

Re: [OMPI devel] Hang in comm_spawn

2014-03-18 Thread Ralph Castain
It's on the trunk, but I imagine it is on 1.7 as well. I use the "simple_spawn" program in orte/test/mpi, and the cmd line is just "mpirun -np 2 ./simple_spawn". On Mar 18, 2014, at 7:42 AM, Nathan Hjelm wrote: > Is this trunk or 1.7? Can you give me your mpirun command? > -Nathan > On Tu

Re: [OMPI devel] Hang in comm_spawn

2014-03-18 Thread Nathan Hjelm
Is this trunk or 1.7? Can you give me your mpirun command? -Nathan On Tue, Mar 18, 2014 at 07:35:01AM -0700, Ralph Castain wrote: > I'm seeing comm_spawn hang here: > [bend001][[52890,1],0][coll_ml_module.c:3030:mca_coll_ml_comm_query] > COLL-ML ml_coll_schedule_setup exit with error >

[OMPI devel] Hang in comm_spawn

2014-03-18 Thread Ralph Castain
I'm seeing comm_spawn hang here: [bend001][[52890,1],0][coll_ml_module.c:3030:mca_coll_ml_comm_query] COLL-ML ml_coll_schedule_setup exit with error [bend001][[52890,1],1][coll_ml_module.c:3030:mca_coll_ml_comm_query] COLL-ML ml_coll_schedule_setup exit with error Setting -mca coll ^ml allows t
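The reproducer referenced elsewhere in the thread is orte/test/mpi/simple_spawn, run as "mpirun -np 2 ./simple_spawn"; the report suggests disabling the coll/ml component (-mca coll ^ml) as a workaround. The actual test file is not reproduced here, but a hedged, self-contained sketch of the same kind of spawn test would be:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm parent, child;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);

        if (MPI_COMM_NULL == parent) {
            /* parent side: spawn one extra copy of this same binary */
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
            MPI_Barrier(child);    /* a hang like the one above shows up here */
            MPI_Comm_disconnect(&child);
        } else {
            /* spawned side */
            MPI_Barrier(parent);
            MPI_Comm_disconnect(&parent);
        }

        MPI_Finalize();
        return 0;
    }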

Re: [OMPI devel] usage of mca variables in orte-restart

2014-03-18 Thread Adrian Reber
Thanks for your fix. You say that the environment is only taken into account during register. There is another variable set in the environment in opal-restart.c. Does the following still work: opal-restart.c: (void) mca_base_var_env_name("crs", &tmp_env_var); opal_setenv(tmp_env_var,
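For reference, the pattern being asked about composes the environment-variable name that corresponds to an MCA variable and then exports a value under it. A minimal sketch follows; the helper, its argument, and the exact value are assumptions for illustration, not the code from opal-restart.c, and the signatures are as I understand them from mca_base_var.h and opal_environ.h.

    #include <stdlib.h>
    #include "opal/mca/base/mca_base_var.h"
    #include "opal/util/opal_environ.h"

    extern char **environ;

    /* hypothetical helper: request a particular CRS component via the env */
    static void force_crs_component(const char *comp)
    {
        char *tmp_env_var = NULL;

        /* builds the env name for the "crs" MCA variable,
           e.g. something like "OMPI_MCA_crs" */
        (void) mca_base_var_env_name("crs", &tmp_env_var);

        /* export the requested component into this process's environment;
           whether this still influences selection after registration is
           exactly the question raised above */
        opal_setenv(tmp_env_var, comp, true, &environ);

        free(tmp_env_var);
    }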