Re: [OMPI devel] BTL add procs errors

2010-06-04 Thread Jeff Squyres
A clarification -- this specific issue is during add_procs(), which, for jobs that do not use the MPI-2 dynamics, is during MPI_INIT. The #3 error detection/abort is not during the dynamic/lazy MPI peer connection wireup. The potential for long timeout delays mentioned in #1 would be during the

Re: [OMPI devel] BTL add procs errors

2010-06-04 Thread Ralph Castain
I think Rolf's reply makes a possibly bad assumption - i.e., that this problem is occurring just as the job is starting to run. Let me give you a real-life example where this wasn't true, and where aborting the job would make a very unhappy user: We start a long-running job (i.e., days) on a ve

Re: [OMPI devel] BTL add procs errors

2010-06-04 Thread Rolf vandeVaart
On 06/04/10 11:47, Jeff Squyres wrote: On Jun 2, 2010, at 1:36 PM, Jeff Squyres (jsquyres) wrote: We did assume that at least the errors are symmetric, i.e. if A fails to connect to B then B will fail when trying to connect to A. However, if there are other BTL the connection is supposed t

Re: [OMPI devel] BTL add procs errors

2010-06-04 Thread Jeff Squyres
On Jun 2, 2010, at 1:36 PM, Jeff Squyres (jsquyres) wrote: We did assume that at least the errors are symmetric, i.e. if A fails to connect to B then B will fail when trying to connect to A. However, if there are other BTL the connection is supposed to smoothly move over some ot
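The fallback behavior quoted in this thread (if one BTL cannot connect to a peer, the connection should move to another BTL) can be illustrated with a small sketch. This is not Open MPI's actual BTL/BML code: the function name `assign_transports` and the flat reachability-matrix representation are hypothetical. Each peer is assigned the first transport that reports it reachable, and the function returns how many peers no transport can reach. Whether a nonzero count should abort the job during add_procs() is exactly the policy question debated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not Open MPI's API:
 * reachable[t * npeers + p] is true if transport t can reach peer p.
 * selected[p] receives the index of the first transport that reaches
 * peer p, or -1 if none does.
 * Returns the number of peers that no transport can reach.
 */
static size_t assign_transports(const bool *reachable, size_t ntransports,
                                size_t npeers, int *selected)
{
    assert(reachable != NULL && selected != NULL);
    size_t unreachable = 0;

    for (size_t p = 0; p < npeers; p++) {
        selected[p] = -1;
        /* Try transports in priority order; fall back to the next one
         * when the current one cannot reach this peer. */
        for (size_t t = 0; t < ntransports; t++) {
            if (reachable[t * npeers + p]) {
                selected[p] = (int)t;
                break;
            }
        }
        if (selected[p] < 0) {
            unreachable++; /* caller decides: abort, warn, or retry */
        }
    }
    return unreachable;
}
```

For example, with two transports and three peers, where transport 0 reaches only peer 0 and transport 1 reaches peers 0 and 1, peer 1 falls back to transport 1 and peer 2 is reported as unreachable.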

Re: [OMPI devel] Migrate OpenMPI to the VxWorks

2010-06-04 Thread Ralph Castain
Jeff is correct: create an orte/odls/vxworks component and do whatever you need for that platform to launch a local child process. I believe you will also find calls to fork/exec in the orte/mca/ess/singleton area. You may want to add a configure.m4 to that component to tell it not to build for VxWorks.

Re: [OMPI devel] Migrate OpenMPI to the VxWorks

2010-06-04 Thread Jeff Squyres
Maybe gettimeofday() could be replaced with opal_gettimeofday(), which could do the Right Thing on different platforms...? Also, for fork/exec, I think that should be mostly limited to orte/odls/default, right? If so, perhaps the right thing to do is to clone that plugin and adapt it for your platfor
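The opal_gettimeofday() wrapper is only a proposal in this thread, so the following is a minimal sketch of the idea under stated assumptions, not Open MPI code. The name `portable_gettimeofday` and the `HAVE_GETTIMEOFDAY` configure macro are hypothetical, and the fallback assumes the target platform provides POSIX `clock_gettime()` with `CLOCK_REALTIME` and declares `struct timeval` in `<sys/time.h>`. Callers use a single function, and the platform-specific choice is made once at compile time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical portability wrapper. HAVE_GETTIMEOFDAY is an assumed
 * configure-time macro; when it is not defined, the wall-clock time is
 * taken from clock_gettime(CLOCK_REALTIME) and converted to a timeval.
 * Returns 0 on success, -1 on failure.
 */
static int portable_gettimeofday(struct timeval *tv)
{
    assert(tv != NULL);
#if defined(HAVE_GETTIMEOFDAY)
    return gettimeofday(tv, NULL);
#else
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
        return -1;
    }
    tv->tv_sec = ts.tv_sec;
    tv->tv_usec = ts.tv_nsec / 1000; /* nanoseconds -> microseconds */
    return 0;
#endif
}
```

Keeping the #if inside one wrapper means a port such as VxWorks only touches this function instead of every gettimeofday() call site.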

Re: [OMPI devel] Migrate OpenMPI to the VxWorks

2010-06-04 Thread 张晶
Hi Castain, Your last mail to me was really helpful. I ran into most of the issues listed and fixed them with the off-list solutions or my own. Also, as the Open MPI code has changed, there are some other issues (mostly missing functions) that were not reported. For example, the gettimeofday POSIX function i