Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-12-16 Thread Ralph Castain
Argh. I know the problem here - per note on user list, I actually found more than five months ago that we weren't properly serializing commands in the system and created a fix for it. I applied that fix only to the comm_spawn scenario at the time as this was the source of the pain - but I noted

Re: [OMPI devel] Bug or feature?

2009-12-16 Thread George Bosilca
Makes perfect sense. george. On Dec 16, 2009, at 13:27, Jeff Squyres wrote: > I think I understand what you're saying: > > - it's ok to abort during MPI_INIT (we can rationalize it as the default > error handler) > - we should only abort during MPI functions > > Is that right? If so, I agree w

Re: [OMPI devel] Bug or feature?

2009-12-16 Thread Jeff Squyres
I think I understand what you're saying: - it's ok to abort during MPI_INIT (we can rationalize it as the default error handler) - we should only abort during MPI functions Is that right? If so, I agree with your interpretation. :-) ...with one addition: it's ok to abort before MPI_INIT, because

Re: [OMPI devel] SEGFAULT in mpi_init from paffinity with intel 11.1.059 compiler

2009-12-16 Thread Daan van Rossum
Sure. Processors were scaled down while idling to 1000MHz (I hope this will show up as an attachment instead of inlined...) * on Wednesday, 16.12.09 at 18:12, Lenny Verkhovsky wrote: > Hi, > can you provide $ cat /proc/cpuinfo > I am not optimistic that it will help, but still... > thanks > Lenny

Re: [OMPI devel] SEGFAULT in mpi_init from paffinity with intel 11.1.059 compiler

2009-12-16 Thread Lenny Verkhovsky
Hi, can you provide $ cat /proc/cpuinfo I am not optimistic that it will help, but still... thanks Lenny. On Wed, Dec 16, 2009 at 6:01 PM, Daan van Rossum wrote: > Hi Terry, > > Thanks for your hint. I tried configure --enable-debug and even compiled it > with all kinds of manual debug flags turned

Re: [OMPI devel] SEGFAULT in mpi_init from paffinity with intel 11.1.059 compiler

2009-12-16 Thread Daan van Rossum
Hi Terry, Thanks for your hint. I tried configure --enable-debug and even compiled it with all kinds of manual debug flags turned on, but it doesn't help to get rid of this problem. So it definitely is not an optimization flaw. One more interesting test would be to try an older version of the I

Re: [OMPI devel] Bug or feature?

2009-12-16 Thread George Bosilca
There are two citations from the MPI standard that I would like to highlight. > All MPI programs must contain exactly one call to an MPI initialization > routine: MPI_INIT or MPI_INIT_THREAD. > One goal of MPI is to achieve source code portability. By this we mean that a > program written using

Re: [OMPI devel] Bug or feature?

2009-12-16 Thread Jeff Squyres
I would tend to agree with Paul. It's uncommon (e.g., no one has run into this before now), and I would say that this is a bad application. But then again, hanging is bad -- so it would be better to abort/terminate the whole job in this scenario. I don't know how I would rate the priority of t

Re: [OMPI devel] carto vs. hwloc

2009-12-16 Thread Joshua Hursey
Currently, I am working on process migration and automatic recovery based on checkpoint/restart. WRT the PML stack, this works by rewiring the BTLs after restart of the migrated/recovered MPI process(es). There is a fair amount of work in getting this right with respect to both the runtime and t

Re: [OMPI devel] carto vs. hwloc

2009-12-16 Thread Kenneth Lloyd
> -Original Message- > From: devel-boun...@open-mpi.org > [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres > Sent: Tuesday, December 15, 2009 6:32 PM > To: Open MPI Developers > Subject: Re: [OMPI devel] carto vs. hwloc > > On Dec 15, 2009, at 2:20 PM, Ralph Castain wrote

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r22313

2009-12-16 Thread Vasily Philipov
Hello all. To Jeff: I thought that if there are no replies it means OK. Thank you for your comments, I fixed it, you can see the patch below. Jeff Squyres wrote: On Dec 15, 2009, at 8:56 PM, Jeff Squyres wrote: Hmm. I'm a little disappointed that this was applied without answering

Re: [OMPI devel] carto vs. hwloc

2009-12-16 Thread George Bosilca
As far as I know, what Josh did is slightly different. In the case of a complete restart (where all processes are restarted from a checkpoint), he sets up and rewires a new set of BTLs. However, it happens that we do have some code to rewire the MPI processes in case of failure(s) in one of UTK pro

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r22317

2009-12-16 Thread George Bosilca
I don't think so. I had a very modest goal: not to fix the xgrid PLM (I'm not that proficient in Objective-C) but to silence the annoying compiler on my Mac. In fact I didn't even test it to see if it's working or not, but based on some more or less recent complaints on the user mailing li