Argh. I know the problem here - per a note on the user list, I actually found
more than five months ago that we weren't properly serializing commands in the
system and created a fix for it. I applied that fix only to the comm_spawn
scenario at the time, as that was the source of the pain - but I noted
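For illustration only - this is not the actual fix in the tree. "Serializing
commands" in this sense just means funneling every command through one ordered,
lock-protected queue so no two are ever processed concurrently; the names below
(cmd_t, cmd_enqueue, cmd_drain) are hypothetical:

/* Illustration only (hypothetical names, not the actual ORTE fix):
 * serialize command handling by pushing every command through one
 * mutex-protected FIFO and draining it strictly in order. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct cmd {
    struct cmd *next;
    void (*run)(void *arg);
    void *arg;
} cmd_t;

static pthread_mutex_t cmd_lock = PTHREAD_MUTEX_INITIALIZER;
static cmd_t *cmd_head = NULL, *cmd_tail = NULL;

void cmd_enqueue(void (*run)(void *), void *arg)
{
    cmd_t *c = malloc(sizeof(*c));
    if (NULL == c) return;
    c->next = NULL; c->run = run; c->arg = arg;
    pthread_mutex_lock(&cmd_lock);
    if (cmd_tail) cmd_tail->next = c; else cmd_head = c;
    cmd_tail = c;
    pthread_mutex_unlock(&cmd_lock);
}

void cmd_drain(void)
{
    pthread_mutex_lock(&cmd_lock);
    while (cmd_head) {
        cmd_t *c = cmd_head;
        cmd_head = c->next;
        if (NULL == cmd_head) cmd_tail = NULL;
        c->run(c->arg);   /* run under the lock: commands never interleave */
        free(c);
    }
    pthread_mutex_unlock(&cmd_lock);
}

static void say(void *arg) { puts((const char *)arg); }

int main(void)
{
    cmd_enqueue(say, "first");
    cmd_enqueue(say, "second");
    cmd_drain();          /* prints strictly in enqueue order */
    return 0;
}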
Makes perfect sense.
george.
On Dec 16, 2009, at 13:27 , Jeff Squyres wrote:
> I think I understand what you're saying:
>
> - it's ok to abort during MPI_INIT (we can rationalize it as the default
> error handler)
> - we should only abort during MPI functions
>
> Is that right? If so, I agree w
I think I understand what you're saying:
- it's ok to abort during MPI_INIT (we can rationalize it as the default error
handler)
- we should only abort during MPI functions
Is that right? If so, I agree with your interpretation. :-) ...with one
addition: it's ok to abort before MPI_INIT, because
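A minimal sketch of how that policy could be applied - this is an assumption
about one possible implementation, not Open MPI's actual error path, and
fatal() is a hypothetical helper. MPI_Initialized/MPI_Finalized are callable
at any time, so they can tell us whether aborting the whole job (the default
MPI_ERRORS_ARE_FATAL behavior) applies, or whether a plain exit() is all we
can do:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort via MPI only while MPI is actually "up". */
static void fatal(MPI_Comm comm, const char *msg)
{
    int initialized = 0, finalized = 0;
    MPI_Initialized(&initialized);   /* legal before MPI_Init */
    MPI_Finalized(&finalized);       /* legal after MPI_Finalize */

    fprintf(stderr, "fatal error: %s\n", msg);
    if (initialized && !finalized) {
        MPI_Abort(comm, 1);          /* inside MPI: terminate the whole job */
    }
    exit(1);                         /* before MPI_INIT (or after finalize): local exit */
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fatal(MPI_COMM_WORLD, "missing argument");  /* pre-INIT: plain exit */
    }
    MPI_Init(&argc, &argv);
    /* ... from here on, fatal() turns into MPI_Abort ... */
    MPI_Finalize();
    return 0;
}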
Sure. Processors were scaled down to 1000MHz while idling.
(I hope this will show up as an attachment instead of inlined...)
* on Wednesday, 16.12.09 at 18:12, Lenny Verkhovsky wrote:
> Hi,
> can you provide $cat /proc/cpuinfo
> I am not optimistic that it will help, but still...
> thanks
> Lenny
Hi,
can you provide $cat /proc/cpuinfo
I am not optimistic that it will help, but still...
thanks
Lenny.
On Wed, Dec 16, 2009 at 6:01 PM, Daan van Rossum wrote:
> Hi Terry,
>
> Thanks for your hint. I tried configure --enable-debug and even compiled it
> with all kinds of manual debug flags turned
Hi Terry,
Thanks for your hint. I tried configure --enable-debug and even compiled it
with all kinds of manual debug flags turned on, but it doesn't help to get rid
of this problem. So it definitely is not an optimization flaw.
One more interesting test would be to try an older version of the I
There are two citations from the MPI standard that I would like to highlight.
> All MPI programs must contain exactly one call to an MPI initialization
> routine: MPI_INIT or MPI_INIT_THREAD.
> One goal of MPI is to achieve source code portability. By this we mean that a
> program written using
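The "exactly one call" requirement is the reason libraries usually guard their
own initialization. A hedged sketch of that pattern (mylib_init and
mylib_finalize are hypothetical names, not from any real library):

#include <mpi.h>

static int mylib_owns_mpi = 0;   /* did we call MPI_Init ourselves? */

/* Call MPI_Init only if the application has not already done so,
 * so the program still contains exactly one initialization. */
void mylib_init(int *argc, char ***argv)
{
    int flag = 0;
    MPI_Initialized(&flag);
    if (!flag) {
        MPI_Init(argc, argv);
        mylib_owns_mpi = 1;
    }
}

/* Only finalize MPI if we were the ones who initialized it. */
void mylib_finalize(void)
{
    if (mylib_owns_mpi) {
        MPI_Finalize();
        mylib_owns_mpi = 0;
    }
}

int main(int argc, char **argv)
{
    mylib_init(&argc, &argv);
    /* ... work that needs MPI ... */
    mylib_finalize();
    return 0;
}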
I would tend to agree with Paul.
It's uncommon (e.g., no one has run into this before now), and I would say that
this is a bad application. But then again, hanging is bad -- so it would be
better to abort/terminate the whole job in this scenario.
I don't know how I would rate the priority of t
Currently, I am working on process migration and automatic recovery based on
checkpoint/restart. WRT the PML stack, this works by rewiring the BTLs after
restart of the migrated/recovered MPI process(es). There is a fair amount of
work in getting this right with respect to both the runtime and t
> -----Original Message-----
> From: devel-boun...@open-mpi.org
> [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
> Sent: Tuesday, December 15, 2009 6:32 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] carto vs. hwloc
>
> On Dec 15, 2009, at 2:20 PM, Ralph Castain wrote
Hello all.
To Jeff:
I thought that if there are no replies, it means it's OK.
Thank you for your comments. I fixed it; you can see the patch below.
Jeff Squyres wrote:
On Dec 15, 2009, at 8:56 PM, Jeff Squyres wrote:
Hmm. I'm a little disappointed that this was applied without answering
As far as I know, what Josh did is slightly different. In the case of a complete
restart (where all processes are restarted from a checkpoint), he sets up and
rewires a new set of BTLs.
However, it happens that we do have some code to rewire the MPI processes in
case of failure(s) in one of UTK pro
I don't think so. I had a very modest goal: it was not to fix the xgrid PLM
(I'm not that proficient in Objective-C) but to silence the annoying compiler
on my Mac. In fact, I didn't even test it to see if it's working or not, but
based on some more or less recent complaints on the user mailing li