No issue - just trying to get ahead of the game instead of running into an
issue later.
We can leave it for now.
On Jun 10, 2011, at 2:47 PM, Josh Hursey wrote:
> We could, but we could also just replace the callback. I will never
> want to use it in my scenario, and if I did then I could just
We could, but we could also just replace the callback. I will never
want to use it in my scenario, and if I did then I could just call it
directly instead of relying on the errmgr to do the right thing. So
why complicate the errmgr with additional complexity for something
that we don't need at the
So why not have the callback return an int, and your callback returns "go no
further"?
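A minimal sketch of that suggestion, assuming a chain of callbacks that are invoked in order until one returns a "go no further" code. Every name here (`fault_cb_t`, `CB_CONTINUE`, `CB_STOP`, `run_callbacks`, and the two example callbacks) is hypothetical and made up for illustration; none of it is the actual errmgr interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; these are not real ORTE/OMPI constants. */
enum { CB_CONTINUE = 0, CB_STOP = 1 };

/* Hypothetical callback type: takes an opaque context, returns a code. */
typedef int (*fault_cb_t)(void *ctx);

/* Invoke the callbacks in order until one returns CB_STOP.
 * Returns the number of callbacks that were actually invoked. */
static size_t run_callbacks(const fault_cb_t *chain, size_t n, void *ctx)
{
    size_t invoked = 0;

    assert(chain != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        invoked++;
        if (chain[i](ctx) == CB_STOP) {
            break; /* "go no further": skip the remaining callbacks */
        }
    }
    return invoked;
}

/* Example callback that handles the failure itself and stops the chain. */
static int recover_cb(void *ctx)
{
    int *handled = ctx;
    *handled += 1;
    return CB_STOP;
}

/* Stand-in for a default fatal callback; here it only records the call. */
static int default_fatal_cb(void *ctx)
{
    int *handled = ctx;
    *handled += 100;
    return CB_CONTINUE;
}
```

With `recover_cb` registered ahead of `default_fatal_cb`, the default is never reached, so the caller does not have to replace it outright.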
On Jun 10, 2011, at 2:06 PM, Josh Hursey wrote:
> Yeah I do not want the default fatal callback in OMPI. I want to
> replace it with something that allows OMPI to continue running when
> there are process fai
Yeah I do not want the default fatal callback in OMPI. I want to
replace it with something that allows OMPI to continue running when
there are process failures (if the error handlers associated with the
communicators permit such an action). So having the default fatal
callback called after mine wou
Committed in r24772:
https://svn.open-mpi.org/trac/ompi/changeset/24772
Thanks folks,
Josh
On Fri, Jun 10, 2011 at 12:56 PM, Josh Hursey wrote:
> Reminder that this RFC goes in later today.
>
> On Wed, Jun 8, 2011 at 10:32 AM, Jeff Squyres wrote:
>> This one's a no-brainer, folks. :-)
>>
>>
On Jun 10, 2011, at 6:32 AM, Josh Hursey wrote:
> On Fri, Jun 10, 2011 at 7:37 AM, Ralph Castain wrote:
>>
>> On Jun 9, 2011, at 6:12 PM, Joshua Hursey wrote:
>>
>>>
>>> On Jun 9, 2011, at 6:47 PM, George Bosilca wrote:
>>>
Well, you're way too trusty. ;)
>>>
>>> It's the midwestern boy
Reminder that this RFC goes in later today.
On Wed, Jun 8, 2011 at 10:32 AM, Jeff Squyres wrote:
> This one's a no-brainer, folks. :-)
>
> Josh [re]discovered that we didn't initially support Fortran interfaces for
> the extensions when he was trying to make a complete implementation for an
>
On Jun 10, 2011, at 7:01 AM, Josh Hursey wrote:
> On Fri, Jun 10, 2011 at 8:51 AM, Ralph Castain wrote:
>>
>> On Jun 10, 2011, at 6:38 AM, Josh Hursey wrote:
>>
>>> Another problem with this patch, that I mentioned to Wesley and George
>>> off list, is that it does not handle the case when mpi
On Jun 10, 2011, at 5:16 AM, Matthias Jurenz wrote:
> There are different ways to fix the problem:
>
> 1. Apply the attached patch on ltmain.sh.
>
> This patch excludes the target library name from searching *.la libraries.
Does your patch work for vpath builds, too? If so, isn't this somethin
On Fri, Jun 10, 2011 at 8:51 AM, Ralph Castain wrote:
>
> On Jun 10, 2011, at 6:38 AM, Josh Hursey wrote:
>
>> Another problem with this patch, that I mentioned to Wesley and George
>> off list, is that it does not handle the case when mpirun/HNP is also
>> hosting processes that might fail. In my
On Jun 10, 2011, at 6:48 AM, Josh Hursey wrote:
> Why would this patch result in zombied processes and poor cleanup?
> When ORTE receives notification of a process terminating/aborting then
> it triggers the termination of the job (without UTK's RFC) which
> should ensure a clean shutdown. This pa
On Jun 10, 2011, at 6:38 AM, Josh Hursey wrote:
> Another problem with this patch, that I mentioned to Wesley and George
> off list, is that it does not handle the case when mpirun/HNP is also
> hosting processes that might fail. In my testing of the patch it
> worked fine if mpirun/HNP was -not-
Why would this patch result in zombied processes and poor cleanup?
When ORTE receives notification of a process terminating/aborting then
it triggers the termination of the job (without UTK's RFC) which
should ensure a clean shutdown. This patch just tells ORTE that a few
other processes should be t
Another problem with this patch, that I mentioned to Wesley and George
off list, is that it does not handle the case when mpirun/HNP is also
hosting processes that might fail. In my testing of the patch it
worked fine if mpirun/HNP was -not- hosting any processes, but once it
had to host processes
Okay, finally have time to sit down and review this. It looks pretty much
identical to what was done in ORCM - we just kept "epoch" separate from the
process name, and use multicast to notify all procs that someone failed. I do
have a few questions/comments about your proposed patch:
1. I note
On Fri, Jun 10, 2011 at 7:37 AM, Ralph Castain wrote:
>
> On Jun 9, 2011, at 6:12 PM, Joshua Hursey wrote:
>
>>
>> On Jun 9, 2011, at 6:47 PM, George Bosilca wrote:
>>
>>> Well, you're way too trusty. ;)
>>
>> It's the midwestern boy in me :)
>
> Still need to shake that corn out of your head... :-
It's a Libtool issue (once again) which occurs if a previous build is re-
configured without subsequent "make clean" and the LIBC developer library
"libutil" is added to LIBS.
The error is simple to reproduce by the following steps:
1. configure
2. make -C ompi/contrib/vt/vt/util
3. configure
or
+ attachment
On Friday 10 June 2011 12:00:49 you wrote:
> It's a Libtool issue (once again) which occurs if a previous build is re-
> configured without subsequent "make clean" and the LIBC developer library
> "libutil" is added to LIBS.
>
> The error is simple to reproduce by the following steps
I have no issue with uncommenting the code. However, I do see a future littered
with lots of zombied processes and complaints over poor cleanup again
On Jun 9, 2011, at 6:08 PM, Joshua Hursey wrote:
> Ah I see what you are getting at now.
>
> The construction of the list of connected proce
Something else you might want to address in here: the current code sends an RML
message from the proc calling abort to its local daemon telling the daemon that
we are exiting due to the app calling "abort". We needed to do this because we
wanted to flag the proc termination as one induced by the
On Jun 9, 2011, at 6:12 PM, Joshua Hursey wrote:
>
> On Jun 9, 2011, at 6:47 PM, George Bosilca wrote:
>
>> Well, you're way too trusty. ;)
>
> It's the midwestern boy in me :)
Still need to shake that corn out of your head... :-)
>
>>
>> This only works if all components play the game, and