Re: [OMPI devel] Resilience 2011

2011-06-27 Thread Ralph Castain
On Jun 27, 2011, at 6:57 AM, Ken Lloyd wrote:
> One point I've been trying to put forward in my domain is, currently, high
> performance computing != high reliability computing. Not by a long shot.
> Seems that they are orthogonally coupled.

I think that has been true in the past - an emerging

Re: [OMPI devel] Resilience 2011

2011-06-27 Thread Ken Lloyd
One point I've been trying to put forward in my domain is, currently, high performance computing != high reliability computing. Not by a long shot. Seems that they are orthogonally coupled. There are many pieces to this problem-puzzle. Some of these pieces are inter-related. Some of my work has de

Re: [OMPI devel] Resilience 2011

2011-06-27 Thread Josh Hursey
It has been on my to-do list for a while to start a FAQ listing of the various resilience/FT-related activities in and around Open MPI. This would provide a starting location where users and new developers could go for an overview of each of the features, and how to activate/use each feature. I'll

Re: [OMPI devel] Resilience 2011

2011-06-26 Thread Ralph Castain
I think we're some ways away from declaring a "resilient ORTE". Josh and I have been committing pieces of it over the last two years, and Wes just committed another piece the other day that might have been titled "fault tolerant OOB" as it primarily addressed maintaining comm routing during node

[OMPI devel] Resilience 2011

2011-06-24 Thread Ken Lloyd
Josh and Wesley,

Will you be presenting Resilient ORTE at Resilience 2011 in Bordeaux?
http://xcr.cenit.latech.edu/resilience2011/

=
Kenneth A. Lloyd
CEO - Director of Systems Science
Watt Systems Technologies Inc.
www.wattsys.com
kenneth.ll...@wattsys.com

This e-mail is co