Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-25 Thread George Bosilca
Ken, At UTK we focus on developing two generic frameworks for scalable fault tolerant approaches. One is based on uncoordinated checkpoint/restart while the other is application level. 1) uncoordinated C/R based on message logging. Such approaches are fully automatic, rely on an external check

Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-25 Thread Ken Lloyd
Thanks. I've read your (Joshua Hursey's) Ph.D. thesis on fault tolerance using checkpointing with much interest. It would be of further interest to get the range of possible user requirements for defining the behaviors in response to various faults. Ken Lloyd On Fri, 2011-04-22 at 15:03 -0400, J

Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread Joshua Hursey
On Apr 22, 2011, at 1:20 PM, N.M. Maclaren wrote:
> On Apr 22 2011, Ralph Castain wrote:
>
>> Several of us are. Josh and George (plus teammates), and some other outside
>> folks, are working the MPI side of it.
>>
>> I'm working only the ORTE side of the problem.
>>
>> Quite a bit of capabil

Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread Graham, Richard L.
The MPI Forum is in the process of defining this; the work going on at ORNL is in this context. Rich - Original Message - From: N.M. Maclaren [mailto:n...@cam.ac.uk] Sent: Friday, April 22, 2011 01:20 PM To: Open MPI Developers Subject: Re: [OMPI devel] Adaptive or fault-tolerant MPI

Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread N.M. Maclaren
On Apr 22 2011, Ralph Castain wrote: Several of us are. Josh and George (plus teammates), and some other outside folks, are working the MPI side of it. I'm working only the ORTE side of the problem. Quite a bit of capability is already in the trunk, but there is always more to do :-) Is th

Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread Ralph Castain
Several of us are. Josh and George (plus teammates), and some other outside folks, are working the MPI side of it. I'm working only the ORTE side of the problem. Quite a bit of capability is already in the trunk, but there is always more to do :-) On Apr 22, 2011, at 9:09 AM, Ken Lloyd wrote:

[OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread Ken Lloyd
Before I jump in, is anyone already actively working in this area? Ken