[OMPI users] Related to project ideas in OpenMPI

2011-08-24 Thread srinivas kundaram
I am a final-year grad student looking for a final-year project in OpenMPI. We are a group of 4 students. I wanted to know about "Process Migration" of MPI processes in OpenMPI. Can anyone suggest ideas for a project related to process migration in OpenMPI, or other topics in systems?

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-24 Thread Jeff Squyres
Be aware that process migration is a pretty complex issue. Josh is probably the best one to answer your question directly, but he's out today. On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote: > I am final year grad student looking for my final year project in OpenMPI.We > are group of 4

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Ralph Castain
It also depends on what part of migration interests you - are you wanting to look at the MPI part of the problem (reconnecting MPI transports, ensuring messages are not lost, etc.) or the RTE part of the problem (where to restart processes, detecting failures, etc.)? On Aug 24, 2011, at 7:04 A

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
Srinivas, There's also Kernel-Level Checkpointing vs. User-Level Checkpointing - if you can checkpoint an MPI task and restart it on a new node, then this is also "process migration". Of course, doing a checkpoint & restart can be slower than pure in-kernel process migration, but the advantage is

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Durga Choudhury
Is anything done at the kernel level portable (e.g. to Windows)? It *can* be, in principle at least (by putting appropriate #ifdef's in the code), but I am wondering if it is in reality. Also, in 2005 there was an attempt to implement SSI (Single System Image) functionality to the then-current 2.6

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
Don't know which SSI project you are referring to... I only know the OpenSSI project, and I was one of the first who subscribed to its mailing list (since 2001). http://openssi.org/cgi-bin/view?page=openssi.html I don't think those OpenSSI clusters are designed for tens of thousands of nodes, and

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-26 Thread Josh Hursey
There are some great comments in this thread. Process migration (like many topics in systems) can get complex fast. The Open MPI process migration implementation is checkpoint/restart based (currently using BLCR), and uses an 'eager' style of migration. This style of migration stops a process comp

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-26 Thread Ralph Castain
FWIW: I'm in the process of porting some code from a branch that allows apps to do on-demand checkpoint/recovery style operations at the app level. Specifically, it provides the ability to: * request a "recovery image" - an application-level blob containing state info required for the app to re

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-27 Thread Joshua Hursey
There is a 'self' checkpointer (CRS component) that does application level checkpointing - exposed at the MPI level. I don't know how different what you are working on is, but maybe something like that could be harnessed. Note that I have not tested the 'self' checkpointer with the process migra

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-27 Thread Ralph Castain
Let's chat off-list about it - I don't see exactly how this works, but it may be similar enough. On Aug 27, 2011, at 8:30 AM, Joshua Hursey wrote: > There is a 'self' checkpointer (CRS component) that does application level > checkpointing - exposed at the MPI level. I don't know how differen