Re: [OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-02 Jeff Squyres (jsquyres)
Just curious -- what's difficult about this? SIGTSTP and SIGCONT can be caught; is there something preventing us from sending "stop" and "continue" messages (just like we send "die" messages)? (If I had to guess, I think the user is asking because some other MPI implementations implement this ki…
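Jeff's point that SIGTSTP and SIGCONT can be caught (unlike SIGSTOP, which can be neither caught nor ignored) can be illustrated with a minimal POSIX sketch. It assumes a daemon that launched a single local application process and knows its pid. The names `app_pid`, `relay_handler`, and `install_stop_cont_relay` are hypothetical and are not Open MPI code. Because the daemon catches the signals, it keeps running; it relays a stop or continue to the application with `kill()`, which is async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical: pid of the application process this daemon launched.
   Set once before the handlers are installed, then only read. */
static pid_t app_pid = -1;

/* Runs in the daemon. SIGTSTP is catchable, so the daemon itself is not
   stopped; it stops the application with SIGSTOP instead. SIGCONT is
   passed through unchanged. kill() is async-signal-safe. */
static void relay_handler(int sig)
{
    if (app_pid <= 0)
        return;
    if (sig == SIGTSTP)
        kill(app_pid, SIGSTOP);
    else if (sig == SIGCONT)
        kill(app_pid, SIGCONT);
}

/* Install relay_handler for SIGTSTP and SIGCONT.
   Returns 0 on success, -1 if sigaction() fails. */
int install_stop_cont_relay(pid_t child)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = relay_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    app_pid = child;
    if (sigaction(SIGTSTP, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGCONT, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

This covers only the local signal step. In a multi-node run, each daemon would presumably do the equivalent for its own processes after receiving a "stop" or "continue" message, which is the kind of messaging the question proposes.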

Re: [OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-02 Ralph Castain
Jeff Squyres (jsquyres) wrote: Just curious -- what's difficult about this? SIGTSTP and SIGCONT can be caught; is there something preventing us from sending "stop" and "continue" messages (just like we send "die" messages)? Nothing preventing it at all. The problem lies in what you…

Re: [OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-02 Pak Lui
Ralph Castain wrote: Jeff Squyres (jsquyres) wrote: Just curious -- what's difficult about this? SIGTSTP and SIGCONT can be caught; is there something preventing us from sending "stop" and "continue" messages (just like we send "die" messages)? Nothing preventing it at all. The problem li…

Re: [OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-02 Jeff Squyres (jsquyres)
I guess I had in my head that Josh was already working on most of these issues anyway for the checkpoint/restart work (i.e., all the quiescing stuff). Indeed, if you think about it -- pause/resume is one form of checkpoint/restart. Hence, if the checkpoint/restart frameworks are laid out right -- …

Re: [OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-02 Jeff Squyres (jsquyres)
I forgot to mention that I completely agree that we don't need (or want) to pause/resume the orteds. This is also in total agreement with the checkpoint/restart philosophy: we are only checkpointing and restarting the user application(s), not the run-time infrastructure. There may still be quiesc…

Re: [OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-02 Ralph Castain
Jeff Squyres (jsquyres) wrote: I guess I had in my head that Josh was already working on most of these issues anyway for the checkpoint/restart work (i.e., all the quiescing stuff). Indeed, if you think about it -- pause/resume is one form of checkpoint/restart. Hence, if the ch…

[OMPI devel] Query on zero-copy sends

2006-06-02 Jonathan Day
Hi, I'm working on developing some components for OpenMPI, but am a little unclear about how to implement efficient sends and receives. I want to do zero-copy two-sided MPI, but as far as I can see, this is not going to be easy. As best I can tell, the receive mechanism copies into a tempo…