Just curious -- what's difficult about this? SIGTSTP and SIGCONT can be
caught; is there something preventing us from sending "stop" and
"continue" messages (just like we send "die" messages)?
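For illustration only, here is a minimal sketch of the catching side of that question, assuming a POSIX system and Python 3.8 or later. The `outbox` list and the message names `"stop"` and `"continue"` are hypothetical stand-ins, not Open MPI APIs. The sketch shows only that both signals can be intercepted and turned into messages; it does not cover what the receiving processes would then have to do.

```python
import signal

# Hypothetical message names; a real runtime would define its own.
MESSAGES = {signal.SIGTSTP: "stop", signal.SIGCONT: "continue"}

# Stand-in for a real channel to the application processes (assumption).
outbox = []

def forward(signum, frame):
    """Record a message for the caught signal instead of stopping this process."""
    outbox.append(MESSAGES[signum])

# Install the same handler for both signals. SIGTSTP and SIGCONT can be
# caught, unlike SIGSTOP, which cannot.
for sig in MESSAGES:
    signal.signal(sig, forward)

# Simulate the user suspending and then resuming the job.
signal.raise_signal(signal.SIGTSTP)
signal.raise_signal(signal.SIGCONT)

print(outbox)
```

Because a handler is installed for SIGTSTP, the launching process itself keeps running and is free to decide what to forward, which matches the "just like we send die messages" idea in the question.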
(If I had to guess, the user is asking because some other MPI
implementations implement this ki
Jeff Squyres (jsquyres) wrote:
Just curious -- what's difficult about this?
SIGTSTP and SIGCONT can be caught; is there something preventing us
from sending "stop" and "continue" messages (just like we send "die"
messages)?
Nothing preventing it at all. The problem lies in what you
Ralph Castain wrote:
Jeff Squyres (jsquyres) wrote:
Just curious -- what's difficult about this? SIGTSTP and SIGCONT can
be caught; is there something preventing us from sending "stop" and
"continue" messages (just like we send "die" messages)?
Nothing preventing it at all. The problem li
I guess I had in my head that Josh was already working on most of these
issues anyway for the checkpoint / restart work (i.e., all the quiescing
stuff). Indeed, if you think about it -- pause/resume is one form of a
checkpoint/restart. Hence, if the checkpoint/restart frameworks are
laid out right --
I forgot to mention that I completely agree that we don't need (or want)
to pause/resume the orteds. This is also in total agreement with the
checkpoint/restart philosophy: we are only checkpointing and restarting
the user application(s), not the run-time infrastructure. There may
still be quiesc
Jeff Squyres (jsquyres) wrote:
I guess I had in my head that Josh was already
working on most of these issues anyway for the checkpoint / restart
work (i.e., all the quiescing stuff). Indeed, if you think about it --
pause/resume is one form of a checkpoint/restart. Hence, if the
ch
Hi,
I'm developing some components for Open MPI,
but am a little unclear as to how to implement
efficient sends and receives. I want to do
zero-copy two-sided MPI, but as far as I can see, this
is not going to be easy. As best as I can tell, the
receive mechanism copies into a tempo