Re: [OMPI devel] RFC: new OMPI RTE define:

2014-02-18 Brian Barrett
And what will you do for RTE components that aren't ORTE? This really isn't a feature of a run-time, so it doesn't seem like it should be part of the RTE interface... Brian On Feb 17, 2014, at 3:03 PM, Jeff Squyres (jsquyres) wrote: > WHAT: New OMPI_RTE_EVENT_BASE define > > WHY: The usnic

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Adrian Reber
I think I do not understand your question. So far I have only implemented the checkpoint part and not the restart part. Using criu_dump() the process can be left in three different states. Without any special handling the process is dumped and then killed. I can also tell criu to leave the proces

Re: [OMPI devel] RFC: new OMPI RTE define:

2014-02-18 George Bosilca
I concur with Brian: you should not expect the runtime to provide a default event base, especially if you want some level of quality-of-service out of it. Moreover, with the soon-to-happen move of the BTLs down into OPAL, this approach will definitely not be suitable. George. On Feb 18, 2014

Re: [OMPI devel] OPAL_CRS_* meaning

2014-02-18 Adrian Reber
I should have read this email before answering the other one. So opal_crs.checkpoint() is used to checkpoint the process as well as to restart it? I would have expected opal_crs.restart() to be used for restart. I am confused. Looking at CRS/BLCR, checkpoint() seems to only checkpoint and restart()

Re: [OMPI devel] OPAL_CRS_* meaning

2014-02-18 Jeff Squyres (jsquyres)
opal_crs.checkpoint() is not used to restart the process, but it does return in two different cases:
- in the "continue" case, opal_crs.checkpoint() returns in the original process and keeps executing the same process and then, IIRC, invokes opal_crs.continue().
- in the "restart" case, opal_c

Re: [OMPI devel] RFC: new OMPI RTE define:

2014-02-18 Jeff Squyres (jsquyres)
Ok, fair enough. My goal was not to spin up another progress thread in my BTL, but I can certainly do so (to meet the 1.7.5 timeframe). For the longer term (i.e., 1.9), should we add a little opal infrastructure that contains an event base that is run in its own progress thread? This would al

Re: [OMPI devel] RFC: new OMPI RTE define:

2014-02-18 George Bosilca
On Feb 18, 2014, at 13:16, Jeff Squyres (jsquyres) wrote: > Ok, fair enough. My goal was not to spin up another progress thread in my BTL, but I can certainly do so (to meet the 1.7.5 timeframe). > > For the longer term (i.e., 1.9), should we add a little opal infrastructure that contai

Re: [OMPI devel] RFC: new OMPI RTE define:

2014-02-18 Jeff Squyres (jsquyres)
On Feb 18, 2014, at 8:18 AM, George Bosilca wrote: >> For the longer term (i.e., 1.9), should we add a little opal infrastructure that contains an event base that is run in its own progress thread? This would allow the MPI layer to consolidate into one progress thread (for things tha

Re: [OMPI devel] C/R and orte_oob

2014-02-18 Adrian Reber
On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote: > On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote: > > I tried to implement something like you described. It is not yet event driven, but before continuing I wanted to get some feedback if it is at least the right start: > >

Re: [OMPI devel] C/R and orte_oob

2014-02-18 Ralph Castain
On Feb 18, 2014, at 6:24 AM, Adrian Reber wrote: > On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote: >> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote: >>> I tried to implement something like you described. It is not yet event driven, but before continuing I wanted to get som

Re: [OMPI devel] C/R and orte_oob

2014-02-18 Adrian Reber
On Tue, Feb 18, 2014 at 06:39:12AM -0800, Ralph Castain wrote: > On Feb 18, 2014, at 6:24 AM, Adrian Reber wrote: > > > On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote: > >> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote: > >>> I tried to implement something like you described. I

Re: [OMPI devel] [PATCH] Fix typo defining macro _WORD_MASK_

2014-02-18 Nathan Hjelm
_WORD_MASK_ violates C99 § 7.1.3: "All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use." So we should probably rename the identifier. -Nathan On Mon, Feb 17, 2014 at 04:37:34PM +, Jeff Squyres (jsquyres) wrote:

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Josh Hursey
So when a process is restarted with CRIU, does it resume execution after the criu_dump() or somewhere else? In a continue/leave-running mode after checkpoint, the MPI library does not need to do quite as much work, since we can depend on some things not changing (such as the machine name, orted pid,

Re: [OMPI devel] OPAL_CRS_* meaning

2014-02-18 Josh Hursey
Just replied to your other email before seeing this. Take a look at those comments and let me know if that helps differentiate those interfaces. On Tue, Feb 18, 2014 at 5:28 AM, Jeff Squyres (jsquyres) wrote: > opal_crs.checkpoint() is not used to restart the process, but it does return in tw

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Adrian Reber
On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote: > So when a process is restarted with CRIU, does it resume execution after the criu_dump() or somewhere else? The process is resumed at the same point it was checkpointed with criu_dump(). > In a continue/leave-running mode after chec

Re: [OMPI devel] RFC: Changing 32-bit build behavior/sizes for MPI_Count and MPI_Offset

2014-02-18 Jeff Squyres (jsquyres)
Just a reminder -- this RFC timed out today. If there are no objections to this, I'll commit the patch on #4205 to the trunk tomorrow evening. No one has come up with a patch yet for the v1.7 branch (because of ABI reasons, it must be different than what we do on the trunk), but since that is

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Josh Hursey
Yep. For the checkpoint/continue that patch looks good. On Tue, Feb 18, 2014 at 11:30 AM, Adrian Reber wrote: > On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote: > > So when a process is restarted with CRIU, does it resume execution after the criu_dump() or somewhere else? > > Th

Re: [OMPI devel] RFC: optimize probe in ob1

2014-02-18 Nathan Hjelm
On Tue, Feb 11, 2014 at 01:43:37AM +0100, George Bosilca wrote: > > The class is only usable in the context of a single .c file. As a code protection it makes perfect sense to me. Ah, yes. So it is. Fixed in the latest patch. > It’s not yet, and I did not notice an RFC about. The event I was