And what will you do for RTE components that aren't ORTE? This really isn't a
feature of a run-time, so it doesn't seem like it should be part of the RTE
interface...
Brian
On Feb 17, 2014, at 3:03 PM, Jeff Squyres (jsquyres) wrote:
> WHAT: New OMPI_RTE_EVENT_BASE define
>
> WHY: The usnic
I think I do not understand your question. So far I have only implemented the
checkpoint part and not the restart part.
Using criu_dump() the process can be left in three different
states. Without any special handling the process is dumped and then
killed. I can also tell criu to leave the proces
I concur with Brian, you should not expect the runtime to provide a default
event base, especially if you want some level of quality-of-service out of it.
Moreover, with the soon-to-happen move of the BTLs down in OPAL this approach
will definitely not be suitable.
George.
On Feb 18, 2014
I should have read this email before answering the other.
So opal_crs.checkpoint() is used to checkpoint the process as well as
restart the process? I would have expected opal_crs.restart() is used
for restart. I am confused. Looking at CRS/BLCR checkpoint() seems to
only checkpoint and restart()
opal_crs.checkpoint() is not used to restart the process, but it does return in
two different cases:
- in the "continue" case, opal_crs.checkpoint() returns in the original process
and keeps executing the same process and then, IIRC, invokes
opal_crs.continue().
- in the "restart" case, opal_c
Ok, fair enough. My goal was not to spin up another progress thread in my BTL,
but I can certainly do so (to meet the 1.7.5 timeframe).
For the longer term (i.e., 1.9), should we add a little opal infrastructure
that contains an event base that is run in its own progress thread? This would
al
On Feb 18, 2014, at 13:16 , Jeff Squyres (jsquyres) wrote:
> Ok, fair enough. My goal was not to spin up another progress thread in my
> BTL, but I can certainly do so (to meet the 1.7.5 timeframe).
>
> For the longer term (i.e., 1.9), should we add a little opal infrastructure
> that contai
On Feb 18, 2014, at 8:18 AM, George Bosilca wrote:
>> For the longer term (i.e., 1.9), should we add a little opal infrastructure
>> that contains an event base that is run in its own progress thread? This
>> would allow the MPI layer to consolidate into one progress thread (for
>> things tha
On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote:
> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote:
> > I tried to implement something like you described. It is not yet event
> > driven, but before continuing I wanted to get some feedback if it is at
> > least the right start:
> >
On Feb 18, 2014, at 6:24 AM, Adrian Reber wrote:
> On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote:
>> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote:
>>> I tried to implement something like you described. It is not yet event
>>> driven, but before continuing I wanted to get som
On Tue, Feb 18, 2014 at 06:39:12AM -0800, Ralph Castain wrote:
> On Feb 18, 2014, at 6:24 AM, Adrian Reber wrote:
>
> > On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote:
> >> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote:
> >>> I tried to implement something like you described. I
_WORD_MASK_ violates C99 § 7.1.3:
"All identifiers that begin with an underscore and either an uppercase letter or
another underscore are always reserved for any use."
So we should probably rename the identifier.
-Nathan
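
A minimal sketch of the kind of rename suggested above. The replacement name EXAMPLE_WORD_MASK, the 16-bit mask value, and the helper example_low_word() are hypothetical and chosen only for illustration; the thread does not show what _WORD_MASK_ actually expands to or what the final name should be.

```c
#include <stdint.h>

/* Reserved by C99 7.1.3 (leading underscore followed by an uppercase letter):
 *     #define _WORD_MASK_ 0xFFFFu
 * Hypothetical replacement: a prefixed name that does not start with an
 * underscore. The mask value is an assumption for illustration only. */
#define EXAMPLE_WORD_MASK 0xFFFFu

/* Return the low 16 bits of value, using the renamed mask. */
static inline uint32_t example_low_word(uint32_t value)
{
    return value & EXAMPLE_WORD_MASK;
}
```

Any name that avoids a leading underscore followed by an uppercase letter or a second underscore stays out of the reserved namespace; a project-specific prefix also reduces the chance of clashing with other headers.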
On Mon, Feb 17, 2014 at 04:37:34PM +, Jeff Squyres (jsquyres) wrote:
So when a process is restarted with CRIU, does it resume execution after
the criu_dump() or somewhere else?
In a continue/leave-running mode after checkpoint the MPI library does not
need to do quite as much work since we can depend on some things not
changing (such as the machine name, orted pid,
Just replied to your other email before seeing this. Take a look at those
comments and let me know if that helps differentiate those interfaces.
On Tue, Feb 18, 2014 at 5:28 AM, Jeff Squyres (jsquyres) wrote:
> opal_crs.checkpoint() is not used to restart the process, but it does
> return in tw
On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote:
> So when a process is restarted with CRIU, does it resume execution after
> the criu_dump() or somewhere else?
The process is resumed at the same point it was checkpointed with
criu_dump().
> In a continue/leave-running mode after chec
Just a reminder -- this RFC timed out today.
If there are no objections to this, I'll commit the patch on #4205 to the trunk
tomorrow evening.
No one has come up with a patch yet for the v1.7 branch (because of ABI
reasons, it must be different than what we do on the trunk), but since that is
Yep. For the checkpoint/continue that patch looks good.
On Tue, Feb 18, 2014 at 11:30 AM, Adrian Reber wrote:
> On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote:
> > So when a process is restarted with CRIU, does it resume execution after
> > the criu_dump() or somewhere else?
>
> Th
On Tue, Feb 11, 2014 at 01:43:37AM +0100, George Bosilca wrote:
>
> The class is only usable in the context of a single .c file. As a code
> protection it makes perfect sense to me.
Ah, yes. So it is. Fixed in the latest patch.
> It’s not yet, and I did not notice an RFC about it. The event I was