On Tue, Feb 11, 2014 at 01:43:37AM +0100, George Bosilca wrote:
>
> The class is only usable in the context of a single .c file. As a code
> protection it makes perfect sense to me.
Ah, yes. So it is. Fixed in the latest patch.
> It’s not yet, and I did not notice an RFC about. The event I was
Yep. For the checkpoint/continue that patch looks good.
On Tue, Feb 18, 2014 at 11:30 AM, Adrian Reber wrote:
> On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote:
> > So when a process is restarted with CRIU, does it resume execution after
> > the criu_dump() or somewhere else?
>
> Th
Just a reminder -- this RFC timed out today.
If there are no objections to this, I'll commit the patch on #4205 to the trunk
tomorrow evening.
No one has come up with a patch yet for the v1.7 branch (because of ABI
reasons, it must be different than what we do on the trunk), but since that is
On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote:
> So when a process is restarted with CRIU, does it resume execution after
> the criu_dump() or somewhere else?
The process is resumed at the same point it was checkpointed with
criu_dump().
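
As an analogy only, the "returns again at the same point" behaviour can be sketched with setjmp()/longjmp(). This is not how CRIU works internally (CRIU restores the process from an image), and every example_* name and return code below is hypothetical, not part of CRIU or OPAL:

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical return codes; not CRIU or OPAL values. */
enum { EXAMPLE_CONTINUE = 0, EXAMPLE_RESTART = 1 };

static jmp_buf example_ckpt_point;

/* Counts how often execution passes the simulated checkpoint call. */
int example_run(void)
{
    volatile int passes = 0;  /* volatile: value must survive longjmp() */

    switch (setjmp(example_ckpt_point)) {
    case EXAMPLE_CONTINUE:
        /* First return: the original process keeps running. */
        passes = passes + 1;
        /* Simulate a restart; control re-enters at setjmp() above. */
        longjmp(example_ckpt_point, EXAMPLE_RESTART);
    case EXAMPLE_RESTART:
        /* Second return: same program point, different return value. */
        passes = passes + 1;
        break;
    }

    assert(passes == 2);
    return passes;
}
```

The caller sees one call site return twice and tells the two cases apart only by the return value, which is the distinction the continue/restart question is about.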
> In a continue/leave-running mode after chec
Just replied to your other email before seeing this. Take a look at those
comments and let me know if that helps differentiate those interfaces.
On Tue, Feb 18, 2014 at 5:28 AM, Jeff Squyres (jsquyres) wrote:
> opal_crs.checkpoint() is not used to restart the process, but it does
> return in tw
So when a process is restarted with CRIU, does it resume execution after
the criu_dump() or somewhere else?
In a continue/leave-running mode after checkpoint the MPI library does not
need to do quite as much work since we can depend on some things not
changing (such as the machine name, orted pid,
_WORD_MASK_ violates C99 § 7.1.3:
"All identifiers that begin with an underscore and either an uppercase letter or
another underscore are always reserved for any use."
So we should probably rename the identifier.
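
A minimal sketch of what such a rename could look like, assuming the macro is a mask for the bit position inside a machine word. The EXAMPLE_* names and the helper function are hypothetical and are not the identifiers used in the actual code:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reserved by C99 7.1.3 (underscore followed by an uppercase letter):
 *     #define _WORD_MASK_ ...
 * A project-prefixed name stays out of the reserved namespace.
 * EXAMPLE_WORD_BITS and EXAMPLE_WORD_MASK are hypothetical names. */
#define EXAMPLE_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define EXAMPLE_WORD_MASK (EXAMPLE_WORD_BITS - 1)

/* Position of bit n inside the word that contains it. */
size_t example_bit_in_word(size_t n)
{
    /* The mask trick only works when the word size is a power of two. */
    assert((EXAMPLE_WORD_BITS & EXAMPLE_WORD_MASK) == 0);
    return n & EXAMPLE_WORD_MASK;
}
```

Dropping the leading underscore is enough to satisfy 7.1.3; the prefix is only there to reduce the chance of clashing with other headers.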
-Nathan
On Mon, Feb 17, 2014 at 04:37:34PM +, Jeff Squyres (jsquyres) wrote:
On Tue, Feb 18, 2014 at 06:39:12AM -0800, Ralph Castain wrote:
> On Feb 18, 2014, at 6:24 AM, Adrian Reber wrote:
>
> > On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote:
> >> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote:
> >>> I tried to implement something like you described. I
On Feb 18, 2014, at 6:24 AM, Adrian Reber wrote:
> On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote:
>> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote:
>>> I tried to implement something like you described. It is not yet event
>>> driven, but before continuing I wanted to get som
On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote:
> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote:
> > I tried to implement something like you described. It is not yet event
> > driven, but before continuing I wanted to get some feedback if it is at
> > least the right start:
> >
On Feb 18, 2014, at 8:18 AM, George Bosilca wrote:
>> For the longer term (i.e., 1.9), should we add a little opal infrastructure
>> that contains an event base that is run in its own progress thread? This
>> would allow the MPI layer to consolidate into one progress thread (for
>> things tha
On Feb 18, 2014, at 13:16 , Jeff Squyres (jsquyres) wrote:
> Ok, fair enough. My goal was not to spin up another progress thread in my
> BTL, but I can certainly do so (to meet the 1.7.5 timeframe).
>
> For the longer term (i.e., 1.9), should we add a little opal infrastructure
> that contai
Ok, fair enough. My goal was not to spin up another progress thread in my BTL,
but I can certainly do so (to meet the 1.7.5 timeframe).
For the longer term (i.e., 1.9), should we add a little opal infrastructure
that contains an event base that is run in its own progress thread? This would
al
opal_crs.checkpoint() is not used to restart the process, but it does return in
two different cases:
- in the "continue" case, opal_crs.checkpoint() returns in the original process
and keeps executing the same process and then, IIRC, invokes
opal_crs.continue().
- in the "restart" case, opal_c
I should have read this email before answering the other.
So opal_crs.checkpoint() is used to checkpoint the process as well as
restart the process? I would have expected opal_crs.restart() is used
for restart. I am confused. Looking at CRS/BLCR checkpoint() seems to
only checkpoint and restart()
I concur with Brian, you should not expect the runtime to provide a default
event base, especially if you want some level of quality-of-service out of it.
Moreover, with the soon-to-happen move of the BTLs down in OPAL this approach
will definitely not be suitable.
George.
On Feb 18, 2014
I think I do not understand your question. So far I have only implemented the
checkpoint part and not the restart part.
Using criu_dump() the process can be left in three different
states. Without any special handling the process is dumped and then
killed. I can also tell criu to leave the proces
And what will you do for RTE components that aren't ORTE? This really isn't a
feature of a run-time, so it doesn't seem like it should be part of the RTE
interface...
Brian
On Feb 17, 2014, at 3:03 PM, Jeff Squyres (jsquyres) wrote:
> WHAT: New OMPI_RTE_EVENT_BASE define
>
> WHY: The usnic