I re-merged down to the libevent-merge branch (to include r17872) and
a new tarball has been uploaded to http://www.open-mpi.org/~jsquyres/unofficial/
On Mar 18, 2008, at 10:11 PM, George Bosilca wrote:
Commit 17872 is the one you're looking for.
https://svn.open-mpi.org/trac/ompi/changeset/17872
george.
On Mar 18, 2008, at 9:12 PM, Jeff Squyres wrote:
When did you fix it? I merged the trunk down to the libevent-merge
branch late this afternoon (r17869).
On Mar 18, 2008, at 7:29 PM, George Bosilca wrote:
This has been fixed in the trunk, but not yet merged in the branch.
george.
After taking a look at how epoll is implemented in the Linux kernel, I
can say with 100% certainty that BLCR will not restore the epoll fd
correctly. I hope to fix that eventually, but have too many other
things on my plate to address it now.
Since I cannot promise how soon BLCR may be able to r
On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote:
I found another problem with the libevent branch.
If I set "-mca btl tcp,self" on the command line then I get a segfault
when sending messages > 16 KB. I can try to make a smaller repeater,
but if you use the "progress" or "simple" tests in ompi-tests below:
https://svn.open-mpi.org/svn/omp
I have some more data from the field.
Leaving "opal_event_include" unset (the default), BLCR gave me the
following error when trying to restart a 2-process 'noop' MPI
application:
shell$ ompi-restart ompi_global_snapshot_8587.ckpt
Restart failed: Bad file descri
It's like rewriting libevent from scratch. I guess it can be done, but
it will be a long and painful process. How about the following solution:
- the daemons are aware that the checkpointing is enabled. They can
set the environment variable which will force the opal_event_include
to be set t
George added an MCA parameter for it (opal_event_include is a string
that can be set to "select" or "poll"), but it has to be set before
opal_init().
Josh: could you try running with the MCA parameter opal_event_include
set to "select"? This would confirm Brian's hypothesis...
Given that
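
For reference, a minimal sketch of one way to set the parameter discussed above before opal_init() runs. It assumes Open MPI's usual convention of reading MCA parameters from OMPI_MCA_<name> environment variables; the "./noop" program name and the process count are placeholders, not taken from the thread.

```shell
# Ask libevent to use select() rather than epoll (parameter name from the thread).
# Assumption: MCA parameters can be supplied through the environment
# as OMPI_MCA_<parameter name>, so the value is in place before opal_init().
export OMPI_MCA_opal_event_include=select

# Assumed equivalent command-line form; "./noop" is a placeholder test program.
# Left commented out so the snippet does not require an Open MPI install:
#   mpirun -np 2 -mca opal_event_include select ./noop

# Show what the launched processes would inherit.
echo "opal_event_include=${OMPI_MCA_opal_event_include}"
```

Setting the value in the environment rather than in application code sidesteps the ordering constraint, since the variable is already present when the process starts.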
If avoiding epoll() makes Josh's problems go away, PLEASE let me know
because that might indicate a deficiency in BLCR that I would want to
address.
-Paul
Jeff / George -
Did you add a way to specify which event modules are used? Because epoll
pushes the socket list into the kernel, I can see how it would screw up
BLCR. I bet everything would work if we forced the use of poll / select.
Brian
On Tue, 18 Mar 2008, Jeff Squyres wrote:
Crud, ok. Keep us posted.
On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:
I'm testing with checkpoint/restart and the new libevent seems to be
messing up the checkpoints generated by BLCR. I'll be taking a look
at it over the next couple of days, but just thought I'd let people
know. Unfortunately I don't have any more details at the moment.
-- Josh
On Mar 17, 2