If you're integrating a new checkpoint/restart system inside Open MPI,
you probably want to re-send this mail to the devel list to get the
attention of the right people who can help you.
On Sep 28, 2009, at 11:55 AM, Kritiraj Sajadah wrote:
Dear All,
I am trying to integrate DMTCP with Open MPI. If I run a C
application, it works fine. But when I execute the program using
mpirun, it checkpoints the application but gives errors when
restarting the application.
#############
[31007] WARNING at connection.cpp:303 in restore;
REASON='JWARNING((_sockDomain == AF_INET || _sockDomain == AF_UNIX )
&& _sockType == SOCK_STREAM) failed'
id() = 2ab3f248-30933-4ac0d75a(99007)
_sockDomain = 10
_sockType = 1
_sockProtocol = 0
Message: socket type not yet [fully] supported
[31007] WARNING at connection.cpp:303 in restore;
REASON='JWARNING((_sockDomain == AF_INET || _sockDomain == AF_UNIX )
&& _sockType == SOCK_STREAM) failed'
id() = 2ab3f248-30943-4ac0d75c(99007)
_sockDomain = 10
_sockType = 1
_sockProtocol = 0
Message: socket type not yet [fully] supported
[31013] WARNING at connection.cpp:87 in restartDup2;
REASON='JWARNING(_real_dup2 ( oldFd, fd ) == fd) failed'
oldFd = 537
fd = 1
(strerror((*__errno_location ()))) = Bad file descriptor
[31013] WARNING at connectionmanager.cpp:627 in closeAll;
REASON='JWARNING(_real_close ( i->second ) ==0) failed'
i->second = 537
(strerror((*__errno_location ()))) = Bad file descriptor
[31015] WARNING at connectionmanager.cpp:627 in closeAll;
REASON='JWARNING(_real_close ( i->second ) ==0) failed'
i->second = 537
(strerror((*__errno_location ()))) = Bad file descriptor
[31017] WARNING at connectionmanager.cpp:627 in closeAll;
REASON='JWARNING(_real_close ( i->second ) ==0) failed'
i->second = 537
(strerror((*__errno_location ()))) = Bad file descriptor
[31007] WARNING at connectionmanager.cpp:627 in closeAll;
REASON='JWARNING(_real_close ( i->second ) ==0) failed'
i->second = 537
(strerror((*__errno_location ()))) = Bad file descriptor
MTCP: mtcp_restart_nolibc: mapping current version of
/usr/lib/gconv/gconv-modules.cache into memory;
_not_ file as it existed at time of checkpoint.
Change mtcp_restart_nolibc.c:634 and re-compile, if you want
different behavior.
[31015] ERROR at connection.cpp:372 in restoreOptions;
REASON='JASSERT(ret == 0) failed'
(strerror((*__errno_location ()))) = Invalid argument
fds[0] = 6
opt->first = 26
opt->second.size() = 4
Message: restoring setsockopt failed
Terminating...
#############################################################
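A note on reading the first two warnings in the log above: on Linux, the
value _sockDomain = 10 is AF_INET6 and _sockType = 1 is SOCK_STREAM, so the
sockets being restored appear to be IPv6 TCP sockets, which fail the
"AF_INET or AF_UNIX" condition quoted in the warning. The sketch below is
only an illustration of that check in Python; the function names are
hypothetical and it is not DMTCP code. The numeric values of these constants
are platform-specific, so the decoding is only meaningful on the same kind of
system that produced the log.

```python
import socket


def describe_socket(domain: int, sock_type: int) -> str:
    """Turn numeric socket domain/type values into symbolic names.

    Hypothetical helper for reading the log; values are decoded using
    the constants of the platform this script runs on.
    """
    try:
        domain_name = socket.AddressFamily(domain).name
    except ValueError:
        domain_name = f"unknown({domain})"
    try:
        type_name = socket.SocketKind(sock_type).name
    except ValueError:
        type_name = f"unknown({sock_type})"
    return f"{domain_name}/{type_name}"


def passes_logged_condition(domain: int, sock_type: int) -> bool:
    """Mirror the condition printed in the JWARNING text of the log:
    (domain is AF_INET or AF_UNIX) and type is SOCK_STREAM.
    """
    return (domain in (socket.AF_INET, socket.AF_UNIX)
            and sock_type == socket.SOCK_STREAM)


if __name__ == "__main__":
    # Values copied from the log: _sockDomain = 10, _sockType = 1
    print(describe_socket(10, 1))
    print(passes_logged_condition(10, 1))
```

If that reading is right, one thing worth checking is whether the job can be
run over IPv4-only connections; whether that avoids the later setsockopt
failure is not something the log alone can confirm.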
Any suggestions are very welcome.
Regards,
Raj
--
Jeff Squyres
jsquy...@cisco.com