If you're integrating a new checkpoint/restart system inside Open MPI, you probably want to re-send this mail to the devel list to get the attention of the right people who can help you.
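
One observation on the log quoted below, offered as a guess rather than a diagnosis: the first two warnings print _sockDomain = 10 and _sockType = 1. On Linux, address family 10 is AF_INET6 and socket type 1 is SOCK_STREAM, so the logged condition (AF_INET or AF_UNIX, and SOCK_STREAM) would fail on the domain, not on the type. The sketch below decodes those numbers using Python's standard socket module. It assumes it runs on the same kind of system that produced the log (the numeric values are platform-specific), and the helper names family_name, type_name, and passes_logged_check are made up for illustration; they are not part of DMTCP or Open MPI.

```python
import socket


def family_name(domain: int) -> str:
    """Return the symbolic name of a numeric address family on this platform."""
    try:
        return socket.AddressFamily(domain).name
    except ValueError:
        return f"unknown({domain})"


def type_name(sock_type: int) -> str:
    """Return the symbolic name of a numeric socket type on this platform."""
    try:
        return socket.SocketKind(sock_type).name
    except ValueError:
        return f"unknown({sock_type})"


def passes_logged_check(domain: int, sock_type: int) -> bool:
    """Mirror the condition printed in the JWARNING text of the log.

    This only restates the logged expression; it is not DMTCP code.
    """
    return (domain in (socket.AF_INET, socket.AF_UNIX)
            and sock_type == socket.SOCK_STREAM)


if __name__ == "__main__":
    # Values copied from the warnings at connection.cpp:303 in the log.
    domain, sock_type = 10, 1
    print(f"domain={family_name(domain)} "
          f"type={type_name(sock_type)} "
          f"passes_check={passes_logged_check(domain, sock_type)}")
```

If the output names AF_INET6 on your machine, the warnings are about IPv6 stream sockets. Whether those sockets are also behind the later setsockopt failure is not something the log alone shows.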

On Sep 28, 2009, at 11:55 AM, Kritiraj Sajadah wrote:

Dear All,
I am trying to integrate DMTCP with Open MPI. If I run a C application, it works fine. But when I execute the program using mpirun, it checkpoints the application but gives an error when restarting it.

#############
[31007] WARNING at connection.cpp:303 in restore; REASON='JWARNING((_sockDomain == AF_INET || _sockDomain == AF_UNIX ) && _sockType == SOCK_STREAM) failed'
     id() = 2ab3f248-30933-4ac0d75a(99007)
     _sockDomain = 10
     _sockType = 1
     _sockProtocol = 0
Message: socket type not yet [fully] supported
[31007] WARNING at connection.cpp:303 in restore; REASON='JWARNING((_sockDomain == AF_INET || _sockDomain == AF_UNIX ) && _sockType == SOCK_STREAM) failed'
     id() = 2ab3f248-30943-4ac0d75c(99007)
     _sockDomain = 10
     _sockType = 1
     _sockProtocol = 0
Message: socket type not yet [fully] supported
[31013] WARNING at connection.cpp:87 in restartDup2; REASON='JWARNING(_real_dup2 ( oldFd, fd ) == fd) failed'
     oldFd = 537
     fd = 1
     (strerror((*__errno_location ()))) = Bad file descriptor
[31013] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed'
     i->second = 537
     (strerror((*__errno_location ()))) = Bad file descriptor
[31015] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed'
     i->second = 537
     (strerror((*__errno_location ()))) = Bad file descriptor
[31017] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed'
     i->second = 537
     (strerror((*__errno_location ()))) = Bad file descriptor
[31007] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed'
     i->second = 537
     (strerror((*__errno_location ()))) = Bad file descriptor
MTCP: mtcp_restart_nolibc: mapping current version of /usr/lib/gconv/gconv-modules.cache into memory;
  _not_ file as it existed at time of checkpoint.
Change mtcp_restart_nolibc.c:634 and re-compile, if you want different behavior.
[31015] ERROR at connection.cpp:372 in restoreOptions; REASON='JASSERT(ret == 0) failed'
     (strerror((*__errno_location ()))) = Invalid argument
     fds[0] = 6
     opt->first = 26
     opt->second.size() = 4
Message: restoring setsockopt failed
Terminating...
#############################################################

Any suggestions are very welcome.

regards,

Raj






--
Jeff Squyres
jsquy...@cisco.com
