The openib BTL and BLCR support in Open MPI were working together about a year ago (when I last checked). The psm BTL, however, is not supported at the moment.
From the error, I suspect that we are not fully closing the openib BTL driver before the checkpoint, so when we try to restart, it is looking for a resource that is no longer present. I created a ticket for us to investigate further, if you want to follow it: https://svn.open-mpi.org/trac/ompi/ticket/3417

Unfortunately, I do not know who is currently supporting that code path (I might pick it back up at some point, but cannot promise anything in the near future). But I will keep an eye on the ticket and see what I can do. If it is what I think it is, then it should not take too much work to get it working again.

-- Josh

On Wed, Nov 28, 2012 at 5:14 AM, William Hay <w....@ucl.ac.uk> wrote:
> I'm trying to build openmpi with support for BLCR plus qlogic infiniband
> (plus grid engine). Everything seems to compile OK and checkpoints are
> taken but whenever I try to restore a checkpoint I get the following error:
> - do_mmap(<file>, 00002aaab18c7000, 0000000000001000, ...) failed:
> ffffffffffffffea
> - mmap failed: /dev/ipath
> - thaw_threads returned error, aborting. -22
> - thaw_threads returned error, aborting. -22
> Restart failed: Invalid argument
>
> This occurs whether I specify psm or openib as the btl.
>
> This looks like the sort of thing I would expect to be handled by the blcr
> supporting code in openmpi. So I guess I have a couple of questions.
> 1) Are Infiniband and BLCR support in openmpi compatible?
> 2) Are there any special tricks necessary to get them working together?

--
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey