The openib BTL and BLCR support in Open MPI were working about a year ago
(when I last checked). The psm BTL is not supported at the moment though.

From the error, I suspect that we are not fully closing the openib BTL
driver before the checkpoint, so when we try to restart, it looks for
a resource that is no longer present. I created a ticket for us to
investigate further, if you want to follow it:
  https://svn.open-mpi.org/trac/ompi/ticket/3417

Unfortunately, I do not know who is currently supporting that code path (I
might pick it back up at some point, but cannot promise anything in the
near future). But I will keep an eye on the ticket and see what I can do.
If it is what I think it is, then it should not take too much work to get
it working again.
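
For what it is worth, the numbers in the restart log agree with each other:
the do_mmap value ffffffffffffffea, read as a signed 64-bit integer, is -22,
and errno 22 is EINVAL ("Invalid argument"), which matches the final
"Restart failed" line. A minimal Python sketch of that arithmetic follows;
the helper name decode_kernel_return is made up for illustration and is not
part of BLCR or Open MPI.

```python
import errno
import os


def decode_kernel_return(hex_value: str, bits: int = 64) -> int:
    """Interpret a hex string as a signed two's-complement integer.

    Hypothetical helper for reading log values; not a BLCR or Open MPI API.
    """
    raw = int(hex_value, 16)
    # Values with the top bit set represent negative numbers.
    if raw >= 1 << (bits - 1):
        raw -= 1 << bits
    return raw


code = decode_kernel_return("ffffffffffffffea")
# Kernel calls report failure as a negated errno, so flip the sign to look it up.
name = errno.errorcode.get(-code, "unknown")
print(code, name, os.strerror(-code))
```

This only confirms that the mmap of /dev/ipath was rejected with EINVAL at
restart; it does not say why the mapping was still part of the checkpoint.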

-- Josh

On Wed, Nov 28, 2012 at 5:14 AM, William Hay <w....@ucl.ac.uk> wrote:

> I'm trying to build openmpi with support for BLCR plus qlogic infiniband
> (plus grid engine).  Everything seems to compile OK and checkpoints are
> taken but whenever I try to restore a checkpoint I get the following error:
> - do_mmap(<file>, 00002aaab18c7000, 0000000000001000, ...) failed: ffffffffffffffea
> - mmap failed: /dev/ipath
> - thaw_threads returned error, aborting. -22
> - thaw_threads returned error, aborting. -22
> Restart failed: Invalid argument
>
> This occurs whether I specify psm or openib as the btl.
>
> This looks like the sort of thing I would expect to be handled by the blcr
> supporting code in openmpi.  So I guess I have a couple of questions.
> 1) Are Infiniband and BLCR support in openmpi compatible?
> 2) Are there any special tricks necessary to get them working together?



-- 
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey
