On 28 November 2012 11:14, William Hay <w....@ucl.ac.uk> wrote:
> I'm trying to build openmpi with support for BLCR plus qlogic infiniband
> (plus grid engine). Everything seems to compile OK and checkpoints are
> taken, but whenever I try to restore a checkpoint I get the following error:
>
> - do_mmap(<file>, 00002aaab18c7000, 0000000000001000, ...) failed: ffffffffffffffea
> - mmap failed: /dev/ipath
> - thaw_threads returned error, aborting. -22
> - thaw_threads returned error, aborting. -22
> Restart failed: Invalid argument
>
> This occurs whether I specify psm or openib as the btl.
>
> This looks like the sort of thing I would expect to be handled by the BLCR
> support code in openmpi, so I have a couple of questions:
> 1) Are InfiniBand and BLCR support in openmpi compatible?
> 2) Are there any special tricks necessary to get them working together?
>
> A third question occurred to me that may be relevant: how do I verify that
> my openmpi install has BLCR support built in? I would have thought this
> would mean that either mpiexec or binaries built with mpicc would have libcr
> linked in, but running ldd doesn't report it in either case. I'm setting
> LD_PRELOAD to point to it, but I would have thought openmpi would need to
> register a callback with BLCR, and that would be easier if the library were
> linked in rather than detecting whether it has been LD_PRELOADed.
> I'm building with the following options:
>
> ./configure --prefix=/home/ccaawih/openmpi-blcr --with-openib --without-psm --with-blcr=/usr --with-blcr-libdir=/usr/lib64 --with-ft=cr --enable-ft-thread --enable-mpi-threads --with-sge
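On the third question, one way to check the build is to inspect `ompi_info` rather than running ldd on mpiexec. In a default (dlopen-enabled) build, BLCR support is typically provided by a loadable crs component (something like `lib/openmpi/mca_crs_blcr.so` under the install prefix), so libcr would be linked into that plugin rather than into mpiexec or mpicc-built binaries. The helper below is a minimal, hypothetical sketch: the function name `check_blcr` is made up, and it assumes `ompi_info` reports the component as an `MCA crs: blcr (...)` line and prints an `FT Checkpoint support:` line. Compare against your own `ompi_info` output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): reads ompi_info output on stdin
# and reports whether a BLCR checkpoint/restart (crs) component appears.
# Assumes lines of the form "MCA crs: blcr (...)" and "FT Checkpoint support: ...".
check_blcr() {
    info=$(cat)
    # Look for the blcr component in the crs framework listing.
    if printf '%s\n' "$info" | grep -Eq 'MCA crs: *blcr'; then
        echo "blcr crs component: present"
    else
        echo "blcr crs component: missing"
    fi
    # Echo the FT summary line with leading whitespace trimmed, if present.
    ft=$(printf '%s\n' "$info" | grep 'FT Checkpoint support:' | sed 's/^[[:space:]]*//')
    echo "${ft:-FT Checkpoint support: not reported}"
}

# Usage (assumed): ompi_info | check_blcr
# To see whether the plugin itself links against libcr (path is an assumption):
#   ldd /home/ccaawih/openmpi-blcr/lib/openmpi/mca_crs_blcr.so | grep libcr
```

If the component is listed but ldd on the plugin shows no libcr, that would point at the build rather than at the InfiniBand restart problem.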