The configuration looks fine, but from the stack trace it seems that the segfault is coming from an invalid free in BLCR (which seems odd to me).

Are you able to get a gdb backtrace from a core file generated from this run? That would provide a bit more detail on where things are going wrong.
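
In case it is useful, here is a rough sketch of one way to pull a backtrace out of a core file. The helper name core_backtrace and the file names in the comments are placeholders rather than anything from your setup, and the core file's actual name and location depend on how the system is configured.

```shell
# Sketch only: print a backtrace from a core file.
# core_backtrace is a hypothetical helper name; paths are placeholders.
core_backtrace() {
    exe="$1"
    core="$2"
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <executable> <core-file>" >&2
        return 2
    fi
    # -batch runs the given commands and exits; "bt" prints the stack
    # of the thread that crashed.
    gdb -batch -ex "bt" "$exe" "$core"
}

# Before reproducing the crash, allow core files in the shell that
# launches mpirun (they are often disabled by default):
#   ulimit -c unlimited
# Once a core file exists (its name varies, e.g. core or core.<pid>):
#   core_backtrace ./mpitest core
```

Building the application and libraries with debugging symbols (-g) should make the anonymous frames in the trace easier to read.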

What version of BLCR are you running? Does BLCR work with sequential applications?
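
A quick way to check BLCR on its own, outside of Open MPI, might look like the sketch below. It assumes the BLCR command-line tools (cr_run, cr_checkpoint, cr_restart) are on the PATH and that the BLCR kernel modules are loaded. The function name blcr_sequential_test is made up for illustration, and sleep stands in for any dynamically linked sequential program.

```shell
# Sketch only: checkpoint and restart a plain (non-MPI) process with BLCR.
# blcr_sequential_test is a hypothetical helper name.
blcr_sequential_test() {
    for tool in cr_run cr_checkpoint cr_restart; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing BLCR tool: $tool" >&2
            return 1
        fi
    done

    # Start a trivial process with BLCR support loaded into it.
    cr_run sleep 60 &
    pid=$!
    sleep 1   # give the process a moment to start

    # Write the checkpoint (context.<pid> in the current directory by
    # default) and terminate the process afterwards.
    cr_checkpoint --term "$pid" || return 1

    # Resume the process from the saved context file.
    cr_restart "context.$pid"
}
```

If this already fails for a sequential process, the problem is likely in the BLCR installation rather than in Open MPI.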

Additionally, have you tried Open MPI 1.3.3 or the trunk to see if the problem happens there as well?

-- Josh

On Sep 9, 2009, at 1:49 PM, Jean Potsam wrote:

Dear All,
I have installed Open MPI 1.3.2 in my home directory (/home/jean/openmpisof/) and BLCR in /usr/local/blcr. I have added the following to my .bashrc file:

export PATH=/home/jean/openmpisof/bin/:$PATH
export LD_LIBRARY_PATH=/home/jean/openmpisof/lib/:$LD_LIBRARY_PATH

export PATH=/usr/local/blcr/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/blcr/lib:$LD_LIBRARY_PATH

I am running my application as follows:

mpirun -am ft-enable-cr -mca btl ^openib -mca snapc_base_global_snapshot_dir /tmp mpitest

But I get the following error when I try to checkpoint the application:

######################################
[sun06:20513] *** Process received signal ***
[sun06:20513] Signal: Segmentation fault (11)
[sun06:20513] Signal code: Address not mapped (1)
[sun06:20513] Failing at address: 0x4
[sun06:20513] [ 0] [0xb7fab40c]
[sun06:20513] [ 1] /lib/libc.so.6(cfree+0x3b) [0xb79e468b]
[sun06:20513] [ 2] /usr/local/blcr/lib/libcr.so.0(cri_info_free+0x2a) [0xb7b1725a]
[sun06:20513] [ 3] /usr/local/blcr/lib/libcr.so.0 [0xb7b18c72]
[sun06:20513] [ 4] /lib/libc.so.6(__libc_fork+0x186) [0xb7a0d266]
[sun06:20513] [ 5] /lib/libpthread.so.0(fork+0x14) [0xb7ac4b24]
[sun06:20513] [ 6] /home/jean/openmpisof/lib/libopen-pal.so.0 [0xb7bc2a01]
[sun06:20513] [ 7] /home/jean/openmpisof/lib/libopen-pal.so.0(opal_crs_blcr_checkpoint+0x187) [0xb7bc231b]
[sun06:20513] [ 8] /home/jean/openmpisof/lib/libopen-pal.so.0(opal_cr_inc_core+0xc3) [0xb7b8eb1d]
[sun06:20513] [ 9] /home/jean/openmpisof/lib/libopen-rte.so.0 [0xb7cab40f]
[sun06:20513] [10] /home/jean/openmpisof/lib/libopen-pal.so.0(opal_cr_test_if_checkpoint_ready+0x129) [0xb7b8ea2a]
[sun06:20513] [11] /home/jean/openmpisof/lib/libopen-pal.so.0 [0xb7b8f0f8]
[sun06:20513] [12] /lib/libpthread.so.0 [0xb7abbf3b]
[sun06:20513] [13] /lib/libc.so.6(clone+0x5e) [0xb7a42bee]
[sun06:20513] *** End of error message ***
#######################################

Any help would be much appreciated.

Regards,

Jean

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
