Re: [OMPI users] Checkpoint from inside MPI program with OpenMPI 1.4.2 ?

2011-10-27 Thread Nguyen Toan
Dear Josh, This will really help a lot. Thank you for the support. Best Regards, Nguyen Toan On Wed, Oct 26, 2011 at 9:20 PM, Josh Hursey <jjhur...@open-mpi.org> wrote: > Since this would be a new feature for 1.4, we cannot move it since the > 1.4 branch is for bug fixes only. How

Re: [OMPI users] Checkpoint from inside MPI program with OpenMPI 1.4.2 ?

2011-10-26 Thread Nguyen Toan
Dear Josh, Thank you. I will test the 1.7 trunk as you suggested. I would also like to ask whether this interface can be added to OpenMPI 1.4.2, because my applications mainly rely on this version. Regards, Nguyen Toan On Wed, Oct 26, 2011 at 3:25 AM, Josh Hursey <jjhur...@open-mpi.org>

[OMPI users] Checkpoint from inside MPI program with OpenMPI 1.4.2 ?

2011-10-24 Thread Nguyen Toan
program with OpenMPI or how to do that. Any ideas would be much appreciated. Regards, Nguyen Toan

Re: [OMPI users] openmpi self checkpointing - error while running example

2011-04-06 Thread Nguyen Toan
$ echo $LD_LIBRARY_PATH > > /cluster/sw/blcr/0.8.2/x86_64/gcc//lib:/cluster/sw/openmpi/1.5.3_ft/x86_64/gcc/lib:/opt/intel/Compiler/11.1/056/lib/intel64 > > The library path seems to be ok or should it look different? do you have > another idea? > cheers > roman > > ___

Re: [OMPI users] openmpi self checkpointing - error while running example

2011-04-06 Thread Nguyen Toan
Hi Roman, Did you try to checkpoint and restart with the parameter "-machinefile"? It may work. Regards, Nguyen Toan On Wed, Apr 6, 2011 at 7:05 PM, Hellmüller Roman <hro...@student.ethz.ch> wrote: > Hi > > I'm trying to get fault tolerant ompi running on our cluste

Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-03-03 Thread Nguyen Toan
Thanks Josh. Actually I also tested with the Himeno benchmark <http://accc.riken.jp/assets/files/himenob_loadmodule/himenoBMT_c_mpi.lzh> and got the same problem, so I think this could be a bug. Hope this information also helps. Regards, Nguyen Toan On Fri, Mar 4, 2011 at 12:04 AM, Joshua

Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-02-25 Thread Nguyen Toan
Dear Josh, Did you find the problem? I still cannot make any progress. Hope to hear some good news from you. Regards, Nguyen Toan On Sun, Feb 13, 2011 at 3:04 PM, Nguyen Toan <nguyentoan1...@gmail.com> wrote: > Hi Josh, > > I tried the MCA parameter you mentioned but

Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-02-13 Thread Nguyen Toan
Hi Josh, I tried the MCA parameter you mentioned but it did not help; the unknown overhead still exists. I attach the output of 'ompi_info' for both version 1.5 and 1.5.1. I hope you can find the problem. Thank you. Regards, Nguyen Toan On Wed, Feb 9, 2011 at 11:08 PM, Joshua Hursey <jj

Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-02-09 Thread Nguyen Toan
. Do you have any other idea? Regards, Nguyen Toan On Wed, Feb 9, 2011 at 12:41 AM, Joshua Hursey <jjhur...@open-mpi.org> wrote: > There are a few reasons why this might be occurring. Did you build with the > '--enable-ft-thread' option? > > If so, it looks like

[OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-02-08 Thread Nguyen Toan
Hi all, I am using the latest version of OpenMPI (1.5.1) and BLCR (0.8.2). I found that when running an application that uses MPI_Isend, MPI_Irecv and MPI_Wait with C/R enabled, i.e., using "-am ft-enable-cr", the application runtime is much longer than the normal execution with mpirun (no checkpoint

Re: [OMPI users] mpirun error in OpenMPI 1.5

2010-12-08 Thread Nguyen Toan
> the 1.5 series install. > > On Dec 8, 2010, at 8:02 AM, Nguyen Toan wrote: > > > Dear all, > > > > I am having a problem while running mpirun in OpenMPI 1.5 version. I > compiled OpenMPI 1.5 with BLCR 0.8.2 and OFED 1.4.1 as follows: > > > > ./configure \ >

[OMPI users] mpirun error in OpenMPI 1.5

2010-12-08 Thread Nguyen Toan
Dear all, I am having a problem running mpirun with OpenMPI 1.5. I compiled OpenMPI 1.5 with BLCR 0.8.2 and OFED 1.4.1 as follows: ./configure \ --with-ft=cr \ --enable-mpi-threads \ --with-blcr=/home/nguyen/opt/blcr \ --with-blcr-libdir=/home/nguyen/opt/blcr/lib \

Re: [OMPI users] How to checkpoint atomic function in OpenMPI

2010-07-22 Thread Nguyen Toan
Dear Josh, I hope to see this new API soon. Anyway, I will try these critical section functions in BLCR. Thank you for the support. Best Regards, Nguyen Toan On Sat, Jul 17, 2010 at 6:34 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: > > On Jun 14, 2010, at 5:26 AM, Nguyen Toan wro

Re: [OMPI users] Question on checkpoint overhead in Open MPI

2010-07-22 Thread Nguyen Toan
lication and system configuration specific but in general is there any relationship between "Others" and the number of processes or data size? Thank you. Best Regards, Nguyen Toan On Sat, Jul 17, 2010 at 6:25 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: > The amount of checkpo

Re: [OMPI users] Question on checkpoint overhead in Open MPI

2010-07-15 Thread Nguyen Toan
Could somebody help, please? I am sorry to spam the mailing list but I really need your help. Thanks in advance. Best Regards, Nguyen Toan On Thu, Jul 8, 2010 at 1:25 AM, Nguyen Toan <nguyentoan1...@gmail.com> wrote: > Hello everyone, > I have a question concerning the checkpoint overhead

[OMPI users] Question on checkpoint overhead in Open MPI

2010-07-07 Thread Nguyen Toan
to the overall checkpoint overhead in Open MPI. Is it because of the increase in coordination time for checkpointing? And what is included in the overall checkpoint overhead besides BLCR's checkpoint overhead and the coordination time? Thank you. Best Regards, Nguyen Toan

[OMPI users] How to checkpoint atomic function in OpenMPI

2010-06-14 Thread Nguyen Toan
int time (executing ompi-checkpoint), is there a way to let OpenMPI wait until my_atomic_func() finishes its operation? + How does ompi-checkpoint operate to checkpoint MPI threads? Regards, Nguyen Toan

Re: [OMPI users] ompi-restart failed

2010-06-14 Thread Nguyen Toan
Hi all, I finally figured out the answer. I just put the parameter "-machinefile host" in the "ompi-restart" command and it restarted correctly. So is OpenMPI unable to restart a multi-threaded application on 1 node? Nguyen Toan On Tue, Jun 8, 2010 at 12:07 AM, Nguy
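A minimal sketch of the workaround described in this message, assuming a machinefile named "host" (as in the message) and a placeholder snapshot handle; the helper name build_restart_cmd and the snapshot name are hypothetical and not taken from the thread. The script only prints the command so it can be checked before being run on a system with the C/R-enabled Open MPI tools installed.

```shell
#!/bin/sh
# Sketch: assemble an ompi-restart command with an explicit machinefile.
# build_restart_cmd, MACHINEFILE and SNAPSHOT are illustrative names only.

build_restart_cmd() {
    machinefile="$1"   # file listing the hosts to restart on
    snapshot="$2"      # global snapshot handle written by ompi-checkpoint
    printf 'ompi-restart -machinefile %s %s\n' "$machinefile" "$snapshot"
}

MACHINEFILE="host"                          # name used in the message
SNAPSHOT="ompi_global_snapshot_XXXX.ckpt"   # placeholder, substitute your own

# Print the command instead of executing it, so it can be inspected first.
build_restart_cmd "$MACHINEFILE" "$SNAPSHOT"
```

Copy the printed line into the terminal once the machinefile and snapshot names match your own run.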

Re: [OMPI users] ompi-restart failed

2010-06-07 Thread Nguyen Toan
helps? Thank you very much. Nguyen Toan On Mon, Jun 7, 2010 at 11:51 PM, Nguyen Toan <nguyentoan1...@gmail.com> wrote: > Hello everyone, > > I'm using OpenMPI 1.4.2 with BLCR 0.8.2 to test checkpointing on 2 nodes > but it failed to restart (Segmentation fault). > Here are t

[OMPI users] ompi-restart failed

2010-06-07 Thread Nguyen Toan
as created successfully. However, it failed to restart using ompi-restart: "mpirun noticed that process rank 0 with PID 21242 on node rc014.local exited on signal 11 (Segmentation fault)" Did I miss something in the installation of OpenMPI? Regards, Nguyen Toan

Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-05-24 Thread Nguyen Toan
>$ ompi-restart ompi_global_snapshot_10982.ckpt >-- >mpirun noticed that process rank 1 with PID 11346 on node rc013.local exited >on signal 11 (Segmentation fault). >--