Hi,

So, I was able to remove the "cannot open shared object file" errors. But I am not able to checkpoint yet. When I run ompi-checkpoint with the PID of mpirun, it does not return anything (not even a new prompt). In my mca-params.conf file, I added

sstore=stage
sstore_stage_local_snapshot_dir=/tmp/ndesai/local
sstore_base_global_snapshot_dir=/tmp/ndesai/global

I created the local and global folders myself. I am running all the processes on a single machine. What am I doing wrong? Please guide me.

Thanks,
Neel.

On Mon, Jun 3, 2013 at 9:34 AM, Neel Sunil Desai <neel.de...@colorado.edu> wrote:

> Hi Ralph.
>
> I checked the errors.
> I do not understand what the following means: The session directory
> location could not be parsed.
>
> ompi-checkpoint attempted to use the session directory:
> /tmp/openmpi-sessions-ndesai@vcainternmpi01_0
> I opened the /tmp/openmpi-sessions-ndesai directory and various
> directories are created.
>
> Also, when I run the mpi program, I get the following errors before the
> program starts running correctly:
>
> [ndesai@vcainternmpi01 work]$ mpirun -am ft-enable-cr --np 16 ./DecoderTest ../../decoder/test.ini
> [vcainternmpi01:25341] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25342] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25343] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25344] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25347] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25354] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25356] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25337] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25338] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25339] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25340] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25355] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25359] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25357] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25358] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
> [vcainternmpi01:25362] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
>
> I also checked the mca-params.conf file and all it contained were
> comments. Do I have to make any changes there to get correct snapshots?
>
> Thanks a lot,
> Neel.
>
> On Fri, May 31, 2013 at 5:24 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Did you check the items on the list given in the error? I'm no expert on
>> ompi-checkpoint, but the error means that one of those conditions isn't
>> being met.
>>
>>
>> On May 31, 2013, at 4:54 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>> wrote:
>>
>> Hi Ralph,
>>
>> Thanks for the help. The path and ld_path were not set to the correct
>> location. I was able to execute the ompi-checkpoint command. But, I got the
>> following error.
>>
>> [ndesai@vcainternmpi01 ~]$ ompi-checkpoint 1803
>> --------------------------------------------------------------------------
>> Error: Unable to find the requested, active MPIRUN process on this machine.
>> This could be due to one of the following:
>> - The jobid specified by the '--hnp-jobid' option is not correct.
>> - The PID specified (1803) is not that of an active MPIRUN.
>> - The application with this PID is not checkpointable
>> - The application with this PID is not an Open MPI application.
>> - The session directory location could not be parsed.
>> ompi-checkpoint attempted to use the session directory:
>> /tmp/openmpi-sessions-ndesai@vcainternmpi01_0
>>
>> Thanks,
>> Neel.
>>
>> On Fri, May 31, 2013 at 4:34 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Check that your path and ld_library_path are set to point to the
>>> directory where you installed the version you built (the --prefix=<> you
>>> provided).
>>>
>>> On May 31, 2013, at 4:31 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>> wrote:
>>>
>>> Hi Ralph,
>>>
>>> I did install Open MPI with the --with-ft=cr option.
>>>
>>> Thanks,
>>> Neel.
>>>
>>> On Fri, May 31, 2013 at 4:25 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Okay, it should work in that version. It sounds like you didn't
>>>> configure OMPI with the --with-ft=cr option - yes? Take a look at
>>>> "./configure -h" for the ft-related options and ensure you build what you
>>>> need. C/R support is not built by default.
>>>>
>>>>
>>>> On May 31, 2013, at 3:59 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>>> wrote:
>>>>
>>>> Open MPI 1.5.4
>>>>
>>>> On Fri, May 31, 2013 at 3:31 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> What OMPI version?
>>>>>
>>>>> On May 31, 2013, at 3:17 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>>>> wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I forgot to add. I watched the video of Joshua Hursey and when I
>>>>> > type ompi_info | grep FT, I get FT Checkpoint Support: no ( checkpoint
>>>>> > thread : no). I do not get anything when I type ompi_info | grep crs.
>>>>> >
>>>>> > Thanks,
>>>>> > Neel.
>>>>> > _______________________________________________
>>>>> > users mailing list
>>>>> > us...@open-mpi.org
>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
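
A note for anyone reproducing the setup at the top of this thread: since the snapshot directories were created by hand, one quick thing to rule out is a mistyped or unwritable path in mca-params.conf. Below is a minimal Python sketch, not part of Open MPI. The function names parse_mca_params and check_snapshot_dirs are hypothetical, and the sketch assumes the file holds one name=value setting per line with # comments and lives at $HOME/.openmpi/mca-params.conf. It does not explain why ompi-checkpoint hangs; it only confirms that every parameter ending in _snapshot_dir points at an existing, writable directory.

```python
import os


def parse_mca_params(text):
    """Parse 'name = value' lines into a dict; '#' starts a comment.

    Assumption: one setting per line, as in the three lines quoted
    at the top of the thread.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def check_snapshot_dirs(params):
    """Return {param_name: problem} for snapshot dirs that look unusable."""
    problems = {}
    for name, value in params.items():
        if not name.endswith("_snapshot_dir"):
            continue
        if not os.path.isdir(value):
            problems[name] = "not a directory: " + value
        elif not os.access(value, os.W_OK | os.X_OK):
            problems[name] = "not writable: " + value
    return problems


if __name__ == "__main__":
    # Assumed per-user location of the parameter file.
    path = os.path.expanduser("~/.openmpi/mca-params.conf")
    if os.path.exists(path):
        with open(path) as fh:
            found = check_snapshot_dirs(parse_mca_params(fh.read()))
        for name, problem in found.items():
            print(name + ": " + problem)
        if not found:
            print("all snapshot directories exist and are writable")
    else:
        print("no parameter file at " + path)
```

If this reports no problems, the directories themselves are probably not the cause, and the next things to look at would be the ompi_info output and the BLCR library path discussed earlier in the thread.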