Hi,

So, I was able to get rid of the "cannot open shared object file" errors.
But I am still not able to checkpoint. When I run ompi-checkpoint with the
PID of mpirun, it prints nothing and never returns to the prompt. In my
mca-params.conf file, I added

sstore=stage

sstore_stage_local_snapshot_dir=/tmp/ndesai/local
sstore_base_global_snapshot_dir=/tmp/ndesai/global


I created the local and global folders myself.
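
A minimal sketch of how the directories named in that file could be sanity-checked. The helper names (snapshot_dirs, check_dirs) are hypothetical, and the default CONF path assumes the per-user file at $HOME/.openmpi/mca-params.conf; adjust it if your file lives elsewhere. It only checks that each *_snapshot_dir value exists and is writable, and it assumes the paths contain no spaces. It does not diagnose why ompi-checkpoint itself fails to return.

```shell
#!/bin/sh
# Assumed location of the per-user MCA parameter file; override with CONF=...
CONF="${CONF:-$HOME/.openmpi/mca-params.conf}"

# Print the value of every key ending in "_snapshot_dir".
# Comment lines start with '#', so they never match the pattern.
snapshot_dirs() {
    sed -n 's/^[[:space:]]*[A-Za-z_]*_snapshot_dir[[:space:]]*=[[:space:]]*//p' "$1"
}

# Report whether each snapshot directory exists and is writable.
# Returns non-zero if at least one directory fails the check.
check_dirs() {
    status=0
    for dir in $(snapshot_dirs "$1"); do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing or not writable: $dir"
            status=1
        fi
    done
    return $status
}

# Only run the check when the assumed file is actually present.
if [ -f "$CONF" ]; then
    check_dirs "$CONF"
fi
```

With the two entries above, this would print one line per directory; a "missing or not writable" line points at a path to fix before trying ompi-checkpoint again.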

I am running all the processes on a single machine.
What am I doing wrong? Please guide me.

Thanks,
Neel.


On Mon, Jun 3, 2013 at 9:34 AM, Neel Sunil Desai <neel.de...@colorado.edu> wrote:

> Hi Ralph.
>
> I checked the errors.
> I do not understand what the following means: "The session directory
> location could not be parsed."
>
>        ompi-checkpoint attempted to use the session directory:
>          /tmp/openmpi-sessions-ndesai@vcainternmpi01_0
> I opened the /tmp/openmpi-sessions-ndesai directory, and various
> directories have been created in it.
>
> Also, when I run the mpi program, I get the following errors before the
> program starts running correctly:
>
> [ndesai@vcainternmpi01 work]$ mpirun -am ft-enable-cr --np 16
> ./DecoderTest ../../decoder/test.ini
> [vcainternmpi01:25341] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25342] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25343] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25344] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25347] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25354] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25356] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25337] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25338] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25339] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25340] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25355] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25359] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25357] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25358] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
> [vcainternmpi01:25362] mca: base: component_find: unable to open
> /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
> object file: No such file or directory (ignored)
>
> I also checked the mca-params.conf file, and all it contained were
> comments. Do I have to make any changes there to get correct snapshots?
>
> Thanks a lot,
> Neel.
>
> On Fri, May 31, 2013 at 5:24 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Did you check the items on the list given in the error? I'm no expert on
>> ompi-checkpoint, but the error means that one of those conditions isn't
>> being met.
>>
>>
>>  On May 31, 2013, at 4:54 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>> wrote:
>>
>>  Hi Ralph,
>>
>> Thanks for the help. My PATH and LD_LIBRARY_PATH were not set to the
>> correct location. I was able to execute the ompi-checkpoint command, but I
>> got the following error.
>>
>> [ndesai@vcainternmpi01 ~]$ ompi-checkpoint 1803
>> --------------------------------------------------------------------------
>> Error: Unable to find the requested, active MPIRUN process on this
>> machine.
>>        This could be due to one of the following:
>>         - The jobid specified by the '--hnp-jobid' option is not
>>           correct.
>>         - The PID specified (1803) is not that of an active MPIRUN.
>>         - The application with this PID is not checkpointable
>>         - The application with this PID is not an Open MPI application.
>>         - The session directory location could not be parsed.
>>        ompi-checkpoint attempted to use the session directory:
>>          /tmp/openmpi-sessions-ndesai@vcainternmpi01_0
>> Thanks,
>> Neel.
>>
>> On Fri, May 31, 2013 at 4:34 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Check that your path and ld_library_path are set to point to the
>>> directory where you installed the version you built (the --prefix=<> you
>>> provided).
>>>
>>>  On May 31, 2013, at 4:31 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>> wrote:
>>>
>>>  Hi Ralph,
>>>
>>> I did install open mpi with the --with-ft=cr option.
>>>
>>> Thanks,
>>> Neel.
>>>
>>> On Fri, May 31, 2013 at 4:25 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Okay, it should work in that version. It sounds like you didn't
>>>> configure OMPI with the --with-ft=cr option - yes? Take a look at
>>>> "./configure -h" for the ft-related options and ensure you build what you
>>>> need. C/R support is not built by default.
>>>>
>>>>
>>>>  On May 31, 2013, at 3:59 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>>> wrote:
>>>>
>>>> Open MPI 1.5.4
>>>>
>>>> On Fri, May 31, 2013 at 3:31 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> What OMPI version?
>>>>>
>>>>> On May 31, 2013, at 3:17 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>>>> wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I forgot to add: I watched Joshua Hursey's video, and when I
>>>>> > type ompi_info | grep FT, I get "FT Checkpoint Support: no (checkpoint
>>>>> > thread: no)". I get nothing when I type ompi_info | grep crs.
>>>>> >
>>>>> > Thanks,
>>>>> > Neel.
>>>>> > _______________________________________________
>>>>> > users mailing list
>>>>> > us...@open-mpi.org
>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
