I tested the 1.4.1 release, and everything worked fine for me (tested
a few different configurations of nodes/environments).
The ompi-checkpoint error you cited is usually caused by one of two
things:
- The PID specified is wrong (which I don't think is the case
here)
- The session directory cannot be found in /tmp.
So I think the problem is the latter. The session directory looks
something like:
/tmp/openmpi-sessions-USERNAME@LOCALHOST_0
Within this directory the mpirun process places its contact
information. ompi-checkpoint uses this contact information to connect
to the job. If it cannot find it, then it errors out. (We definitely
need a better error message here. I filed a ticket [1]).
We do not recommend running Open MPI as the root user, so I would
strongly recommend that you run as a regular user instead.
With a regular user, check the location of the session directory. Make
sure that it is in /tmp on the node where 'mpirun' and
'ompi-checkpoint' are run.
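A quick way to do that check from a shell is sketched below. It
assumes the directory name follows the openmpi-sessions-USERNAME@HOST
pattern shown above. The helper name list_session_dirs is made up for
illustration and is not part of Open MPI.

```shell
# List session directories for a user under a base directory.
# $1 = user name (default: current user), $2 = base directory (default: /tmp)
# Assumption: directories are named openmpi-sessions-USERNAME@HOST...,
# following the example path given earlier in this message.
list_session_dirs() {
    user=${1:-$(id -un)}
    base=${2:-/tmp}
    found=1
    for dir in "$base"/openmpi-sessions-"$user"@*; do
        # An unmatched glob stays a literal string, so test for a real directory
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    # Exit status 0 if at least one directory was found, 1 otherwise
    return "$found"
}

# Run on the node where mpirun was started, as the user who started it
if ! list_session_dirs; then
    echo "no session directory found in /tmp for $(id -un)" >&2
fi
```

If nothing is listed while the job is running, ompi-checkpoint will
most likely not be able to find the contact information either.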
-- Josh
[1] https://svn.open-mpi.org/trac/ompi/ticket/2189
On Jan 25, 2010, at 5:48 AM, Andreea Costea wrote:
So? anyone? any clue?
To summarize:
- installed OpenMPI 1.4.1 on a fresh CentOS 5
- mpirun works but ompi-checkpoint throws this error:
ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405
- on another VM I have OpenMPI 1.3.3 installed. Checkpointing works
fine as guest but gives the previously mentioned error as root. Both
root and guest show the same output from "ompi_info --param all all"
except for $HOME (which only matters for mca_component_path,
mca_param_files, snapc_base_global_snapshot_dir)
Thanks,
Andreea
On Tue, Jan 19, 2010 at 9:01 PM, Andreea Costea <andre.cos...@gmail.com> wrote:
I noticed one more thing. As I still have some VMs that have OpenMPI
version 1.3.3 installed, I started to use those machines until I fix
the problem with 1.4.1. While checkpointing on one of these VMs, I
realized that checkpointing as a guest works fine, while checkpointing
as root produces the same error as in 1.4.1: ORTE_ERROR_LOG:
Not found in file orte-checkpoint.c at line 405
I logged the output of "ompi_info --param all all", which I ran as
root and as another user, and the only differences were in these
parameters:
mca_component_path
mca_param_files
snapc_base_global_snapshot_dir
All 3 params differ because of the $HOME.
One more thing: I don't have the directory $HOME/.openmpi
Ideas?
Thanks,
Andreea
On Tue, Jan 19, 2010 at 12:51 PM, Andreea Costea <andre.cos...@gmail.com> wrote:
Well... I decided to install a fresh OS to be sure that there is no
OpenMPI version conflict. So I formatted one of my VMs, did a fresh
CentOS install, installed BLCR 0.8.2 and OpenMPI 1.4.1 and the
result: the same. mpirun works but ompi-checkpoint has that error at
line 405:
[[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at
line 405
As for the files remaining after uninstalling: Jeff, you were right.
There are no files left, just some empty directories.
What might be causing that ORTE_ERROR_LOG error?
Thanks,
Andreea
On Fri, Jan 15, 2010 at 11:47 PM, Andreea Costea <andre.cos...@gmail.com> wrote:
It's almost midnight here, so I left home, but I will try it tomorrow.
There were some directories left after "make uninstall". I will give
more details tomorrow.
Thanks Jeff,
Andreea
On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote:
> - I wanted to update to version 1.4.1 and I uninstalled the previous
> version like this: make uninstall, and then manually deleted all the
> leftover files. The directory where I installed was /usr/local
I'll let Josh answer your checkpoint/restart (CR) questions, but I did want to ask about
this point. AFAIK, "make uninstall" removes *all* Open MPI files.
For example:
-----
[7:25] $ cd /path/to/my/OMPI/tree
[7:25] $ make install > /dev/null
[7:26] $ find /tmp/bogus/ -type f | wc
646 646 28082
[7:26] $ make uninstall > /dev/null
[7:27] $ find /tmp/bogus/ -type f | wc
0 0 0
[7:27] $
-----
I realize that some *directories* are left in $prefix, but there
should be no *files* left. Are you seeing something different?
--
Jeff Squyres
jsquy...@cisco.com
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users