Thanks for the bug report. There are a couple of places in the code
that, in a sense, hard-code '/tmp' as the temporary directory. It
shouldn't be too hard to fix, since there is a common function used in
the code to discover the 'true' temporary directory (which defaults
to /tmp). Of course
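
As a rough illustration of the kind of lookup described above, here is a minimal sketch of a temporary-directory resolver that honors environment variables and falls back to /tmp. The function name resolve_tmpdir and the order of variables checked (TMPDIR, TEMP, TMP) are assumptions made for this example; this is not Open MPI's actual code.

```python
import os


def resolve_tmpdir(env=None):
    """Hypothetical helper: return the temporary directory to use.

    Checks TMPDIR, TEMP and TMP in that (assumed) order and falls back
    to /tmp when none of them is set to a non-empty value.
    """
    # Allow a plain dict to be passed in so the lookup is easy to test.
    env = os.environ if env is None else env
    for name in ("TMPDIR", "TEMP", "TMP"):
        value = env.get(name)
        if value:
            return value
    return "/tmp"


if __name__ == "__main__":
    # Show which directory the current environment resolves to.
    print(resolve_tmpdir())
```

If every tool goes through one resolver like this, the launcher and the checkpoint command agree on where the session files live; a tool that hard-codes /tmp would look in the wrong place whenever $TMPDIR is set.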
Josh,
I was following this thread as I had similar symptoms and discovered a
peculiar error. When I launch the program, Open MPI follows the
$TMPDIR environment variable and puts the session information in the
$TMPDIR directory. However, ompi-checkpoint seems to be requiring the
sessions file to b
I tested the 1.4.1 release, and everything worked fine for me (tested
a few different configurations of nodes/environments).
The ompi-checkpoint error you cited is usually caused by one of two
things:
- The PID specified is wrong (which I don't think is the case
here)
- The session
So? anyone? any clue?
Summarize:
- installed OpenMPI 1.4.1 on fresh Centos 5
- mpirun works but ompi-checkpoint throws this error:
ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405
- on another VM I have OpenMPI 1.3.3 installed. Checkpointing works fine on
guest but has the previous
I noticed one more thing. As I still have some VMs that have OpenMPI version
1.3.3 installed, I started to use those machines until I fix the problem with
1.4.1. While checkpointing on one of these VMs I realized that
checkpointing as a guest works fine and checkpointing as root outputs the
same
Well... I decided to install a fresh OS to be sure that there is no OpenMPI
version conflict. So I formatted one of my VMs, did a fresh CentOS install,
installed BLCR 0.8.2 and OpenMPI 1.4.1 and the result: the same. mpirun
works but ompi-checkpoint has that error at line 405:
[[35906,0],0] ORTE_E
It's almost midnight here, so I left home, but I will try it tomorrow.
There were some directories left after "make uninstall". I will give more
details tomorrow.
Thanks Jeff,
Andreea
On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres wrote:
> On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote:
>
>
On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote:
> - I wanted to update to version 1.4.1 and I uninstalled previous version like
> this: make uninstall, and then manually deleted all the leftover files. The
> directory where I installed was /usr/local
I'll let Josh answer your CR questions,
I don't know what else I should try... because it worked on 1.3.3 doing
exactly the same steps. I tried to install it both with an active eth
interface and an inactive one. I am running on a virtual machine that has
CentOS as OS.
Any suggestions?
Thanks,
Andreea
On Fri, Jan 15, 2010 at 9:07 PM,
I tried the new version that was uploaded today. I still have that error,
except that now it is at line 405 instead of 399.
Maybe if I give more details:
- I first had OpenMPI version 1.3.3 with BLCR installed: mpirun,
ompi-checkpoint and ompi-restart worked with that version.
- I wanted to update to
Hi...
still not working. Though I uninstalled OpenMPI with make uninstall and I
manually deleted all other files, I still have the same error when
checkpointing.
Any idea?
Thanks,
Andreea
On Thu, Jan 14, 2010 at 10:38 PM, Joshua Hursey wrote:
> On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote
On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote:
> Hi,
>
> I wanted to try the C/R feature in OpenMPI version 1.4.1 that I have
> downloaded today. When I want to checkpoint I am having the following error
> message:
> [[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
Hi,
I wanted to try the C/R feature in OpenMPI version 1.4.1 that I have
downloaded today. When I want to checkpoint I am having the following error
message:
[[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
399
HNP with PID 2337 Not found!
I tried the same thing with vers