Hi Everyone,
I am trying to checkpoint an MPI application running on
multiple nodes. However, I get some error messages when I trigger the
checkpointing process.
Error: expected_component: PID information unavailable!
Error: expected_component: Component Name information unavailable!
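For context, a sketch of how checkpointing is usually triggered in a
BLCR-enabled OMPI 1.3/1.4 build (the process count, application name, and
PID below are made up):

    # start the job with checkpoint/restart support enabled
    mpirun -np 8 -am ft-enable-cr ./my_mpi_app

    # from another shell, checkpoint the running job via mpirun's PID
    ompi-checkpoint <pid_of_mpirun>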
No, sorry -- there are no "buffered" variants of the MPI_FILE_* functions like
there are with point-to-point communications. So when you do MPI_FILE_WRITE
(for example), it'll be directly using the buffer that you pass to it (which is
almost always what you want, anyway -- "buffered" modes of communication
imply an extra memory copy).
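As a minimal sketch of what that means in practice (using the
explicit-offset variant MPI_File_write_at; the file name and data are made
up), the buffer you pass is consumed directly by the library:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Status st;
        int rank, data[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        data[0] = data[1] = data[2] = data[3] = rank;

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        /* MPI writes straight out of 'data' -- there is no internal
           "buffered mode" copy, so the buffer must stay valid for the
           duration of the (blocking) call. */
        MPI_File_write_at(fh, (MPI_Offset)(rank * sizeof(data)),
                          data, 4, MPI_INT, &st);
        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }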
All,
Succeeded in overcoming the 'libtool' failure with PGI using
the patched snap (thanks Jeff), but now I am running
into a downstream problem compiling for our IB clusters.
I am using the latest PGI compiler (10.0-1) and the 12-14-09
snap of OpenMPI 1.4.0.
My configure line looks like this:
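A hypothetical example of such a configure line for a PGI + InfiniBand
build (the compiler names are the standard PGI 10 drivers; the prefix and
flags are assumptions, not the poster's actual line):

    ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 \
                --prefix=/opt/openmpi-1.4 --with-openib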
Hi Ralph and all,
Yes, the OMPI libs and binaries are in the same place on the nodes; I
packed OMPI via checkinstall and installed the deb via pdsh on the nodes.
LD_LIBRARY_PATH is set; I can run, for example, "mpirun --hostfile
nodefile hellocluster" without problems. But when started via Torque, it fails.
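One quick way to see whether the Torque job environment matches the
interactive one is a trivial job script (contents illustrative) -- batch
jobs don't run through the same login environment, so LD_LIBRARY_PATH may
well be empty there:

    #PBS -l nodes=2
    echo $LD_LIBRARY_PATH
    which mpirun
    mpirun hellocluster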
Ah, and do I have to take care of the MCA ras plugin on my own?
I tried something like
> mpirun --mca ras tm --mca btl ras,plm --mca ras_tm_nodefile_dir
> /var/spool/torque/aux/ hellocluster
but despite that it has not worked ([node3:22726] mca: base:
components_open: component pml / c
That error has nothing to do with Torque. The cmd line is simply wrong - you
are specifying a btl that doesn't exist.
It should work just fine with
mpirun -n X hellocluster
Nothing else is required. When you run
mpirun --hostfile nodefile hellocluster
OMPI will still use Torque to do the launch.
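In other words, under a Torque allocation a job script like the following
(illustrative) should be all that's needed -- the tm components pick up
the allocated node list from the environment automatically:

    #PBS -l nodes=4
    cd $PBS_O_WORKDIR
    mpirun hellocluster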
Sorry - hit "send" and then saw the version sitting right there in the subject!
Doh...
First, let's try verifying what components are actually getting used. Run this:
mpirun -n 1 -mca ras_base_verbose 10 -mca plm_base_verbose 10 which orted
Then get an allocation and run:
mpirun -pernode which orted