Good morning,

I think I sent this out last week but I did some "experimentation"
and kind-of/sort-of got my OpenMPI application to run. But I do
have a weird problem.

I can get the application (OpenMPI-1.3.2 is built with gcc and
the app is built with Intel 10.2) to run on the IB network (not sure
of the version of OFED but it might be 1.3.x) with certain CPUs.
For example I can run the application on AMD Shanghai processors
just fine. But when I try some other processors (also AMD), I
get the following error message:


error: executing task of job 3084 failed: execution daemon on host "compute-2-2.local" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 27796) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished



I've been googling my fingers off without any luck. My next steps are
to start putting printf's in OpenMPI to figure out where the problem
is occurring :)  Any ideas or things I can do to start? (I can provide all
kinds of information including ompi_info if anyone cares to look
through it).
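
Since the error text points at shared libraries on the remote node, one
check to try on a failing node before adding printf's is sketched below.
It is only a sketch under assumptions: it assumes `ldd` is installed, that
the Open MPI daemon binary `orted` is on the PATH of the failing node, and
the helper names `missing_libs` and `check_binary` are made up for
illustration. It does not rule out other causes of the failure.

```python
import shutil
import subprocess


def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(name="orted"):
    """Run ldd on a binary found on PATH; return None if it is not found."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)


if __name__ == "__main__":
    libs = check_binary()
    if libs is None:
        print("orted is not on PATH on this node")
    elif libs:
        print("unresolved libraries:", ", ".join(libs))
    else:
        print("all shared libraries resolved")
```

Running this on a node that works and on one that fails, from the same
kind of shell the job gets, would show whether the two environments
resolve the same libraries.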

TIA!

Jeff
