Hello,

While searching the archives I came across problems similar to mine.
However, after trying several of the suggestions without success, I have
decided to post my own dilemma.

I have successfully compiled OpenMPI with the following configuration
parameters:

shell$ ../configure --prefix=/home/vcoralic --enable-static --disable-shared \
    F77=nagfor FC=nagfor FCFLAGS=-mismatch

I have also successfully executed a parallel job by invoking mpirun directly.
However, when I submit a job through the cluster's queuing system with qsub,
on say 8 processors, I get the following error message once per process (8 in
total):

./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory
./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory
./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory
./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory
./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory
./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory
./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory
./hello_f90: error while loading shared libraries: libf52.so.1: cannot open
shared object file: No such file or directory

where libf52.so.1 is a shared object file belonging to the NAG Fortran
compiler.

I think I know what the problem is. The NAG Fortran compiler and its
libraries are only installed on the master node, so the remaining nodes
cannot find the required files. From my understanding, the only way to fix
this would be to copy the NAG Fortran compiler to all of the nodes in the
cluster. Is that correct?

Alternatively, I suppose a possible workaround would be to create a symbolic
link on each node through which that node could reach the NAG Fortran
components on the master node. This would additionally require modifying
LD_LIBRARY_PATH on every node so that it includes the location of the
symbolic link.
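
To make the second idea concrete, here is a minimal sketch of the check I
would want to run on each node before and after changing anything. The helper
name find_lib and the directory /shared/nag/lib are hypothetical placeholders
of my own, not actual locations on the cluster; the helper only reports which
directory in a colon-separated path list, such as LD_LIBRARY_PATH, contains a
given library file.

```shell
# find_lib NAME PATHLIST
# Print the first directory in the colon-separated PATHLIST that contains
# a file called NAME. Returns 0 if found, 1 otherwise.
# The body runs in a subshell so the IFS change does not leak to the caller.
find_lib() (
  IFS=:
  for dir in $2; do
    if [ -e "$dir/$1" ]; then
      printf '%s\n' "$dir"
      exit 0
    fi
  done
  exit 1
)

# Hypothetical location of the linked NAG libraries; uncomment and adjust.
# export LD_LIBRARY_PATH="/shared/nag/lib:${LD_LIBRARY_PATH:-}"

# Report whether the library named in the error message is reachable.
if dir=$(find_lib libf52.so.1 "${LD_LIBRARY_PATH:-}"); then
  echo "libf52.so.1 found in $dir"
else
  echo "libf52.so.1 not found on LD_LIBRARY_PATH"
fi
```

If the check passes on every node, my understanding is that passing
"-x LD_LIBRARY_PATH" to mpirun should then export the variable to the remote
processes, but I have not verified that this works under qsub.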

The bottom line is that I am looking for guidance on how to properly fix the
above error. I have a feeling that, conceptually, both of the above ideas may
work. However, I am hesitant to begin implementing them since I am not sure
that I know what I am doing or that these "fixes" will work. It also doesn't
help that others use this cluster, and I'd hate to be the one who breaks it
for everyone.

I appreciate any feedback that you can give me!

-- 
Vedran Coralic
