Hi,
> On 29.09.2016 at 14:41, <[email protected]> <[email protected]> wrote:
>
> Hi,
>
> I am trying to run a Job on parallel nodes using openmpi1.4.5
Open MPI 1.6.5 would be a better choice. Nevertheless, the questions are:
- Was Open MPI compiled with SGE integration, i.e. do you get something like:
$ ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
- Did you request a PE in the submission and how is this PE set up?
- What does the `mpiexec` line in your jobscript look like?
- Can all nodes talk to each other directly?
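
The first and last of these checks can be scripted. What follows is only a
minimal sketch, not a definitive test: the helper names `has_sge_support` and
`pe_hosts` are made up for illustration, `getent` is assumed to be available
on the nodes, and the name-resolution loop only does something inside a
parallel job, where SGE sets PE_HOSTFILE. A host name that resolves does not
prove that the nodes can actually reach each other.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the ompi_info-style text on stdin
# lists the gridengine component.
has_sge_support() {
    grep -q 'gridengine'
}

# Hypothetical helper: prints the host names (first column) of an SGE
# PE hostfile given as the first argument.
pe_hosts() {
    awk '{ print $1 }' "$1"
}

# Only query the real tool when it is installed on this machine.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_sge_support; then
        echo "Open MPI reports SGE support"
    else
        echo "no gridengine component: Open MPI may need a rebuild with --with-sge"
    fi
fi

# PE_HOSTFILE is set by SGE only inside a parallel job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    for host in $(pe_hosts "$PE_HOSTFILE"); do
        # Assumption: getent exists on this system (glibc-based nodes).
        getent hosts "$host" >/dev/null || echo "cannot resolve $host"
    done
fi
```

The PE setup itself can be inspected with `qconf -sp <pe_name>`.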
> and ge2011.11. The job enters the running state and then gets aborted.
> After the job is aborted, I get the following error message on the primary node:
>
> error: executing task of job 28561 failed: failed sending task to
> [email protected]: can't find connection
>
> Or
>
> error: executing task of job 28560 failed: failed sending task to
> [email protected]: can't find connection
> --------------------------------------------------------------------------
> A daemon (pid 20651) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
- Regarding this error, you may have to set LD_LIBRARY_PATH explicitly to the
path of the shared libraries and export it to the nodes in your jobscript:
export LD_LIBRARY_PATH=<your_location_of_the_shared_libs>
mpiexec -x LD_LIBRARY_PATH
BTW: Is Open MPI also available on all nodes?
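
Put together, a jobscript along these lines might look as follows. This is
only a sketch under assumptions: the PE name `orte`, the slot count 8, the
install prefix `/opt/openmpi` and the program name `./my_mpi_program` are all
placeholders, not values from your setup. NSLOTS is set by SGE inside a
parallel job; outside of one the script falls back to 1 and only prints the
launch line.

```shell
#!/bin/sh
# Placeholder PE name and slot count; use the PE defined on your cluster.
#$ -pe orte 8
#$ -cwd

# Assumption: placeholder install prefix; replace with the real location.
OMPI_PREFIX="${OMPI_PREFIX:-/opt/openmpi}"

# Prepend the Open MPI library directory without leaving a trailing colon.
LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# NSLOTS is set by SGE inside a parallel job; fall back to 1 elsewhere.
NP="${NSLOTS:-1}"

# Print the launch line instead of running it when mpiexec or the
# (hypothetical) program is not available on this machine.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
    mpiexec -x LD_LIBRARY_PATH -n "$NP" ./my_mpi_program
else
    echo "would run: mpiexec -x LD_LIBRARY_PATH -n $NP ./my_mpi_program"
fi
```

With a working SGE integration no hostfile has to be given to `mpiexec`; the
granted hosts are taken from the PE allocation.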
-- Reuti
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> Is this an MPI issue? Please suggest how I can resolve this connection issue
> between nodes.
>
> Thanks & Regards,
> Aditi
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users