I can run my OpenMPI 1.8.7 jobs successfully outside of Son of Grid
Engine, but not via qrsh. We're using CentOS 6.3 on a heterogeneous
cluster of hyperthreaded and non-hyperthreaded blades and x3550 chassis.
OpenMPI 1.8.7 has also been built with the debug switch.

Here are my latest errors:
qrsh -V -now yes -pe mpi 209 mpirun -np 209 -display-devel-map \
    --prefix /hpc/apps/mpi/openmpi/1.8.7/ --mca btl ^sm --hetero-nodes \
    --bind-to core /hpc/home/lanew/mpi/openmpi/ProcessColors3
error: executing task of job 211298 failed: execution daemon on host "csclprd3-0-4" didn't accept task
error: executing task of job 211298 failed: execution daemon on host "csclprd3-4-1" didn't accept task
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

NOTE: the hosts that "didn't accept task" differed between two separate
runs, but the errors were otherwise the same.
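
To compare which hosts rejected tasks from one run to the next, the
"didn't accept task" lines can be pulled out of the saved output. The
following is a minimal sketch, not part of the original report: the
function name rejected_hosts and the variable run1 are hypothetical, and
the sample text is just the two error lines quoted above.

```python
import re

# Matches the Grid Engine error quoted above; the host name is in double quotes.
REJECTED = re.compile(r'execution daemon on host\s+"([^"]+)"\s+didn\'t accept task')


def rejected_hosts(log_text):
    """Return the sorted, de-duplicated hosts that refused a task in one run."""
    # Collapse all whitespace first so errors wrapped across lines still match.
    flat = " ".join(log_text.split())
    return sorted(set(REJECTED.findall(flat)))


# Sample input: the two error lines from the run shown above.
run1 = '''error: executing task of job 211298 failed: execution daemon on host
"csclprd3-0-4" didn't accept task
error: executing task of job 211298 failed: execution daemon on host
"csclprd3-4-1" didn't accept task'''

print(rejected_hosts(run1))
```

Running the same function on the output of a second run and comparing the
two lists shows whether any host fails consistently.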

Here's the definition of the mpi Parallel Environment on our Son-of-Gridengine 
cluster:

pe_name            mpi
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/sge/mpi/startmpi.sh $pe_hostfile
stop_proc_args     /opt/sge/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
qsort_args         NONE
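
For comparison, the example PE in the Open MPI SGE FAQ (linked in the
quoted message below) uses control_slaves TRUE and job_is_first_task
FALSE, whereas the definition above has FALSE and TRUE. Whether that
difference explains the "didn't accept task" errors is an assumption and
is not confirmed in this thread. A minimal sketch that parses a PE
definition, as printed by `qconf -sp mpi`, and reports where it differs
from those two FAQ values (the names EXPECTED, parse_pe and mismatches
are hypothetical):

```python
# Values taken from the example PE in the Open MPI SGE FAQ. Assumption:
# they are also the right values for this cluster's "mpi" PE.
EXPECTED = {"control_slaves": "TRUE", "job_is_first_task": "FALSE"}


def parse_pe(text):
    """Parse 'name value' lines into a dict, skipping blank or bare lines."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def mismatches(settings, expected=EXPECTED):
    """Return {name: (actual, expected)} for every setting that differs."""
    return {
        name: (settings.get(name), want)
        for name, want in expected.items()
        if settings.get(name) != want
    }


# The PE definition exactly as posted above.
pe_text = """\
pe_name            mpi
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/sge/mpi/startmpi.sh $pe_hostfile
stop_proc_args     /opt/sge/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
qsort_args         NONE
"""

print(mismatches(parse_pe(pe_text)))
```

The check is deliberately limited to the two settings named in EXPECTED;
it says nothing about start_proc_args, qsort_args or the other fields.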

qsort_args is set to NONE, but it's supposed to be set to TRUE, right?

-Bill L.

If I can run my OpenMPI 1.8.7 jobs outside of Son of Grid Engine with no
issues, it has to be Son of Grid Engine that's the issue, right?

-Bill L.
________________________________________
From: users [users-boun...@open-mpi.org] on behalf of Dave Love 
[d.l...@liverpool.ac.uk]
Sent: Tuesday, August 11, 2015 9:34 AM
To: Open MPI Users
Subject: Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

"Lane, William" <william.l...@cshs.org> writes:

> I read @
>
> https://www.open-mpi.org/faq/?category=sge
>
> that for OpenMPI Parallel Environments there's
> a special consideration for Son of Grid Engine:
>
>    '"qsort_args" is necessary with the Son of Grid Engine distribution,
>    version 8.1.1 and later, and probably only applicable to it.  For
>    very old versions of SGE, omit "accounting_summary" too.'
>
> Does this requirement still hold true for OpenMPI 1.8.7? Because
> the webpage above only refers to much older versions of OpenMPI.

That's actually unrelated to OMPI, and the current distribution contains
an "mpi" PE for tight integration which should work with OMPI and modern
MPICH-y startup (hydra?), at least.