Hi Reuti,

I do not reset any environment variable during job submission or job handling. Is there a simple way to check that Open MPI is working as expected with SGE tight integration (e.g. displaying environment variables, setting options on the command line, etc.)?
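For instance, would something along these lines in the jobscript be a sound check? This is just a sketch: I'm guessing at the 1.3.x option names, and my_app stands in for the real binary.

  #!/bin/bash
  #$ -pe round_robin 8
  #$ -cwd
  # SGE sets these for a tightly integrated job; Open MPI's gridengine
  # support keys off them (JOB_ID was the one you mentioned):
  echo "JOB_ID=$JOB_ID"
  echo "PE_HOSTFILE=$PE_HOSTFILE"
  cat "$PE_HOSTFILE"
  # Ask Open MPI to print the allocation it detected before launching
  # (verbosity levels are a guess on my side):
  /opt/openmpi-1.3.3/bin/mpirun --display-allocation \
      --mca ras_base_verbose 5 --mca plm_base_verbose 5 ./my_app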
Regards,
Eloi

On Friday 21 May 2010 17:35:24 Reuti wrote:
> Hi,
>
> Am 21.05.2010 um 17:19 schrieb Eloi Gaudry:
> > Hi Reuti,
> >
> > Yes, the openmpi binaries used were built after having used the
> > --with-sge during configure, and we only use those binaries on our
> > cluster.
> >
> > [eg@moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
> >
> > MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
>
> ok. As you have a Tight Integration as goal and set in your PE
> "control_slaves TRUE", SGE wouldn't allow `qrsh -inherit ...` to nodes
> which are not in the list of granted nodes. So it looks like your job is
> running outside of this Tight Integration with its own `rsh` or `ssh`.
>
> Do you reset $JOB_ID or other environment variables in your jobscript,
> which could trigger Open MPI to assume that it's not running inside SGE?
>
> -- Reuti
>
> > On Friday 21 May 2010 16:01:54 Reuti wrote:
> >> Hi,
> >>
> >> Am 21.05.2010 um 14:11 schrieb Eloi Gaudry:
> >>> Hi there,
> >>>
> >>> I'm observing something strange on our cluster managed by SGE6.2u4 when
> >>> launching a parallel computation on several nodes, using OpenMPI/SGE
> >>> tight-integration mode (OpenMPI-1.3.3). It seems that the SGE
> >>> allocated slots are not used by OpenMPI, as if OpenMPI was doing its
> >>> own round-robin allocation based on the allocated node hostnames.
> >>
> >> you compiled Open MPI with --with-sge (and recompiled your
> >> applications)? You are using the correct mpiexec?
> >>
> >> -- Reuti
> >>
> >>> Here is what I'm doing:
> >>> - launch a parallel computation involving 8 processors, using for each
> >>>   of them 14GB of memory. I'm using a qsub command where I request the
> >>>   memory_free resource and use tight integration with openmpi
> >>> - 3 servers are available:
> >>>   . barney with 4 cores (4 slots) and 32GB
> >>>   . carl with 4 cores (4 slots) and 32GB
> >>>   . charlie with 8 cores (8 slots) and 64GB
> >>>
> >>> Here is the output of the allocated nodes (OpenMPI output):
> >>> ====================== ALLOCATED NODES ======================
> >>>
> >>> Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
> >>>   Daemon: [[44332,0],0] Daemon launched: True
> >>>   Num slots: 4 Slots in use: 0
> >>>   Num slots allocated: 4 Max slots: 0
> >>>   Username on node: NULL
> >>>   Num procs: 0 Next node_rank: 0
> >>>
> >>> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
> >>>   Daemon: Not defined Daemon launched: False
> >>>   Num slots: 2 Slots in use: 0
> >>>   Num slots allocated: 2 Max slots: 0
> >>>   Username on node: NULL
> >>>   Num procs: 0 Next node_rank: 0
> >>>
> >>> Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
> >>>   Daemon: Not defined Daemon launched: False
> >>>   Num slots: 2 Slots in use: 0
> >>>   Num slots allocated: 2 Max slots: 0
> >>>   Username on node: NULL
> >>>   Num procs: 0 Next node_rank: 0
> >>>
> >>> =================================================================
> >>>
> >>> Here is what I see when my computation is running on the cluster:
> >>> # rank   pid     hostname
> >>>
> >>>   0      28112   charlie
> >>>   1      11417   carl
> >>>   2      11808   barney
> >>>   3      28113   charlie
> >>>   4      11418   carl
> >>>   5      11809   barney
> >>>   6      28114   charlie
> >>>   7      11419   carl
> >>>
> >>> Note that the parallel environment used under SGE is defined as:
> >>> [eg@moe:~]$ qconf -sp round_robin
> >>> pe_name            round_robin
> >>> slots              32
> >>> user_lists         NONE
> >>> xuser_lists        NONE
> >>> start_proc_args    /bin/true
> >>> stop_proc_args     /bin/true
> >>> allocation_rule    $round_robin
> >>> control_slaves     TRUE
> >>> job_is_first_task  FALSE
> >>> urgency_slots      min
> >>> accounting_summary FALSE
> >>>
> >>> I'm wondering why OpenMPI didn't use the allocated nodes chosen by SGE
> >>> (cf. "ALLOCATED NODES" report) but instead allocated each process of
> >>> the parallel computation one at a time, using a round-robin method.
> >>>
> >>> Note that I'm using the '--bynode' option in the orterun command line.
> >>> If the behavior I'm observing is simply the consequence of using this
> >>> option, please let me know. This would mean that SGE tight integration
> >>> has a lower priority in determining orterun behavior than the
> >>> command-line options.
> >>>
> >>> Any help would be appreciated,
> >>> Thanks,
> >>> Eloi
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Eloi Gaudry

Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM

Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626
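PS: re-reading the '--bynode' remark in my first mail: with charlie granted 4 slots and carl/barney 2 each, cycling over the granted nodes one rank at a time would produce exactly the charlie/carl/barney pattern in my rank table, even if the SGE allocation is honored. A quick sanity test I could run under the same PE (sketch only; 'hostname' stands in for the real application):

  # default (byslot) mapping should fill charlie's 4 slots first,
  # then carl's 2, then barney's 2:
  mpirun hostname
  # --bynode cycles over the granted nodes, which matches what I observed:
  mpirun --bynode hostname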