Hi,

I think this is a broader issue that shows up whenever an MPI library is used 
in conjunction with threads while running inside a queuing system. First: you 
can check whether your actual installation of Open MPI is SGE-aware with:

$ ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)

Then we can look at the definition of your PE: "allocation_rule    $fill_up". 
This means that SGE will grant you 14 slots in total, in any combination across 
the available machines; an 8+4+2 allocation is just as valid as 4+4+3+3, and so 
on. Depending on the SGE-awareness the question is: will your application just 
start processes on all nodes and completely disregard the granted allocation, 
or, as the other extreme, will it keep all started processes on one and the 
same machine? On the master node of the parallel job you can issue:

$ ps -e f

(note the `f` without a leading dash) to see whether `ssh` or `qrsh -inherit ...` 
is used to reach the other machines, and how many processes are started on each 
of them.
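
For illustration: with "$fill_up" a 14-slot grant could end up in a 
$PE_HOSTFILE like the one below (hostnames are made up and the exact content 
of the last column varies); the second column is the per-host slot count that 
an SGE-aware Open MPI will honor:

$ cat $PE_HOSTFILE
compute-0-1 8 new.q@compute-0-1 UNDEFINED
compute-0-2 4 new.q@compute-0-2 UNDEFINED
compute-0-3 2 new.q@compute-0-3 UNDEFINED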


Now to the common problem in such a setup:

AFAICS there is currently no way in the Open MPI + SGE combination to specify 
the number of MPI processes and the intended number of threads per process such 
that both are read automatically by Open MPI while staying inside the granted 
slot count and allocation. So it seems to be necessary to have the intended 
number of threads honored by Open MPI too.

Hence specifying e.g. "allocation_rule 8" in such a setup while requesting 32 
slots would, for now, already start 32 MPI processes, as Open MPI reads the 
$PE_HOSTFILE and acts accordingly.
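
As a workaround today, one can use a PE with a fixed rule, e.g. 
"allocation_rule 8", and do the division in the job script oneself. A minimal 
sketch, with a made-up PE name and assuming OMP_NUM_THREADS equals the per-host 
slot count:

#!/bin/bash
#$ -cwd
#$ -pe orte_fixed8 32
export OMP_NUM_THREADS=8
# one MPI process per machine, each machine contributing 8 slots = 8 threads
/opt/openmpi/bin/mpirun -np $((NSLOTS / OMP_NUM_THREADS)) -npernode 1 ./inverse.exe

The "-npernode 1" is needed because by default Open MPI maps by slot and would 
otherwise pack the processes onto the first machines of the allocation.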

Open MPI would have to read the generated machine file in a slightly different 
way regarding threads: a) read the $PE_HOSTFILE, b) divide the granted slots 
per machine by OMP_NUM_THREADS, c) throw an error in case a slot count is not 
divisible by OMP_NUM_THREADS, and then start one process per quotient.
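
As a rough illustration of steps a)-c) in shell (this is only a sketch of the 
intended logic, not how Open MPI itself would implement it; the hostfile name 
is made up):

NPROCS=0
> hostfile.$JOB_ID
while read host slots rest; do
    if [ $((slots % OMP_NUM_THREADS)) -ne 0 ]; then
        echo "$host: $slots slots not divisible by OMP_NUM_THREADS=$OMP_NUM_THREADS" >&2
        exit 1
    fi
    # one MPI process per bunch of OMP_NUM_THREADS granted slots
    echo "$host slots=$((slots / OMP_NUM_THREADS))" >> hostfile.$JOB_ID
    NPROCS=$((NPROCS + slots / OMP_NUM_THREADS))
done < "$PE_HOSTFILE"
mpirun --hostfile hostfile.$JOB_ID -np $NPROCS ./inverse.exe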

Would this work for you?

-- Reuti

PS: This would also mean having a couple of PEs in SGE with a fixed 
"allocation_rule". While this works right now, an extension in SGE could be 
"$fill_up_omp"/"$round_robin_omp" rules which use OMP_NUM_THREADS too; 
OMP_NUM_THREADS would then be specified not as an `export` in the job script, 
but as a job request, either on the command line or in #$ lines inside the job 
script. SGE would then collect slots in bunches of OMP_NUM_THREADS on each 
machine until the overall requested slot count is reached. Whether only 
OMP_NUM_THREADS, or n times OMP_NUM_THREADS, should be allowed per machine 
needs to be discussed.
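
A job script under such a hypothetical "$fill_up_omp" rule might then look like 
the sketch below; the PE name and the rule itself do not exist today and are 
pure assumptions, and passing OMP_NUM_THREADS via "-v" in a #$ line is just one 
possible way to make it a job request instead of a runtime export:

#!/bin/bash
#$ -cwd
#$ -pe orte_omp 32
#$ -v OMP_NUM_THREADS=8
# SGE would collect the 32 slots in bunches of 8 per machine, and Open MPI
# would start 32/8 = 4 processes, one per bunch, with 8 threads each.
/opt/openmpi/bin/mpirun ./inverse.exe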
 
PS2: As Univa SGE can also supply a list of granted cores in the $PE_HOSTFILE, 
a further extension would be to feed this to Open MPI to allow UGE-aware 
binding.


On 14.08.2014, at 21:52, Oscar Mojica wrote:

> Guys
> 
> I changed the line that runs the program in the script, trying both options:
> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS 
> ./inverse.exe
> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS 
> ./inverse.exe
> 
> but I got the same results. When I look at 'man mpirun' it shows:
> 
>        -bind-to-none, --bind-to-none
>               Do not bind processes.  (Default.)
> 
> and the output of 'qconf -sp orte' is
> 
> pe_name            orte
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> 
> I don't know if the installed Open MPI was compiled with '--with-sge'. How 
> can I check that?
> Before thinking about a hybrid application I was using only MPI, and the 
> program used few processors (14). The cluster has 28 machines, 15 with 16 
> cores and 13 with 8 cores, totaling 344 processing units. When I submitted 
> the job (MPI only), the MPI processes were spread across the cores directly, 
> so I created a new queue with 14 machines trying to gain some time. The 
> results were the same in both cases. In the last case I could verify that 
> the processes were distributed to all machines correctly.
> 
> What must I do?
> Thanks 
> 
> Oscar Fabian Mojica Ladino
> Geologist M.S. in  Geophysics
> 
> 
> > Date: Thu, 14 Aug 2014 10:10:17 -0400
> > From: maxime.boissonnea...@calculquebec.ca
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> > 
> > Hi,
> > You DEFINITELY need to disable OpenMPI's new default binding. Otherwise, 
> > your N threads will run on a single core. --bind-to socket would be my 
> > recommendation for hybrid jobs.
> > 
> > Maxime
> > 
> > 
> > > On 2014-08-14 10:04, Jeff Squyres (jsquyres) wrote:
> > > I don't know much about OpenMP, but do you need to disable Open MPI's 
> > > default bind-to-core functionality (I'm assuming you're using Open MPI 
> > > 1.8.x)?
> > >
> > > You can try "mpirun --bind-to none ...", which will have Open MPI not 
> > > bind MPI processes to cores, which might allow OpenMP to think that it 
> > > can use all the cores, and therefore it will spawn num_cores threads...?
> > >
> > >
> > > On Aug 14, 2014, at 9:50 AM, Oscar Mojica <o_moji...@hotmail.com> wrote:
> > >
> > >> Hello everybody
> > >>
> > >> I am trying to run a hybrid MPI + OpenMP program on a cluster. I created 
> > >> a queue with 14 machines, each with 16 cores. The program divides the 
> > >> work among 14 MPI processes, and within each process a loop is further 
> > >> divided among, for example, 8 threads using OpenMP. The problem is that 
> > >> when I submit the job to the queue, the MPI processes don't divide the 
> > >> work into threads, and the program reports that only one thread is 
> > >> working within each process.
> > >>
> > >> I made a simple test program that uses OpenMP and logged in to one of 
> > >> the fourteen machines. I compiled it with 'gfortran -fopenmp program.f 
> > >> -o exe', set the OMP_NUM_THREADS environment variable to 8, and when I 
> > >> ran it directly in the terminal the loop was effectively divided among 
> > >> the cores; in this case the program reported 8 threads.
> > >>
> > >> This is my Makefile
> > >> 
> > >> # Start of the makefile
> > >> # Defining variables
> > >> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o dsvd.o
> > >> #f90comp = /opt/openmpi/bin/mpif90
> > >> f90comp = /usr/bin/mpif90
> > >> #switch = -O3
> > >> executable = inverse.exe
> > >> # Makefile
> > >> all : $(executable)
> > >> $(executable) : $(objects)
> > >> 	$(f90comp) -fopenmp -g -O -o $(executable) $(objects)
> > >> 	rm $(objects)
> > >> %.o: %.f
> > >> 	$(f90comp) -c $<
> > >> # Cleaning everything
> > >> clean:
> > >> 	rm $(executable)
> > >> #        rm $(objects)
> > >> # End of the makefile
> > >>
> > >> and the script that i am using is
> > >>
> > >> #!/bin/bash
> > >> #$ -cwd
> > >> #$ -j y
> > >> #$ -S /bin/bash
> > >> #$ -pe orte 14
> > >> #$ -N job
> > >> #$ -q new.q
> > >>
> > >> export OMP_NUM_THREADS=8
> > >> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS 
> > >> ./inverse.exe
> > >>
> > >> Am I forgetting something?
> > >>
> > >> Thanks,
> > >>
> > >> Oscar Fabian Mojica Ladino
> > >> Geologist M.S. in Geophysics
> > >
> > 
> > 
> > -- 
> > ---------------------------------
> > Maxime Boissonneault
> > Analyste de calcul - Calcul Québec, Université Laval
> > Ph. D. en physique
> > 
