On 20.08.2014, at 16:26, Ralph Castain wrote:

> On Aug 20, 2014, at 6:58 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
>> Hi,
>> 
>> On 20.08.2014, at 13:26, tmish...@jcity.maeda.co.jp wrote:
>> 
>>> Reuti,
>>> 
>>> If you want to allocate 10 procs with N threads, the Torque
>>> script below should work for you:
>>> 
>>> qsub -l nodes=10:ppn=N
>>> mpirun -map-by slot:pe=N -np 10 -x OMP_NUM_THREADS=N ./inverse.exe
>> 
>> I played around with giving -np 10 in addition to a Tight Integration. The slot count is not really divided I think, but only 10 out of the granted maximum are used (while on each of the listed machines an `orted` is started). Due to the fixed allocation this is of course the result we want to achieve, as it subtracts bunches of 8 from the given list of machines resp. slots. In SGE it's sufficient to use the following, and AFAICS it works (without touching the $PE_HOSTFILE any longer):
>> 
>> ===
>> export OMP_NUM_THREADS=8
>> mpirun -map-by slot:pe=$OMP_NUM_THREADS -np $(bc <<<"$NSLOTS / $OMP_NUM_THREADS") ./inverse.exe
>> ===
>> 
>> and submit with:
>> 
>> $ qsub -pe orte 80 job.sh
>> 
>> as the variables are distributed to the slave nodes by SGE already.
>> 
>> Nevertheless, using -np in addition to the Tight Integration gives a taste of a kind of half-tight integration in some way. And it would not work for us, because "--bind-to none" can't be used in such a command (see below) and throws an error.
>> 
>> 
>>> Then, the openmpi automatically reduces the logical slot count to 10
>>> by dividing the real slot count 10N by the binding width of N.
>>> 
>>> I don't know why you want to use pe=N without binding, but unfortunately
>>> the openmpi allocates successive cores to each process so far when you
>>> use the pe option - it forcibly binds to core.
>> 
>> In a shared cluster with many users and different MPI libraries in use, only the queuing system could know which job got which cores granted. This avoids any oversubscription of cores while others are idle.
> 
> FWIW: we detect the exterior binding constraint and work within it
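For reference, the complete SGE jobscript that the approach quoted above amounts to might look roughly like this (a sketch only; the PE name "orte", the 80-slot request, the 8 threads and the ./inverse.exe binary are simply the ones used in this thread):

===
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -pe orte 80

export OMP_NUM_THREADS=8
# One MPI process per OMP_NUM_THREADS granted slots; with -map-by slot:pe=N
# Open MPI reserves (and by default binds) N cores per process.
mpirun -map-by slot:pe=$OMP_NUM_THREADS \
       -np $(bc <<<"$NSLOTS / $OMP_NUM_THREADS") ./inverse.exe
===

submitted simply with "$ qsub job.sh" (or with "$ qsub -pe orte 80 job.sh" if the -pe line is left out of the script).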
Aha, this is quite interesting - how do you do this: scanning /proc/<pid>/status or the like? What happens if you don't find enough free cores, as they are already used up by other applications?

-- Reuti


>> -- Reuti
>> 
>> 
>>> Tetsuya
>>> 
>>> 
>>>> Hi,
>>>> 
>>>> On 20.08.2014, at 06:26, Tetsuya Mishima wrote:
>>>> 
>>>>> Reuti and Oscar,
>>>>> 
>>>>> I'm a Torque user and I myself have never used SGE, so I hesitated to join
>>>>> the discussion.
>>>>> 
>>>>> From my experience with Torque, the openmpi 1.8 series has already
>>>>> resolved the issue you pointed out in combining MPI with OpenMP.
>>>>> 
>>>>> Please try to add the --map-by slot:pe=8 option if you want to use 8 threads.
>>>>> Then the openmpi 1.8 should allocate processes properly without any modification
>>>>> of the hostfile provided by Torque.
>>>>> 
>>>>> In your case (8 threads and 10 procs):
>>>>> 
>>>>> # you have to request 80 slots using SGE command before mpirun
>>>>> mpirun --map-by slot:pe=8 -np 10 ./inverse.exe
>>>> 
>>>> Thx for pointing me to this option, for now I can't get it working though (in fact, I want to use it without binding essentially). This allows telling Open MPI to bind more cores to each of the MPI processes - ok, but does it lower the slot count granted by Torque too? I mean, was your submission command like:
>>>> 
>>>> $ qsub -l nodes=10:ppn=8 ...
>>>> 
>>>> so that Torque knows that it should grant and remember this slot count of a total of 80 for the correct accounting?
>>>> 
>>>> -- Reuti
>>>> 
>>>>> where you can omit the --bind-to option because --bind-to core is assumed
>>>>> as default when pe=N is provided by the user.
>>>>> Regards,
>>>>> Tetsuya
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> On 19.08.2014, at 19:06, Oscar Mojica wrote:
>>>>>> 
>>>>>>> I discovered what the error was. I forgot to include '-fopenmp' when I compiled the objects in the Makefile, so the program worked but it didn't divide the job into threads. Now the program is working and I can use up to 15 cores per machine in the queue one.q.
>>>>>>> 
>>>>>>> Anyway I would like to try to implement your advice. Well, I'm not alone in the cluster, so I must implement your second suggestion. The steps are
>>>>>>> 
>>>>>>> a) Use '$ qconf -mp orte' to change the allocation rule to 8
>>>>>> 
>>>>>> The number of slots defined in your used one.q was also increased to 8 (`qconf -sq one.q`)?
>>>>>> 
>>>>>>> b) Set '#$ -pe orte 80' in the script
>>>>>> 
>>>>>> Fine.
>>>>>> 
>>>>>>> c) I'm not sure how to do this step. I'd appreciate your help here. I can add some lines to the script to determine the PE_HOSTFILE path and contents, but I don't know how to alter it
>>>>>> 
>>>>>> For now you can put in your jobscript (just after OMP_NUM_THREADS is exported):
>>>>>> 
>>>>>> awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
>>>>>> export PE_HOSTFILE=$TMPDIR/machines
>>>>>> 
>>>>>> =============
>>>>>> 
>>>>>> Unfortunately no one stepped into this discussion, as in my opinion it's a much broader issue which targets all users who want to combine MPI with OpenMP. The queuing system should get a proper request for the overall amount of slots the user needs.
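To illustrate what the awk line quoted above does (a sketch only; the exact $PE_HOSTFILE layout may differ, but it is assumed here to have the usual "host  slots  queue  processor-range" columns, with OMP_NUM_THREADS=8 and hostnames taken from this thread):

===
$ cat $PE_HOSTFILE
compute-1-2.local 8 one.q@compute-1-2.local UNDEFINED
compute-1-5.local 8 one.q@compute-1-5.local UNDEFINED
...

$ cat $TMPDIR/machines        # after dividing the slot column by 8
compute-1-2.local 1 one.q@compute-1-2.local UNDEFINED
compute-1-5.local 1 one.q@compute-1-5.local UNDEFINED
...
===

With the pointer re-exported, Open MPI's tight integration then sees only one slot per host and starts one process there, leaving the remaining granted cores on that host for the OpenMP threads.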
>>>>>> For now this will be forwarded to Open MPI, and it will use this information to start the appropriate number of processes (which was an achievement for the out-of-the-box Tight Integration, of course) and ignore any setting of OMP_NUM_THREADS. So, where should the generated list of machines be adjusted? There are several options:
>>>>>> 
>>>>>> a) The PE of the queuing system should do it:
>>>>>> 
>>>>>> + a one-time setup for the admin
>>>>>> + in SGE the "start_proc_args" of the PE could alter the $PE_HOSTFILE (a rough sketch of such a start script is given further below)
>>>>>> - the "start_proc_args" would need to know the number of threads, i.e. OMP_NUM_THREADS must be defined by "qsub -v ..." outside of the jobscript (tricky scanning of the submitted jobscript for OMP_NUM_THREADS would be too nasty)
>>>>>> - limits the calls inside the jobscript to libraries behaving in the same way as Open MPI only
>>>>>> 
>>>>>> 
>>>>>> b) The particular queue should do it in a queue prolog:
>>>>>> 
>>>>>> same as a) I think
>>>>>> 
>>>>>> 
>>>>>> c) The user should do it:
>>>>>> 
>>>>>> + no change in the SGE installation
>>>>>> - each and every user must include it in all the jobscripts to adjust the list and export the pointer to the $PE_HOSTFILE, but he could change it back and forth for different steps of the jobscript though
>>>>>> 
>>>>>> 
>>>>>> d) Open MPI should do it:
>>>>>> 
>>>>>> + no change in the SGE installation
>>>>>> + no change to the jobscript
>>>>>> + OMP_NUM_THREADS can be altered for different steps of the jobscript while automatically staying inside the granted allocation
>>>>>> o should MKL_NUM_THREADS be covered too (does it use OMP_NUM_THREADS already)?
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> 
>>>>>>> echo "PE_HOSTFILE:"
>>>>>>> echo $PE_HOSTFILE
>>>>>>> echo
>>>>>>> echo "cat PE_HOSTFILE:"
>>>>>>> cat $PE_HOSTFILE
>>>>>>> 
>>>>>>> Thanks for taking the time to answer these emails, your advice has been very useful.
>>>>>>> 
>>>>>>> PS: The version of SGE is OGS/GE 2011.11p1
>>>>>>> 
>>>>>>> 
>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>> Geologist M.S. in Geophysics
>>>>>>> 
>>>>>>> 
>>>>>>>> From: re...@staff.uni-marburg.de
>>>>>>>> Date: Fri, 15 Aug 2014 20:38:12 +0200
>>>>>>>> To: us...@open-mpi.org
>>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> On 15.08.2014, at 19:56, Oscar Mojica wrote:
>>>>>>>> 
>>>>>>>>> Yes, my installation of Open MPI is SGE-aware. I got the following
>>>>>>>>> 
>>>>>>>>> [oscar@compute-1-2 ~]$ ompi_info | grep grid
>>>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
>>>>>>>> 
>>>>>>>> Fine.
>>>>>>>> 
>>>>>>>>> I'm a bit slow and I didn't understand the last part of your message. So I made a test trying to solve my doubts.
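Returning to option a) above: a rough sketch of what such a "start_proc_args" helper could look like (a hypothetical script, not part of any SGE installation; it assumes OMP_NUM_THREADS arrives via "qsub -v OMP_NUM_THREADS=8" and that the spooled pe_hostfile, whose path SGE can hand over via the $pe_hostfile placeholder, may be rewritten in place):

===
#!/bin/sh
# divide_pe_hostfile.sh - would be configured in the PE roughly as:
#   start_proc_args  /path/to/divide_pe_hostfile.sh $pe_hostfile
hostfile="$1"
: "${OMP_NUM_THREADS:=1}"
# Divide the slot column (field 2) by the intended thread count per process.
awk -v t="$OMP_NUM_THREADS" '{ $2 /= t; print }' "$hostfile" > "$hostfile.new" \
  && mv "$hostfile.new" "$hostfile"
===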
>>>>>>>>> This is the cluster configuration: There are some machines turned off but that is no problem
>>>>>>>>> 
>>>>>>>>> [oscar@aguia free-noise]$ qhost
>>>>>>>>> HOSTNAME      ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO   SWAPUS
>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>> global        -          -     -     -       -       -        -
>>>>>>>>> compute-1-10  linux-x64  16    0.97  23.6G   558.6M  996.2M   0.0
>>>>>>>>> compute-1-11  linux-x64  16    -     23.6G   -       996.2M   -
>>>>>>>>> compute-1-12  linux-x64  16    0.97  23.6G   561.1M  996.2M   0.0
>>>>>>>>> compute-1-13  linux-x64  16    0.99  23.6G   558.7M  996.2M   0.0
>>>>>>>>> compute-1-14  linux-x64  16    1.00  23.6G   555.1M  996.2M   0.0
>>>>>>>>> compute-1-15  linux-x64  16    0.97  23.6G   555.5M  996.2M   0.0
>>>>>>>>> compute-1-16  linux-x64  8     0.00  15.7G   296.9M  1000.0M  0.0
>>>>>>>>> compute-1-17  linux-x64  8     0.00  15.7G   299.4M  1000.0M  0.0
>>>>>>>>> compute-1-18  linux-x64  8     -     15.7G   -       1000.0M  -
>>>>>>>>> compute-1-19  linux-x64  8     -     15.7G   -       996.2M   -
>>>>>>>>> compute-1-2   linux-x64  16    1.19  23.6G   468.1M  1000.0M  0.0
>>>>>>>>> compute-1-20  linux-x64  8     0.04  15.7G   297.2M  1000.0M  0.0
>>>>>>>>> compute-1-21  linux-x64  8     -     15.7G   -       1000.0M  -
>>>>>>>>> compute-1-22  linux-x64  8     0.00  15.7G   297.2M  1000.0M  0.0
>>>>>>>>> compute-1-23  linux-x64  8     0.16  15.7G   299.6M  1000.0M  0.0
>>>>>>>>> compute-1-24  linux-x64  8     0.00  15.7G   291.5M  996.2M   0.0
>>>>>>>>> compute-1-25  linux-x64  8     0.04  15.7G   293.4M  996.2M   0.0
>>>>>>>>> compute-1-26  linux-x64  8     -     15.7G   -       1000.0M  -
>>>>>>>>> compute-1-27  linux-x64  8     0.00  15.7G   297.0M  1000.0M  0.0
>>>>>>>>> compute-1-29  linux-x64  8     -     15.7G   -       1000.0M  -
>>>>>>>>> compute-1-3   linux-x64  16    -     23.6G   -       996.2M   -
>>>>>>>>> compute-1-30  linux-x64  16    -     23.6G   -       996.2M   -
>>>>>>>>> compute-1-4   linux-x64  16    0.97  23.6G   571.6M  996.2M   0.0
>>>>>>>>> compute-1-5   linux-x64  16    1.00  23.6G   559.6M  996.2M   0.0
>>>>>>>>> compute-1-6   linux-x64  16    0.66  23.6G   403.1M  996.2M   0.0
>>>>>>>>> compute-1-7   linux-x64  16    0.95  23.6G   402.7M  996.2M   0.0
>>>>>>>>> compute-1-8   linux-x64  16    0.97  23.6G   556.8M  996.2M   0.0
>>>>>>>>> compute-1-9   linux-x64  16    1.02  23.6G   566.0M  1000.0M  0.0
>>>>>>>>> 
>>>>>>>>> I ran my program using only MPI with 10 processors of the queue one.q which has 14 machines (compute-1-2 to compute-1-15).
Whit 'qstat -t' >>> I got: >>>>>>>>> >>>>>>>>> [oscar@aguia free-noise]$ qstat -t >>>>>>>>> job-ID prior name user state submit/start at queue master >>> ja-task-ID task-ID state cpu mem io stat failed >>>>>>>>> >>>>> >>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> >>>>> ---- >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-2.local MASTER r 00:49:12 554.13753 0.09163 >>>>>>>>> one.q@compute-1-2.local SLAVE >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-5.local SLAVE 1.compute-1-5 r 00:48:53 551.49022 0.09410 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-9.local SLAVE 1.compute-1-9 r 00:50:00 564.22764 0.09409 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-12.local SLAVE 1.compute-1-12 r 00:47:30 535.30379 0.09379 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-13.local SLAVE 1.compute-1-13 r 00:49:51 561.69868 0.09379 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-14.local SLAVE 1.compute-1-14 r 00:49:14 554.60818 0.09379 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-10.local SLAVE 1.compute-1-10 r 00:49:59 562.95487 0.09349 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-15.local SLAVE 1.compute-1-15 r 00:50:01 563.27221 0.09361 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-8.local SLAVE 1.compute-1-8 r 00:49:26 556.68431 0.09349 >>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 >>> one.q@compute-1-4.local SLAVE 1.compute-1-4 r 00:49:27 556.87510 0.04967 >>>>>>>> >>>>>>>> Yes, here you got 10 slots (= cores) granted by SGE. So there is no >>> free core left inside the allocation of SGE to allow the use of additional >>> cores for your >>>>> threads. If you use more cores than granted by SGE, it will >>> oversubscribe the machines. >>>>>>>> >>>>>>>> The issue is now: >>>>>>>> >>>>>>>> a) If you want 8 threads per MPI process, your job will use 80 cores >>> in total - for now SGE isn't aware of it. >>>>>>>> >>>>>>>> b) Although you specified $fill_up as allocation rule, it looks like >>> $round_robin. Is there more than one slot defined in the queue definition >>> of one.q to get >>>>> exclusive access? >>>>>>>> >>>>>>>> c) What version of SGE are you using? Certain ones use cgroups or >>> bind processes directly to cores (although it usually needs to be requested >>> by the job: >>>>> first line of `qconf -help`). >>>>>>>> >>>>>>>> >>>>>>>> In case you are alone in the cluster, you could bypass the >>> allocation with b) (unless you are hit by c)). 
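The checks in b) and c) can be made directly with qconf (a small sketch; the queue and PE names are the ones used in this thread):

===
qconf -help | head -1          # first line shows the installed SGE/OGS version
qconf -sq one.q | grep slots   # slots per host offered by the queue
qconf -sp orte                 # PE definition, including allocation_rule
===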
But having a mixture of >>> users and jobs a different >>>>> handling would be necessary to handle this in a proper way IMO: >>>>>>>> >>>>>>>> a) having a PE with a fixed allocation rule of 8 >>>>>>>> >>>>>>>> b) requesting this PE with an overall slot count of 80 >>>>>>>> >>>>>>>> c) copy and alter the $PE_HOSTFILE to show only (granted core count >>> per machine) divided by (OMP_NUM_THREADS) per entry, change $PE_HOSTFILE so >>> that it points >>>>> to the altered file >>>>>>>> >>>>>>>> d) Open MPI with a Tight Integration will now start only N process >>> per machine according to the altered hostfile, in your case one >>>>>>>> >>>>>>>> e) Your application can start the desired threads and you stay >>> inside the granted allocation >>>>>>>> >>>>>>>> -- Reuti >>>>>>>> >>>>>>>> >>>>>>>>> I accessed to the MASTER processor with 'ssh compute-1-2.local' , >>> and with $ ps -e f and got this, I'm showing only the last lines >>>>>>>>> >>>>>>>>> 2506 ? Ss 0:00 /usr/sbin/atd >>>>>>>>> 2548 tty1 Ss+ 0:00 /sbin/mingetty /dev/tty1 >>>>>>>>> 2550 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2 >>>>>>>>> 2552 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3 >>>>>>>>> 2554 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4 >>>>>>>>> 2556 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5 >>>>>>>>> 2558 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6 >>>>>>>>> 3325 ? Sl 0:04 /opt/gridengine/bin/linux-x64/sge_execd >>>>>>>>> 17688 ? S 0:00 \_ sge_shepherd-2726 -bg >>>>>>>>> 17695 ? Ss 0:00 \_ >>> -bash /opt/gridengine/default/spool/compute-1-2/job_scripts/2726 >>>>>>>>> 17797 ? S 0:00 \_ /usr/bin/time -f %E /opt/openmpi/bin/mpirun -v >>> -np 10 ./inverse.exe >>>>>>>>> 17798 ? S 0:01 \_ /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe >>>>>>>>> 17799 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-5.local PATH=/opt/openmpi/bin:$PATH ; expo >>>>>>>>> 17800 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-9.local PATH=/opt/openmpi/bin:$PATH ; expo >>>>>>>>> 17801 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-12.local PATH=/opt/openmpi/bin:$PATH ; exp >>>>>>>>> 17802 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-13.local PATH=/opt/openmpi/bin:$PATH ; exp >>>>>>>>> 17803 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-14.local PATH=/opt/openmpi/bin:$PATH ; exp >>>>>>>>> 17804 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-10.local PATH=/opt/openmpi/bin:$PATH ; exp >>>>>>>>> 17805 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-15.local PATH=/opt/openmpi/bin:$PATH ; exp >>>>>>>>> 17806 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-8.local PATH=/opt/openmpi/bin:$PATH ; expo >>>>>>>>> 17807 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit >>> -nostdin -V compute-1-4.local PATH=/opt/openmpi/bin:$PATH ; expo >>>>>>>>> 17826 ? R 31:36 \_ ./inverse.exe >>>>>>>>> 3429 ? Ssl 0:00 automount --pid-file /var/run/autofs.pid >>>>>>>>> >>>>>>>>> So the job is using the 10 machines, Until here is all right OK. Do >>> you think that changing the "allocation_rule " to a number instead $fill_up >>> the MPI >>>>> processes would divide the work in that number of threads? >>>>>>>>> >>>>>>>>> Thanks a lot >>>>>>>>> >>>>>>>>> Oscar Fabian Mojica Ladino >>>>>>>>> Geologist M.S. in Geophysics >>>>>>>>> >>>>>>>>> >>>>>>>>> PS: I have another doubt, what is a slot? is a physical core? 
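Putting steps a) to e) above together, the jobscript for this route might look roughly as follows (a sketch assembled from the fragments quoted in this thread; PE, queue and binary names are the ones above, and no -np is given so that Open MPI starts one process per slot listed in the altered hostfile):

===
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe orte 80
#$ -q one.q

export OMP_NUM_THREADS=8

# Step c): shrink the slot counts in a private copy of the hostfile,
# so that only one process per 8 granted slots gets started.
awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' \
    $PE_HOSTFILE > $TMPDIR/machines
export PE_HOSTFILE=$TMPDIR/machines

# Steps d)/e): the tight integration now starts one process per remaining
# slot (i.e. one per host here), and each process spawns its own OpenMP
# threads. With Open MPI 1.8 add "--bind-to none", as discussed above, so
# the threads are not pinned to a single core.
/usr/bin/time -f "%E" /opt/openmpi/bin/mpirun ./inverse.exe
===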
>>>>>>>>> >>>>>>>>> >>>>>>>>>> From: re...@staff.uni-marburg.de >>>>>>>>>> Date: Thu, 14 Aug 2014 23:54:22 +0200 >>>>>>>>>> To: us...@open-mpi.org >>>>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I think this is a broader issue in case an MPI library is used in >>> conjunction with threads while running inside a queuing system. First: >>> whether your >>>>> actual installation of Open MPI is SGE-aware you can check with: >>>>>>>>>> >>>>>>>>>> $ ompi_info | grep grid >>>>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5) >>>>>>>>>> >>>>>>>>>> Then we can look at the definition of your PE: "allocation_rule >>> $fill_up". This means that SGE will grant you 14 slots in total in any >>> combination on the >>>>> available machines, means 8+4+2 slots allocation is an allowed >>> combination like 4+4+3+3 and so on. Depending on the SGE-awareness it's a >>> question: will your >>>>> application just start processes on all nodes and completely disregard >>> the granted allocation, or as the other extreme does it stays on one and >>> the same machine >>>>> for all started processes? On the master node of the parallel job you >>> can issue: >>>>>>>>>> >>>>>>>>>> $ ps -e f >>>>>>>>>> >>>>>>>>>> (f w/o -) to have a look whether `ssh` or `qrsh -inhert ...` is >>> used to reach other machines and their requested process count. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Now to the common problem in such a set up: >>>>>>>>>> >>>>>>>>>> AFAICS: for now there is no way in the Open MPI + SGE combination >>> to specify the number of MPI processes and intended number of threads which >>> are >>>>> automatically read by Open MPI while staying inside the granted slot >>> count and allocation. So it seems to be necessary to have the intended >>> number of threads being >>>>> honored by Open MPI too. >>>>>>>>>> >>>>>>>>>> Hence specifying e.g. "allocation_rule 8" in such a setup while >>> requesting 32 processes, would for now start 32 processes by MPI already, >>> as Open MP reads > the $PE_HOSTFILE and acts accordingly. >>>>>>>>>> >>>>>>>>>> Open MPI would have to read the generated machine file in a >>> slightly different way regarding threads: a) read the $PE_HOSTFILE, b) >>> divide the granted >>>>> slots per machine by OMP_NUM_THREADS, c) throw an error in case it's >>> not divisible by OMP_NUM_THREADS. Then start one process per quotient. >>>>>>>>>> >>>>>>>>>> Would this work for you? >>>>>>>>>> >>>>>>>>>> -- Reuti >>>>>>>>>> >>>>>>>>>> PS: This would also mean to have a couple of PEs in SGE having a >>> fixed "allocation_rule". While this works right now, an extension in SGE >>> could be >>>>> "$fill_up_omp"/"$round_robin_omp" and using OMP_NUM_THREADS there too, >>> hence it must not be specified as an `export` in the job script but either >>> on the command >>>>> line or inside the job script in #$ lines as job requests. This would >>> mean to collect slots in bunches of OMP_NUM_THREADS on each machine to >>> reach the overall >>>>> specified slot count. Whether OMP_NUM_THREADS or n times >>> OMP_NUM_THREADS is allowed per machine needs to be discussed. >>>>>>>>>> >>>>>>>>>> PS2: As Univa SGE can also supply a list of granted cores in the >>> $PE_HOSTFILE, it would be an extension to feed this to Open MPI to allow >>> any UGE aware >>>>> binding. 
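Until the division proposed above (read the $PE_HOSTFILE, divide the granted slots per machine by OMP_NUM_THREADS, error out if not divisible) exists inside Open MPI itself, it can be approximated in the jobscript - a sketch only, reusing the awk approach from earlier in this thread:

===
# Divide the slot column by OMP_NUM_THREADS and abort if a host's slot
# count is not a multiple of the thread count.
awk -v t="$OMP_NUM_THREADS" '
    $2 % t != 0 { print "slots on " $1 " not divisible by " t > "/dev/stderr"; exit 1 }
                { $2 /= t; print }
' $PE_HOSTFILE > $TMPDIR/machines || exit 1
export PE_HOSTFILE=$TMPDIR/machines
===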
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 14.08.2014, at 21:52, Oscar Mojica wrote:
>>>>>>>>>> 
>>>>>>>>>>> Guys
>>>>>>>>>>> 
>>>>>>>>>>> I changed the line to run the program in the script with both options
>>>>>>>>>>> 
>>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS ./inverse.exe
>>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS ./inverse.exe
>>>>>>>>>>> 
>>>>>>>>>>> but I got the same results. When I use man mpirun it shows:
>>>>>>>>>>> 
>>>>>>>>>>> -bind-to-none, --bind-to-none
>>>>>>>>>>> Do not bind processes. (Default.)
>>>>>>>>>>> 
>>>>>>>>>>> and the output of 'qconf -sp orte' is
>>>>>>>>>>> 
>>>>>>>>>>> pe_name            orte
>>>>>>>>>>> slots              9999
>>>>>>>>>>> user_lists         NONE
>>>>>>>>>>> xuser_lists        NONE
>>>>>>>>>>> start_proc_args    /bin/true
>>>>>>>>>>> stop_proc_args     /bin/true
>>>>>>>>>>> allocation_rule    $fill_up
>>>>>>>>>>> control_slaves     TRUE
>>>>>>>>>>> job_is_first_task  FALSE
>>>>>>>>>>> urgency_slots      min
>>>>>>>>>>> accounting_summary TRUE
>>>>>>>>>>> 
>>>>>>>>>>> I don't know if the installed Open MPI was compiled with '--with-sge'. How can I know that?
>>>>>>>>>>> Before thinking about a hybrid application I was using only MPI, and the program used few processors (14). The cluster possesses 28 machines, 15 with 16 cores and 13 with 8 cores, totalling 344 units of processing. When I submitted the job (only MPI), the MPI processes were spread to the cores directly; for that reason I created a new queue with 14 machines, trying to gain more time. The results were the same in both cases. In the last case I could prove that the processes were distributed to all machines correctly.
>>>>>>>>>>> 
>>>>>>>>>>> What must I do?
>>>>>>>>>>> Thanks
>>>>>>>>>>> 
>>>>>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>>>>>> Geologist M.S. in Geophysics
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Date: Thu, 14 Aug 2014 10:10:17 -0400
>>>>>>>>>>>> From: maxime.boissonnea...@calculquebec.ca
>>>>>>>>>>>> To: us...@open-mpi.org
>>>>>>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> You DEFINITELY need to disable OpenMPI's new default binding. Otherwise,
>>>>>>>>>>>> your N threads will run on a single core. --bind-to socket would be my
>>>>>>>>>>>> recommendation for hybrid jobs.
>>>>>>>>>>>> 
>>>>>>>>>>>> Maxime
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2014-08-14 10:04, Jeff Squyres (jsquyres) wrote:
>>>>>>>>>>>>> I don't know much about OpenMP, but do you need to disable Open MPI's default bind-to-core functionality (I'm assuming you're using Open MPI 1.8.x)?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You can try "mpirun --bind-to none ...", which will have Open MPI not bind MPI processes to cores, which might allow OpenMP to think that it can use all the cores, and therefore it will spawn num_cores threads...?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 14, 2014, at 9:50 AM, Oscar Mojica <o_moji...@hotmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hello everybody
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I am trying to run a hybrid MPI + OpenMP program in a cluster. I created a queue with 14 machines, each one with 16 cores. The program divides the work among the 14 processors with MPI, and within each processor a loop is also divided into 8 threads, for example, using OpenMP.
>>>>>>>>>>>>>> The problem is that when I submit the job to the queue, the MPI processes don't divide the work into threads, and the program prints the number of threads that are working within each process as one.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I made a simple test program that uses OpenMP and I logged in to one machine of the fourteen. I compiled it using gfortran -fopenmp program.f -o exe, set the OMP_NUM_THREADS environment variable equal to 8, and when I ran it directly in the terminal the loop was effectively divided among the cores and, for example, in this case the program printed the number of threads equal to 8.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This is my Makefile
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> # Start of the makefile
>>>>>>>>>>>>>> # Defining variables
>>>>>>>>>>>>>> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o dsvd.o
>>>>>>>>>>>>>> #f90comp = /opt/openmpi/bin/mpif90
>>>>>>>>>>>>>> f90comp = /usr/bin/mpif90
>>>>>>>>>>>>>> #switch = -O3
>>>>>>>>>>>>>> executable = inverse.exe
>>>>>>>>>>>>>> # Makefile
>>>>>>>>>>>>>> all : $(executable)
>>>>>>>>>>>>>> $(executable) : $(objects)
>>>>>>>>>>>>>> 	$(f90comp) -fopenmp -g -O -o $(executable) $(objects)
>>>>>>>>>>>>>> 	rm $(objects)
>>>>>>>>>>>>>> %.o: %.f
>>>>>>>>>>>>>> 	$(f90comp) -c $<
>>>>>>>>>>>>>> # Cleaning everything
>>>>>>>>>>>>>> clean:
>>>>>>>>>>>>>> 	rm $(executable)
>>>>>>>>>>>>>> #	rm $(objects)
>>>>>>>>>>>>>> # End of the makefile
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> and the script that I am using is
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>>>> #$ -cwd
>>>>>>>>>>>>>> #$ -j y
>>>>>>>>>>>>>> #$ -S /bin/bash
>>>>>>>>>>>>>> #$ -pe orte 14
>>>>>>>>>>>>>> #$ -N job
>>>>>>>>>>>>>> #$ -q new.q
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> export OMP_NUM_THREADS=8
>>>>>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS ./inverse.exe
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> am I forgetting something?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>>>>>>>>> Geologist M.S. in Geophysics
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/25016.php
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> ---------------------------------
>>>>>>>>>>>> Maxime Boissonneault
>>>>>>>>>>>> Analyste de calcul - Calcul Québec, Université Laval
>>>>>>>>>>>> Ph. D. en physique
>>>>> ----
>>>>> Tetsuya Mishima   tmish...@jcity.maeda.co.jp