Dear Gerhard, Thank you very much for your detailed reply. It is very clear. I am very grateful for your kind help.
Have a nice day.

Kind regards,
Qiwen

------Original Message------
From: "Gerhard Fecher" <fecher at uni-mainz.de>
To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
Cc:
Subject: Re: [Wien] OMP_NUM_THREADS
Date: 07/27/2011 07:24:19 PM (+0000)

>Here is a simplified answer for a single-processor, multi-core machine.
>
>The hyperthreading of the processor has nothing to do with the threads that
>MKL uses. MKL is (internally) parallelized, that is, it can execute certain
>loops in parallel. Setting OMP_NUM_THREADS tells MKL how many processor
>cores to use for that parallel handling.
>
>Hyperthreading means that each core can act as two, one performing integer
>and one performing floating point operations, but that is usually not what
>you have when solving numerical problems in Wien.
>Now if you tell MKL to use 8 cores for floating point operations but you
>have only 4 floating point units (because you have only 4 cores),
>hyperthreading will most probably not distribute the work in a way that
>speeds things up.
>(In some cases hyperthreading also helps if you have a lot of disk or other
>operations (playing games during calculations?), that is, if the processor
>can run the floating point operations and use the integer unit for other
>stuff.)
>
>If you use k-parallel, then you have in principle something like the MKL,
>just a level higher.
>
>The behavior will depend on how the memory is used: if MKL can keep the
>intermediate results in the processor (registers, first-level cache),
>then using more than one thread is often faster than using the
>k-parallelisation because the data are held in the cache.
>
>With a quad-core you have three choices:
>1) 4 k-point processes (4 times "1:localhost")
>2) 2 k-point processes (2 times "1:localhost") and OMP_NUM_THREADS=2
>3) OMP_NUM_THREADS=4
>but there is no definite answer as to which will be faster, more k-points
>in parallel or more MKL threads. It will depend on when, and how much,
>data is needed by the process and on what makes the best use of the cache.
>
>Actually, I found for my purposes that just OMP_NUM_THREADS=2 and running
>two Wien calculations in parallel is the fastest, as I am not an input
>machine and like to drink a lot of coffee.
>
>On the older dual cores it was never a good choice to use hyperthreading
>and OMP_NUM_THREADS together. What happened was that all the work was still
>done on a single core whereas the other did almost nothing, and even Intel
>recommended switching off hyperthreading when using OMP_NUM_THREADS (I
>tried it some years ago with Wien on a 2-processor dual-core Xeon machine
>and found that Intel was right).
>I don't know whether this behavior in distributing the processes has
>meanwhile been changed by hardware or software management.
>
>Ciao
>Gerhard
>
>====================================
>Dr. Gerhard H. Fecher
>Institut of Inorganic and Analytical Chemistry
>Johannes Gutenberg - University
>55099 Mainz
>________________________________________
>From: wien-bounces at zeus.theochem.tuwien.ac.at [wien-bounces at zeus.theochem.tuwien.ac.at] on behalf of Dr Qiwen YAO [Yao.Qiwen at nims.go.jp]
>Sent: Wednesday, 27 July 2011 17:30
>To: A Mailing list for WIEN2k users
>Subject: Re: [Wien] OMP_NUM_THREADS
>
>Dear Gerhard,
>Thank you very much for your response.
>I am a bit slow in catching up with what you are saying. May I rephrase
>what you've suggested, to check whether I understand it:
>
>For 4 k-points and 4 MKL threads, do you mean I would set 4 lines of
>"1:localhost" in the .machines file and set OMP_NUM_THREADS=4?
>
>And for the 2 k-points and 2 MKL threads, do I set only 2 lines of
>"1:localhost" in the .machines file and set OMP_NUM_THREADS=2?
>
>If I am understanding you correctly, I will try both scenarios and see
>which one is more efficient.
>
>Thank you so much for your time and help!
>
>Qiwen
>
>
>------Original Message------
>From: "Gerhard Fecher" <fecher at uni-mainz.de>
>To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
>Cc:
>Subject: Re: [Wien] OMP_NUM_THREADS
>Date: 07/27/2011 03:17:21 PM (+0000)
>>If you have four "real" cores you may run in parallel either 4 k-points,
>>or 4 MKL threads, or 2 k-points and 2 MKL threads.
>>
>>In some cases it might be good to switch off the "virtual cores" in the
>>BIOS; at least with older processors/compilers this was faster,
>>but I have not checked since.
>>
>>Ciao
>>Gerhard
>>
>>====================================
>>Dr. Gerhard H. Fecher
>>Institut of Inorganic and Analytical Chemistry
>>Johannes Gutenberg - University
>>55099 Mainz
>>________________________________________
>>From: wien-bounces at zeus.theochem.tuwien.ac.at [wien-bounces at zeus.theochem.tuwien.ac.at] on behalf of Dr Qiwen YAO [Yao.Qiwen at nims.go.jp]
>>Sent: Wednesday, 27 July 2011 14:13
>>To: A Mailing list for WIEN2k users
>>Subject: [Wien] OMP_NUM_THREADS
>>
>>Dear Wien2k users,
>>
>>We were told in the WIEN workshop that for MKL + multi-core cases, it
>>might be better to set $OMP_NUM_THREADS=2.
>>
>>I have two questions in mind:
>>
>>Q1. Does this apply to a 2-core system with 4 GB RAM that is not running
>>parallel calculations (neither k-point parallel nor MPI-parallel)?
>>
>>
>>Q2. Or does this only apply to, e.g., a quad-core machine that runs
>>k-point parallel or MPI-parallel calculations?
>>
>>I have a 4-core Dell T7500 PC with 12 GB RAM. Each core has two threads,
>>so in SuSE/Linux, or even in Windows, it shows up as an 8-CPU machine (it
>>is in actuality a four-core CPU, but with 2 threads per core the OS sees
>>it as an 8-core CPU). The details of this CPU are here if you would like
>>to see them: http://ark.intel.com/products/37111
>>
>>I am setting up this machine to run k-parallel calculations (not
>>MPI-parallel, as I have only one such machine for the moment), and I am
>>wondering:
>>
>>Which of the following 2 scenarios is the better choice for a 90-atom
>>supercell calculation?
>>
>>Scenario 1.
>>The .machines file is this:
>>-------
>>granularity:1
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>extrafine:1
>>----------
>>and OMP_NUM_THREADS=1 as the default in my .bashrc file,
>>so no multi-threading but all k-parallelism. (With this setting, I notice
>>that after the job has run for a while, more than an hour say, the System
>>Monitor shows that only two of the 8 CPUs are really utilized at a time.
>>Which CPUs are fully loaded keeps switching, but mostly only two are
>>fully loaded at a time, and the other 6 CPUs are not doing much: some are
>>at a few percent load and others even at 0%. So I was wondering whether
>>this setting is not optimal.)
>>
>>Scenario 2.
>>The .machines file would be like this:
>>-------
>>granularity:1
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>extrafine:1
>>----------
>>and OMP_NUM_THREADS=2 set in my .bashrc file. I have not tried this
>>setting as I am not sure whether it would be workable.
>>
>>Or would both settings work without making much difference in calculation
>>time for a supercell calculation of 90 atoms? I am new to WIEN, so I
>>do not fully understand threading from WIEN's point of view.
>>
>>In addition, for the above two .machines file settings, would it make any
>>difference if I put the real hostname in place of "localhost"?
>>
>>Any comment would be greatly appreciated.
>>
>>Thank you!
>>
>>Kind regards,
>>Qiwen
>>
>>**********************************************************
>>
>>Dr QiWen YAO
>>
>>JSPS Fellow
>>Multifunctional Materials Group
>>Optical and Electronic Materials Unit
>>Environment and Energy Materials Research Division
>>
>>National Institute for Materials Science
>>
>>1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
>>Phone: +81-29-851-3354, ext. no. 6482, Fax: +81-29-859-2501
>>
>>**********************************************************
>>
>>_______________________________________________
>>Wien mailing list
>>Wien at zeus.theochem.tuwien.ac.at
>>http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
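
For anyone comparing the quad-core choices Gerhard lists in this thread, here is a rough sketch of how the combinations could be enumerated and the matching .machines file written out. It is only an illustration under stated assumptions: the helper names `layouts` and `machines_file` are made up for this example and are not part of WIEN2k, the rule "k-point processes times OMP_NUM_THREADS equals the number of physical cores" is simply one way to avoid oversubscribing the real cores, and the granularity/extrafine lines are copied from the .machines files quoted in the original question. Which layout is actually fastest still has to be measured, as Gerhard says.

```python
"""Sketch: enumerate k-point/thread layouts for one multi-core machine.

Assumption: helper names are hypothetical and not part of WIEN2k.
"""


def layouts(physical_cores):
    """Return (k_point_processes, omp_num_threads) pairs whose product
    equals the number of physical cores, so no real core is oversubscribed."""
    return [
        (procs, physical_cores // procs)
        for procs in range(1, physical_cores + 1)
        if physical_cores % procs == 0
    ]


def machines_file(k_point_processes, host="localhost"):
    """Render .machines text with one '1:host' line per k-point process.

    The granularity and extrafine lines mirror the files quoted in the
    thread; they are copied from there, not derived from anything else.
    """
    lines = ["granularity:1"]
    lines += [f"1:{host}"] * k_point_processes
    lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Four physical cores, as on the machine discussed in the thread.
    for procs, threads in layouts(4):
        print(f"# {procs} k-point process(es), OMP_NUM_THREADS={threads}")
        print(machines_file(procs))
```

For 4 physical cores this yields the pairs (1, 4), (2, 2) and (4, 1), which correspond to choices 3, 2 and 1 in Gerhard's list. For the single-process case the sketch still writes a one-line .machines file; whether one would instead run without k-parallel at all is left open here.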