Dear Gerhard,
Thank you very much for your detailed reply. It is very clear. I am very 
grateful for your kind help.

Have a nice day.

Kind regards,
Qiwen


------Original Message------
From:"Gerhard Fecher"<fecher at uni-mainz.de>
To:"A Mailing list for WIEN2k users"<wien at zeus.theochem.tuwien.ac.at>
Cc:
Subject:Re: [Wien] OMP_NUM_THREADS
Date:07/27/2011 07:24:19 PM(+0000)
>Here is a simplified answer for a single-processor, multi-core machine.
>
>The hyperthreading of the processor has nothing to do with the threads that 
>MKL uses.
>MKL is (internally) parallelized, that is, it can execute certain loops in 
>parallel. Setting OMP_NUM_THREADS
>tells MKL how many processor cores to use for this parallel 
>handling.
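To illustrate the point above, here is a minimal sketch (not part of WIEN2k). OMP_NUM_THREADS is an ordinary environment variable, so a value set in .bashrc is inherited by every program started from that shell. The helper name run_with_threads and the probe command are hypothetical; the probe is only a stand-in child process that prints the value it inherited.

```python
import os
import subprocess
import sys


def run_with_threads(cmd, omp_threads):
    """Run cmd with OMP_NUM_THREADS set for that child process only."""
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process that just reports the value it inherited.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_with_threads(probe, 2).stdout.strip())
```

Setting the variable per process like this, rather than globally, makes it easy to time the same job with different thread counts.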
>
>Hyperthreading means that each core can act as two, one performing integer and 
>one performing floating point operations,
>but that's usually not what you have when solving numerical problems in Wien.
>Now if you tell MKL to use 8 cores for floating point operations but you have 
>only 4 floating point units (because you have only 4 cores),
>hyperthreading will most probably not distribute the work in a way that 
>speeds things up.
>(In some cases hyperthreading also helps if you have a lot of disk or other 
>operations (playing games during calculations?), that is, if the processor can 
>run the floating point operations and use the integer unit for other stuff.)
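To tell real cores from hyperthreaded ones on Linux, one possible approach is to count the distinct (physical id, core id) pairs in /proc/cpuinfo. The sketch below assumes an x86 Linux system where those two fields are present; on systems that do not report them it returns 0. The function name count_physical_cores is hypothetical.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    physical_id = core_id = None
    # The extra "" flushes the last record if the text lacks a trailing blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical-CPU record.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return len(cores)


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    try:
        with open("/proc/cpuinfo") as f:
            print("physical cores:", count_physical_cores(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

If the logical count is twice the physical count, hyperthreading is active, and the physical count is the more sensible upper limit for floating-point threads.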
>
>If you use k-parallel, then you have in principle something like the MKL, just 
>a level higher.
>
>The behavior will depend on how the memory is used. If MKL can keep the 
>intermediate results in the processor (registers, 1st level cache),
>then using more than one thread is often faster than using the 
>k-parallelisation because the data are held in the cache.
>
>With a quad-core you have three choices:
>1) 4 k-point processes  (4 times "1:localhost")
>2) 2 k-point processes  (2 times "1:localhost") and OMP_NUM_THREADS=2
>3) OMP_NUM_THREADS=4
>but there is no definite answer as to which will be faster, more k-points in 
>parallel or more MKL threads. It will depend on when, and how much, data is 
>needed by the process and what the best use of the cache is.
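The three choices above follow one rule: the number of k-point processes times OMP_NUM_THREADS should equal the number of real cores. A minimal sketch that enumerates such layouts and writes the matching .machines content is given here. The function names are hypothetical, the granularity and extrafine lines are copied from the .machines file quoted further down in this thread, and using a single "1:localhost" line for choice 3 is an assumption.

```python
def machines_file(k_procs, host="localhost"):
    """Render a .machines file with one '1:host' line per k-point process."""
    lines = ["granularity:1"]
    lines += [f"1:{host}"] * k_procs
    lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


def layouts(physical_cores):
    """List (k_procs, omp_threads) pairs that use each physical core exactly once."""
    return [(k, physical_cores // k)
            for k in range(physical_cores, 0, -1)
            if physical_cores % k == 0]


if __name__ == "__main__":
    for k_procs, threads in layouts(4):
        print(f"# {k_procs} k-point process(es), OMP_NUM_THREADS={threads}")
        print(machines_file(k_procs))
```

Which layout is fastest still has to be measured for the actual case, as it depends on cache use.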
>
>Actually I found for my purpose that just OMP_NUM_THREADS=2 and running two 
>Wien calculations
>in parallel is the fastest, as I am not an input machine and like to drink a 
>lot of coffee.
>
>On the older dual cores it was never a good choice to use hyperthreading and 
>OMP_NUM_THREADS together.
>What happened was that all the work was still done on a single core whereas 
>the other did almost nothing,
>and even Intel advised switching off hyperthreading when using OMP_NUM_THREADS 
>(I tried it some years ago with Wien on a 2-processor dual-core Xeon machine 
>and found that Intel is right).
>I don't know whether this behavior in distributing the processes has since 
>been changed by hardware or software management.
>
>Ciao
>Gerhard
>
>====================================
>Dr. Gerhard H. Fecher
>Institut of Inorganic and Analytical Chemistry
>Johannes Gutenberg - University
>55099 Mainz
>________________________________________
>From: wien-bounces at zeus.theochem.tuwien.ac.at [wien-bounces at 
>zeus.theochem.tuwien.ac.at] on behalf of Dr Qiwen YAO 
>[Yao.Qiwen at nims.go.jp]
>Sent: Wednesday, 27 July 2011 17:30
>To: A Mailing list for WIEN2k users
>Subject: Re: [Wien] OMP_NUM_THREADS
>
>Dear Gerhard,
>Thank you very much for your response.
>I am a bit slow in catching up with what you are saying. May I rephrase what 
>you've suggested to see if I understand it correctly?
>
>For 4 k-points and 4 MKL threads: do you mean I would set 4 lines of 
>"1:localhost" in the .machines file and set OMP_NUM_THREADS=4?
>
>And for the 2 k-points and 2 MKL threads: do I set only 2 lines of 
>"1:localhost" in the .machines file and set OMP_NUM_THREADS=2?
>
>If I am understanding you correctly, I will try both scenarios and see which 
>one is more efficient.
>
>Thank you so much for your time and help!
>
>Qiwen
>
>
>------Original Message------
>From:"Gerhard Fecher"<fecher at uni-mainz.de>
>To:"A Mailing list for WIEN2k users"<wien at zeus.theochem.tuwien.ac.at>
>Cc:
>Subject:Re: [Wien] OMP_NUM_THREADS
>Date:07/27/2011 03:17:21 PM(+0000)
>>If you have four "real" cores you may run in parallel either 4 k-points, or 4 
>>MKL threads, or 2 k-points and 2 MKL threads.
>>
>>In some cases it might be good to "switch off the virtual cores" in the BIOS; 
>>at least with older processors/compilers this was faster,
>>but I have not checked it since.
>>
>>Ciao
>>Gerhard
>>
>>====================================
>>Dr. Gerhard H. Fecher
>>Institut of Inorganic and Analytical Chemistry
>>Johannes Gutenberg - University
>>55099 Mainz
>>________________________________________
>>From: wien-bounces at zeus.theochem.tuwien.ac.at [wien-bounces at 
>>zeus.theochem.tuwien.ac.at] on behalf of Dr Qiwen YAO 
>>[Yao.Qiwen at nims.go.jp]
>>Sent: Wednesday, 27 July 2011 14:13
>>To: A Mailing list for WIEN2k users
>>Subject: [Wien] OMP_NUM_THREADS
>>
>>Dear Wien2k users,
>>
>>We were told in the WIEN workshop that for MKL + multi-core cases, it might be 
>>better to set OMP_NUM_THREADS=2.
>>
>>I have two questions:
>>
>>Q1. Does this apply to a 2-core system with 4GB RAM that is not running 
>>parallel calculations (neither k-point parallel nor mpi-parallel)?
>>
>>
>>Q2. Or does this only apply to, e.g., a quad-core machine that runs k-point 
>>parallel or mpi-parallel calculations?
>>
>>I have a 4-core Dell T7500 PC with 12GB RAM. Each core has two threads, so 
>>in SUSE Linux or even in Windows it is displayed as an 8-CPU machine (it 
>>is actually a four-core CPU, but since each core has 2 threads, the 
>>OS sees it as an 8-core CPU). The details of this CPU are here if you would 
>>like to see them: http://ark.intel.com/products/37111
>>
>>I am setting up this machine to run k-parallel calculations (not mpi-parallel, 
>>as I have only one such machine for the moment), and I am wondering:
>>
>>Which of the following 2 scenarios is the better choice for a 90-atom 
>>supercell calculation?
>>
>>Scenario  1.
>>The .machines file is this:
>>-------
>>granularity:1
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>extrafine:1
>>----------
>>and OMP_NUM_THREADS=1 as the default in my .bashrc file,
>>so no multi-threading but all k-parallelism. (With this setting, I notice that 
>>after running the job for a while, say more than an hour, the System Monitor 
>>shows that of the 8 CPUs only two are really utilized at a time. The 
>>fully loaded CPUs keep switching, but mostly only two are fully 
>>loaded at a time, and the other 6 CPUs aren't really doing much: some 
>>show a few percent load and others even 0%. So I was wondering whether 
>>this setting isn't optimal.)
>>
>>Scenario  2.
>>The .machines file would be like this:
>>-------
>>granularity:1
>>1:localhost
>>1:localhost
>>1:localhost
>>1:localhost
>>extrafine:1
>>----------
>>and set OMP_NUM_THREADS=2 in my .bashrc file. I have not tried this 
>>setting, as I am not sure whether it would work.
>>
>>Or would both settings work without making much difference in the calculation 
>>time for a supercell calculation of 90 atoms? I am new to WIEN, so I 
>>do not yet fully understand threading as it applies to WIEN.
>>
>>In addition, for the above two .machines file settings, would it make any 
>>difference if I put the real hostname in place of "localhost"?
>>
>>Any comment would be greatly appreciated.
>>
>>Thank you!
>>
>>Kind regards,
>>Qiwen
>>
>>**********************************************************
>>
>>Dr QiWen YAO
>>
>>JSPS Fellow
>>Multifunctional Materials Group
>>Optical and Electronic Materials Unit
>>Environment and Energy Materials Research Division
>>
>>National Institute for Materials Science
>>
>>1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
>>Phone: +81-29-851-3354, ext. no. 6482, Fax: +81-29-859-2501
>>
>>**********************************************************
>>
>>_______________________________________________
>>Wien mailing list
>>Wien at zeus.theochem.tuwien.ac.at
>>http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>

