[Wien] No lines in spaghetti
Dear Wien2K users,

I am trying to plot the band structure of a monoclinic CXZ lattice. I picked the standard symmetry points of the BZ of this monoclinic lattice. Everything went well, except that there are no lines in the band structure, only dots at the symmetry points. What could be the reason/remedy?

Best regards,
Sanjeev

--
Dr. Sanjeev Kumar Srivastava
Assistant Professor
Department of Physics and Meteorology
Indian Institute of Technology Kharagpur
Kharagpur 721302, India
Ph.: 0091-3222-283854 (Office), 0091-3222-283855 (Residence)
Mobile: 0091-9735444091
[Wien] No lines in spaghetti
Dear Wien2K users,

You need not reply to this mail. I have found the solution. Sorry for the inconvenience.

Best regards,
Sanjeev

----- Original Message -----
From: Sanjeev K. Srivastava sanj...@phy.iitkgp.ernet.in
To: A Mailing list for WIEN2k users wien at zeus.theochem.tuwien.ac.at
Sent: Tuesday, June 7, 2011 10:45:49 AM
Subject: [Wien] No lines in spaghetti

_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
[Wien] pondering wien2kMPI performance
Hi,

lapw0 is parallelized in a loop over atoms, where there is little communication, and in the FFT (fftw); for the latter, see the fftw manual.

For lapw1, the setup has no communication at all. The eigensolver is done with PBLAS and ScaLAPACK calls; here both latency and bandwidth are important, but these libraries should be well optimized. I would point more to bandwidth.

lapw2 uses two coexisting communicators, one for parallelization over atoms and the other for splitting the vector (lapw2_vector_split in .machines); for large systems you have to split the vector. I guess that here most of the time is spent in PBLAS calls, which are matrix-matrix multiplications. However, on some old and less efficient systems we have noticed a huge amount of time spent on reading and distributing the vector file.

regards
Robert

On Monday 06 June 2011 23:31:44 Kevin Jorissen wrote:
> Dear wien2k community,
>
> I have a few basic questions regarding the MPI/SCALAPACK version of wien2k:
>
> * Does anyone have a formula for calculating the memory requirements of the code (lapw0/1/2) given, say, nmat and nume and the number of cores used? It's easy enough for the serial code, but I'm sometimes baffled by the memory taken by each of the MPI threads when distributing the job over N cores. It's sometimes very different from [serial size in GB] / N_cores. It makes the queue manager unhappy, and occasionally I unintentionally overload a node this way.
>
> * I was asked the following question about the MPI wien2k code: "So would it be correct to state that your apps are more bandwidth sensitive than latency sensitive?" and I don't know what to answer. Thinking about LARGE calculations (hundreds of atoms) I want to say that both will be important ... Does anyone have a more sophisticated insight here?
>
> cheers,
> Kevin Jorissen
> University of Washington

--
Dr Robert Laskowski
Vienna University of Technology, Institute of Materials Chemistry
Getreidemarkt 9/165-TC, A-1060 Vienna, Austria
tel. +43 1 58801 15675, Fax +43 1 58801 15698
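The vector splitting mentioned in this thread is switched on in the .machines file. As a rough illustration only, the Python helper below assembles the text of such a file. The host names, core counts and the helper name `machines_text` are hypothetical, and the line layout (`lapw0:`, `1:host:n`, `granularity`, `extrafine`, `lapw2_vector_split:`) is an assumption based on the usual .machines conventions, so check it against the WIEN2k user's guide for your version before using it.

```python
def machines_text(hosts, cores_per_host, vector_split=2):
    """Return the text of a .machines file for a single MPI job.

    hosts and cores_per_host are hypothetical inputs; vector_split is
    written as the lapw2_vector_split setting discussed in the thread.
    """
    # "node1:8 node2:8" style host specification (assumed layout)
    host_spec = " ".join(f"{host}:{cores_per_host}" for host in hosts)
    lines = [
        f"lapw0:{host_spec}",                  # MPI-parallel lapw0
        f"1:{host_spec}",                      # one k-point job, MPI over all hosts
        "granularity:1",
        "extrafine:1",
        f"lapw2_vector_split:{vector_split}",  # 2 (or even 4) for large cases
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(machines_text(["node1", "node2"], 8, vector_split=2), end="")
```

Writing the returned string to `.machines` in the case directory is left to the caller, so the sketch never touches an existing file by accident.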
[Wien] pondering wien2kMPI performance
Just adding three more comments to what Robert said:

Memory in lapw0: depends mainly on GMAX (case.in2) and/or the IFFT parameters and enhancement factor in case.in0. Memory is critical for large FFT grids (enhancement factors); parallelization solves the problem.

lapw1: ScaLAPACK diagonalization needs significantly more memory than sequential LAPACK. Instead of NMAT*(NMAT+1)/2 you need NMAT**2 for H and S, plus additional large auxiliary arrays. Iterative diagonalization needs another large NMAT**2 array plus vectors (NMAT*NUME).

lapw2: in most cases the real memory-critical step. There are many cases where lapw1 still does fine in terms of memory, but lapw2 does NOT! Solve it by using lapw2_vector_split: 2 (or even 4) in the .machines file.

Check the parallel case.output* files to get an idea about memory allocation.

(In reply to Robert Laskowski's message of 07.06.2011 08:31.)

--
P. Blaha
--
Peter BLAHA, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671  FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at  WWW: http://info.tuwien.ac.at/theochem/
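The lapw1 numbers above can be turned into a back-of-the-envelope estimate. The Python sketch below counts only the H and S matrices (and, optionally, the extra arrays for iterative diagonalization). It rests on several assumptions that are not stated in the thread: each of H and S is taken to need the quoted number of elements, elements are 8 bytes (real case) or 16 bytes (complex case), the ScaLAPACK matrices are spread evenly over the cores, and the "additional large auxiliary arrays" are ignored. The function name is hypothetical, and the result is a lower bound rather than the formula Kevin asked for.

```python
def lapw1_matrix_memory_gib(nmat, nume, ncores=1, complex_case=False,
                            scalapack=True, iterative=False):
    """Rough per-core memory (GiB) for H and S in lapw1.

    Auxiliary arrays are NOT included, so real usage will be higher.
    """
    bytes_per_element = 16 if complex_case else 8  # assumed element sizes
    if scalapack:
        elements = 2 * nmat**2          # full NMAT**2 storage for H and for S
    else:
        elements = nmat * (nmat + 1)    # 2 * NMAT*(NMAT+1)/2, packed storage
        ncores = 1                      # serial LAPACK: nothing is distributed
    if iterative:
        elements += nmat**2 + nmat * nume  # extra array plus vectors
    return elements * bytes_per_element / ncores / 1024**3


if __name__ == "__main__":
    nmat, nume = 20000, 2000  # illustrative values only
    serial = lapw1_matrix_memory_gib(nmat, nume, scalapack=False)
    per_core = lapw1_matrix_memory_gib(nmat, nume, ncores=16)
    print(f"serial LAPACK, H+S:        {serial:.2f} GiB")
    print(f"ScaLAPACK on 16 cores, H+S: {per_core:.2f} GiB per core")
```

Because the distributed matrices are about twice the size of the packed serial ones, the per-core figure is closer to 2 × [serial size] / N_cores than to [serial size] / N_cores, before any auxiliary arrays are counted.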
[Wien] compilation aborted for lap_bp.f (code 1)
Dear wien2k users,

We have tried to install wien2k on a 64-bit system using the Intel compiler 11.1.046. The OPTIONS used are given below:

    current:FOPT:-FR -mp1 -w -prec_div -pad -ip -O3 -axTW -traceback
    current:FPOPT:$(FOPT)
    current:LDFLAGS:$(FOPT) -L/opt/intel/Compiler/11.1/046/lib/intel64 -static-intel -Bstatic -lguide -lguide_stats -lsvml -Bdynamic -lpthread
    current:DPARALLEL:'-DParallel'
    current:R_LIBS:-L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -lguide -lpthread
    current:RP_LIBS:-lmkl_intel_lp64 -lmkl_scalapack_lp64 -lmkl_blacs_lp64 -lmkl_sequential -lmkl_em64t

All the programs compiled properly except lapwso. One error appeared, as follows:

    compilation aborted for lap_bp.f (code 1)
    make: *** [lap_bp.o] Error 1

We are not able to proceed any further. Any response in this regard will be appreciated.

Thanks in advance, with best regards,

--
Shamik Chakrabarti
Research Scholar
Dept. of Physics & Meteorology
Material Processing Solid State Ionics Lab
IIT Kharagpur
Kharagpur 721302, INDIA
[Wien] GGA-EV
The EV-GGA functional corresponds to indxc=15 in case.in0. This means EV93 for exchange and PW91 for correlation.

On Tue, 7 Jun 2011, AJAY SINGH VERMA wrote:
> Dear all users and Blaha Sir,
>
> Many papers quote results obtained with the EV-GGA functional, but I am unable to find it in the User's Guide. Could you please clarify?
>
> Thank you,
> A S Verma
[Wien] compilation aborted for lap_bp.f (code 1)
What was the error message of the compiler?

By the way, the latest stable version of the 11.1 compiler was 11.1.075; older ones contain an overestimation that you can easily find when you search this forum.

And the same procedure as every week:
http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/

About compiler switches, the Intel manual for 11.1 says, for example (see also the section "Deprecated and Removed Compiler Options" in the manual):

ax, Qax

Tells the compiler to generate multiple, processor-specific auto-dispatch code paths for Intel processors if there is a performance benefit.

IDE Equivalent
  Windows: Code Generation > Add Processor-Optimized Code Path; Optimization > Generate Alternate Code Paths
  Linux: None
  Mac OS X: Code Generation > Add Processor-Optimized Code Path

Architectures
  IA-32, Intel® 64 architectures

Syntax
  Linux and Mac OS X: -axprocessor
  Windows: /Qaxprocessor

Arguments
  processor  Indicates the processor for which code is generated. The following descriptions refer to Intel® Streaming SIMD Extensions (Intel® SSE) and Supplemental Streaming SIMD Extensions (Intel® SSSE). Possible values are:

  SSE4.2  Can generate Intel® SSE4 Efficient Accelerated String and Text Processing instructions supported by Intel® Core™ i7 processors. Can generate Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSSE3, SSE3, SSE2, and SSE instructions and it can optimize for the Intel® Core™ processor family.

  SSE4.1  Can generate Intel® SSE4 Vectorizing Compiler and Media Accelerator instructions for Intel processors. Can generate Intel® SSSE3, SSE3, SSE2, and SSE instructions and it can optimize for Intel® 45nm Hi-k next generation Intel® Core™ microarchitecture. This replaces value S, which is deprecated.

  SSSE3   Can generate Intel® SSSE3, SSE3, SSE2, and SSE instructions for Intel processors and it can optimize for the Intel® Core™2 Duo processor family. For Mac OS* X systems, this value is only supported on Intel® 64 architecture. This replaces value T, which is deprecated.

  SSE3    Can generate Intel® SSE3, SSE2, and SSE instructions for Intel processors and it can optimize for processors based on Intel® Core™ microarchitecture and Intel NetBurst® microarchitecture. For Mac OS* X systems, this value is only supported on IA-32 architecture. This replaces value P, which is deprecated.

  SSE2    Can generate Intel® SSE2 and SSE instructions for Intel processors, and it can optimize for Intel® Pentium® 4 processors, Intel® Pentium® M processors, and Intel® Xeon® processors with Intel® SSE2. This value is not available on Mac OS* X systems. This replaces value N, which is deprecated.

Default
  OFF  No auto-dispatch code is generated. Processor-specific code is generated and is controlled by the setting of compiler option -m (Linux), compiler option /arch (Windows), or compiler option -x (Mac OS* X).

Description
  This option tells the compiler to generate multiple, processor-specific auto-dispatch code paths for Intel processors if there is a performance benefit. It also generates a baseline code path. The baseline code is usually slower than the specialized code.

  The baseline code path is determined by the architecture specified by the -x (Linux and Mac OS X) or /Qx (Windows) option. While there are defaults for the -x or /Qx option that depend on the operating system being used, you can specify an architecture for the baseline code that is higher or lower than the default. The specified architecture becomes the effective minimum architecture for the baseline code path.

  If you specify both the -ax and -x options (Linux and Mac OS X) or the /Qax and /Qx options (Windows), the baseline code will only execute on processors compatible with the processor type specified by the -x or /Qx option.

  This option tells the compiler to find opportunities to generate separate versions of functions that take advantage of features of the specified Intel® processor. If the compiler finds such an opportunity, it first checks whether generating a processor-specific version of a function is likely to result in a performance gain. If this is the case, the compiler generates both a processor-specific version of a function and a baseline version of the function. At run time, one of the versions is chosen to execute, depending on the Intel processor in use. In this way, the program can benefit from performance gains on more advanced Intel processors, while still working properly on older processors.

  You can use more than one of the processor values by combining them. For example, you can specify -axSSE4.1,SSSE3 (Linux and Mac OS X) or /QaxSSE4.1,SSSE3 (Windows). You cannot combine the old style, deprecated options and the new options. For example, you cannot specify -axSSE4.1,T (Linux and Mac OS X) or /QaxSSE4.1,T (Windows).

  Previous values W and K are deprecated. The details on replacements are as follows:

  Mac OS X systems:
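The replacements named in the manual excerpt (S, T, P and N) can be applied mechanically to an old-style flag such as the -axTW used in the OPTIONS of the original question. The Python sketch below does only that; the function name `modernize_ax` is hypothetical. Values W and K are deliberately reported as unresolved, because the excerpt breaks off before listing their replacements and they have to be looked up in the manual.

```python
# Deprecated single-letter values and their replacements, as quoted in the
# manual excerpt. W and K are omitted on purpose: the excerpt is cut off
# before their replacements are given.
AX_REPLACEMENTS = {"S": "SSE4.1", "T": "SSSE3", "P": "SSE3", "N": "SSE2"}


def modernize_ax(flag):
    """Translate an old-style flag like '-axTW' to the new value names.

    Returns (new_flag_or_None, unresolved_letters).
    """
    if not flag.startswith("-ax"):
        raise ValueError("expected a flag of the form -ax<letters>")
    letters = flag[3:]
    # New-style values (SSE4.1, SSSE3, ...) all contain digits or dots;
    # old-style values are plain letters.
    if not letters.isalpha():
        raise ValueError("flag does not look like old-style -ax letters")
    known = [AX_REPLACEMENTS[c] for c in letters if c in AX_REPLACEMENTS]
    unresolved = [c for c in letters if c not in AX_REPLACEMENTS]
    # Several new-style values are combined with commas, e.g. -axSSE4.1,SSSE3
    new_flag = "-ax" + ",".join(known) if known else None
    return new_flag, unresolved


if __name__ == "__main__":
    print(modernize_ax("-axTW"))
```

Since the manual says old-style and new-style values cannot be mixed in one option, any unresolved letter has to be replaced by hand before the translated flag is put back into FOPT.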
[Wien] GGA-EV
And note that this functional is optimized to reproduce the correct exchange; it is not suitable for total energy calculations, so don't use it for optimization.

Ciao
Gerhard

Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz

From: wien-bounces at zeus.theochem.tuwien.ac.at on behalf of tran at theochem.tuwien.ac.at
Sent: Tuesday, 7 June 2011 11:06
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] GGA-EV
[Wien] GGA-EV
And in addition: nowadays I would NOT use EV-GGA, but the mBJ potential. It gives gaps with much higher accuracy.

(In reply to Gerhard Fecher's message of 07.06.2011 11:06.)

--
P. Blaha