[Wien] errors in lapw
Dear all,

I am running WIEN2k 11.1 on a cluster with CentOS 6 under a PBS queuing system. The job is submitted in k-point parallel mode, and the total of 36 k-points is divided among 16 CPUs. But some errors occur in lapw2, and the dnlapw2_18/19/20.error files are not empty. At the same time, the job in the PBS system seems dead and cannot be killed with the PBS command.

The administrator checked the compute node: top shows that the node is under a very heavy load, above 40. Further, ps aux shows that there are 16 lapw2 processes, but they are not running (they appear to be suspended). The jobs caused a heavy load and triggered the self-protection mechanism of the OS, which automatically suspends any running process, including ssh logins, except for the root account.

Any comments will be appreciated. Thanks in advance.

The following are the error files and case.dayfile.

dnlapw2_18/19/20.error:
    Error in LAPW2

case.output2dn_19:
    ...
    KVEC( 73563) = -19 -599.10461
    KVEC( 73564) = -19 24 -99.10461
    KVEC( 73565) = -19 2499.10461
    KVEC( 73566) =19 -24 -99.10461
    KVEC( 73567) =19 -2499.10461
    KVEC( 73568) =195 -99.10461
    KVEC( 73569) =19599.10461
    KVE

case.dayfile:
    ...
    [14] Done ( ( $remote $machine[$p] cd $PWD;$t $exe ${def}_${loop}.def $loop;fixerror_lapw ${def}_$loop; rm -f .lock_$lockfile[$p] ) .stdout2_$loop; if ( -f .stdout2_$loop ) bashtime2csh.pl_lapw .stdout2_$loop .temp2_$loop; grep \% .temp2_$loop .time2_$loop; grep -v \% .temp2_$loop | perl -e print stderr STDIN )
    [9] Done ( ( $remote $machine[$p] cd $PWD;$t $exe ${def}_${loop}.def $loop;fixerror_lapw ${def}_$loop; rm -f .lock_$lockfile[$p] ) .stdout2_$loop; if ( -f .stdout2_$loop ) bashtime2csh.pl_lapw .stdout2_$loop .temp2_$loop; grep \% .temp2_$loop .time2_$loop; grep -v \% .temp2_$loop | perl -e print stderr STDIN )
    [4] Done ( ( $remote $machine[$p] cd $PWD;$t $exe ${def}_${loop}.def $loop;fixerror_lapw ${def}_$loop; rm -f .lock_$lockfile[$p] ) .stdout2_$loop; if ( -f .stdout2_$loop ) bashtime2csh.pl_lapw .stdout2_$loop .temp2_$loop; grep \% .temp2_$loop .time2_$loop; grep -v \% .temp2_$loop | perl -e print stderr STDIN )
    [4] 18809

:log
    ...
    Thu Feb 2 17:58:03 CST 2012 (x) lapw1 -c -dn -p -orb
    Thu Feb 2 19:46:53 CST 2012 (x) lapw2 -c -up -p
    Thu Feb 2 19:51:36 CST 2012 (x) sumpara -up -d
    Thu Feb 2 19:52:07 CST 2012 (x) lapw2 -c -dn -p

(If more information is needed, I will provide it.)

Best,
--
Bin Shao, Ph.D. Candidate
College of Information Technical Science, Nankai University
94 Weijin Rd. Nankai Dist. Tianjin 300071, China
Email: bshao at mail.nankai.edu.cn
[Wien] errors in lapw
Clearly you should write your job script so that it divides the 36 k-points in a meaningful way. In principle you can use 36, 18, 9, 6, 4, or 3 parallel jobs, but 16 is not meaningful.

Furthermore, it seems that your cluster has problems with heavy I/O (NFS), and this is most likely the reason for the observed high load and the crash. Thus I would

i) not use too many cores. Does one node of your cluster really have 16 cores, or is this just due to multithreading, so that it in fact has only 8? Do you have enough memory per node?

ii) try to use a (local) $SCRATCH directory, which reduces the NFS load. But this works only if your k-list and .machines file are compatible, as mentioned above.

It also seems to be a rather big calculation (lapw1 took nearly 2 h), so you may either need MPI, or you should not use all cores on one node of your cluster because of memory restrictions.

On 03.02.2012 13:56, Bin Shao wrote:
> Dear all, I am running wien2k 11.1 on a cluster with Centos 6 under a pbs queuing system. [...]

___
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

--
P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
--
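
The point about dividing 36 k-points "in a meaningful way" can be illustrated with a little arithmetic. The following is only a sketch in Python, not part of WIEN2k: the function names are made up for illustration, and it assumes every k-point costs about the same time and that k-points are simply dealt out as evenly as possible (how WIEN2k actually distributes them depends on the .machines settings).

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def split_kpoints(n_kpoints, n_jobs):
    """Deal n_kpoints out to n_jobs as evenly as possible.

    Returns a list with the number of k-points each job receives.
    (Hypothetical helper, for illustration only.)
    """
    base, extra = divmod(n_kpoints, n_jobs)
    # The first `extra` jobs take one k-point more than the others.
    return [base + 1 if i < extra else base for i in range(n_jobs)]


if __name__ == "__main__":
    n_k = 36
    print("job counts that split", n_k, "k-points evenly:", divisors(n_k))
    for jobs in (9, 12, 16, 18):
        load = split_kpoints(n_k, jobs)
        # The busiest job determines the wall time of the parallel step.
        print(jobs, "jobs: busiest job handles", max(load),
              "k-points, lightest handles", min(load))
```

With 16 jobs, four jobs receive 3 k-points and twelve receive 2, so the busiest job still handles 3 k-points, exactly as with 12 jobs. Under the equal-cost assumption, the four extra processes add memory and I/O pressure without shortening the run.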
[Wien] SPHBES - Error
The program worked with RMT = 2 for Mg, 1.8 for Fe and 1 for H. I would like to know how the precision of the calculation depends on RMTKMAX and on the number of k-points, in order to choose the calculation parameters for this kind of material (E0, a0, B) and to estimate the computing time.

Sincerely yours

2012/2/1, Laurence Marks <L-marks at northwestern.edu>:

Why are you using RKMAX=8 or 9.5? These are way too big. Since the smallest RMT is 1.0 (H), a value of 5 should be fine, maybe 6 at the most.

On Tue, Jan 31, 2012 at 2:17 PM, Bouabdellah AZOUZA <b.azouza at gmail.com> wrote:

After several attempts I confused the numbers. Here is my file; the interatomic distances are in bohr, but the problem persists.

    MgFeH3
    P LATTICE,NONEQUIV.ATOMS: 3 221_Pm-3m
    MODE OF CALC=RELA unit=bohr
    6.292787 6.292787 6.292787 90.00 90.00 90.00
    ATOM 1: X=0. Y=0. Z=0.
    MULT= 1 ISPLIT= 2
    Mg1 NPT= 781 R0=0.0001 RMT=2.5000 Z: 12.0
    LOCAL ROT MATRIX: 1.000 0.000 0.000
                      0.000 1.000 0.000
                      0.000 0.000 1.000
    ATOM 2: X=0.5000 Y=0.5000 Z=0.5000
    MULT= 1 ISPLIT= 2
    Fe2 NPT= 781 R0=0.0001 RMT=1.9000 Z: 26.0
    LOCAL ROT MATRIX: 1.000 0.000 0.000
                      0.000 1.000 0.000
                      0.000 0.000 1.000
    ATOM -3: X=0. Y=0.5000 Z=0.5000
    MULT= 3 ISPLIT=-2
    -3: X=0.5000 Y=0. Z=0.5000
    -3: X=0.5000 Y=0.5000 Z=0.
    H 3 NPT= 781 R0=0.0001 RMT=1. Z: 1.0
    LOCAL ROT MATRIX: 0.000 0.000 1.000
                      0.000 1.000 0.000
                      -1.000 0.000 0.000
    48 NUMBER OF SYMMETRY OPERATIONS

2012/1/30, Laurence Marks <L-marks at northwestern.edu>:

You have confused Angstroms and atomic units when generating your structure; the distances are way too close. Please go back to the web interface and input your structure properly, or change the units of a, b, c to what they should be. This, rather than anything else, is 99.999% certain to be the source of your problems.

2012/1/30 Bouabdellah AZOUZA <b.azouza at gmail.com>:

Dear Dr. Blaha,

For the small RKMAX values (6, 6.5, 7, 7.5) it works, and here is my struct file.

    MgFeH3
    P LATTICE,NONEQUIV.ATOMS: 3 221_Pm-3m
    MODE OF CALC=RELA unit=bohr
    3.33 3.33 3.33 90.00 90.00 90.00
    ATOM 1: X=0. Y=0. Z=0.
    MULT= 1 ISPLIT= 2
    Mg1 NPT= 781 R0=0.0001 RMT=1.3000 Z: 12.0
    LOCAL ROT MATRIX: 1.000 0.000 0.000
                      0.000 1.000 0.000
                      0.000 0.000 1.000
    ATOM 2: X=0.5000 Y=0.5000 Z=0.5000
    MULT= 1 ISPLIT= 2
    Fe2 NPT= 781 R0=0.0001 RMT=1. Z: 26.0
    LOCAL ROT MATRIX: 1.000 0.000 0.000
                      0.000 1.000 0.000
                      0.000 0.000 1.000
    ATOM -3: X=0.5000 Y=0.5000 Z=0.
    MULT= 3 ISPLIT= 2
    ATOM -3:X= 0. Y=0.5000 Z=0.5000
    ATOM -3:X= 0.5000 Y=0. Z=0.5000
    H 3 NPT= 781 R0=0.0001 RMT=0.5500 Z: 1.0
    LOCAL ROT MATRIX: 1.000 0.000 0.000
                      0.000 1.000 0.000
                      0.000 0.000 1.000
    48 NUMBER OF SYMMETRY OPERATIONS

Thank you in advance for your help.
Best regards

2012/1/30, Peter Blaha <pblaha at theochem.tuwien.ac.at>:

Does it occur with small RKMAX too? What does your struct file look like?

On 28.01.2012 14:57, Bouabdellah AZOUZA wrote:

Respected Sir,

I am running WIEN version 11 on a machine of type i3 with the operating system Linux 11.3 and the Fortran compiler ifort. I am running this case (MgFeH3.struct, perovskite structure). After defining and initializing the structure, for RMTKmax=9 the SCF run reports the error:

    Error in LAPW1
    SPHBES - Error

For RMTKmax=8 and 9.5 the SCF run reports the error:

    error in lapw2
    L2main - QTL-B.GT.15., Ghostbands, check scf files

Kindly help me to remove these errors.

Best regards
Bouabdellah Azouza
Department of Physics, USTHB
Algiers, Algeria
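
The two issues raised in this thread (Angstrom versus bohr, and the choice of RKMAX for a small hydrogen sphere) can be checked with a few lines of arithmetic. The following Python sketch is not a WIEN2k tool; all function names are hypothetical, the H radius is taken as 1.0 bohr as stated in the thread, and the distance routine assumes a cubic cell as in the struct files above. It uses RKMAX = RMT(min) * Kmax, and the fact that the number of plane waves inside a sphere of radius Kmax grows as Kmax cubed.

```python
import itertools
import math

BOHR_PER_ANGSTROM = 1.8897261246  # 1 Angstrom expressed in bohr


def angstrom_to_bohr(length_angstrom):
    """Convert a length from Angstrom to bohr (atomic units)."""
    return length_angstrom * BOHR_PER_ANGSTROM


def min_image_distance(frac_a, frac_b, a):
    """Shortest distance (bohr) between two sites given in fractional
    coordinates of a cubic cell with edge a, including periodic images."""
    d2 = 0.0
    for xa, xb in zip(frac_a, frac_b):
        delta = xa - xb
        delta -= round(delta)  # wrap the difference into [-0.5, 0.5]
        d2 += (delta * a) ** 2
    return math.sqrt(d2)


def overlapping_spheres(sites, a):
    """Return (label1, label2, distance, rmt_sum) for each pair of distinct
    sites whose atomic spheres overlap. A site is not compared with its own
    periodic images."""
    bad = []
    for (l1, p1, r1), (l2, p2, r2) in itertools.combinations(sites, 2):
        d = min_image_distance(p1, p2, a)
        if r1 + r2 > d:
            bad.append((l1, l2, d, r1 + r2))
    return bad


def plane_wave_ratio(rkmax_a, rkmax_b):
    """Rough ratio of basis sizes for two RKMAX values at fixed RMT(min),
    using the Kmax**3 scaling of the plane-wave count."""
    return (rkmax_a / rkmax_b) ** 3


# Sites of the corrected MgFeH3 struct file: (label, fractional position, RMT in bohr).
SITES = [
    ("Mg", (0.0, 0.0, 0.0), 2.5),
    ("Fe", (0.5, 0.5, 0.5), 1.9),
    ("H", (0.0, 0.5, 0.5), 1.0),
    ("H", (0.5, 0.0, 0.5), 1.0),
    ("H", (0.5, 0.5, 0.0), 1.0),
]

if __name__ == "__main__":
    a = angstrom_to_bohr(3.33)
    print("a = 3.33 Angstrom in bohr:", round(a, 6))
    print("overlapping sphere pairs:", overlapping_spheres(SITES, a))
    print("basis size, RKMAX 9.5 relative to 5:", round(plane_wave_ratio(9.5, 5.0), 2))
```

Converting 3.33 Angstrom gives the lattice parameter that appears in the corrected struct file, which supports the diagnosis that the first file used an Angstrom value with unit=bohr. The last line indicates why RKMAX=9.5 with RMT(min)=1.0 is so much more demanding than a value of 5.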
[Wien] band structure
Dear all users,

I am calculating the band structure of some topological half-Heuslers. All of the calculations run without any error, but the resulting band structure is wrong. I would be very thankful if somebody could help me correct it.

First of all, although I see the essential band inversion, the gamma7 band is not drawn. In addition, when I include the SO interaction (without spin polarization), there are no errors but I don't get correct results; and when I include SO (with spin polarization), there are some errors in lapw2.

I want to know whether I need to change anything or provide further information in the band structure step. Or could it be due to some wrong input in the initso_lapw step?

Thank you so much in advance.
[Wien] band structure
What do you mean by "I don't get correct results"?

Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you, is that you have never actually known what the question is."

Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz

From: wien-bounces at zeus.theochem.tuwien.ac.at [wien-bounces at zeus.theochem.tuwien.ac.at] on behalf of Saba Sabeti [raskolnikof6028 at yahoo.com]
Sent: Friday, 3 February 2012 22:37
To: wien at zeus.theochem.tuwien.ac.at
Subject: [Wien] band structure

> Dear all users, I'm calculating the band structure of some topological half heuslers. [...]