[Wien] LAPW2 crashed when running in parallel
Hi,

It looks like Intel's mpirun doesn't have a '-machinefile' option. Instead it has a '-hostfile' option (see http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt). Try 'mpirun -h' for information about the available options and adjust your MPIRUN definition accordingly.

Best regards,
Maxim Rakitin
email: rms85 at physics.susu.ac.ru
web: http://www.susu.ac.ru

On 01.11.2010 4:56, Wei Xie wrote:

Dear WIEN2k community members,

We encountered a problem when running in parallel (k-point, MPI, or both): the calculations crash at LAPW2. Note we had no problem running in serial. We have tried to diagnose the problem, recompiled the code with different options, and tested different cases and parameters based on similar problems reported on the mailing list, but the problem persists. So we write here hoping someone can offer us a suggestion. We have attached the related files below for your reference. Your replies are appreciated in advance!

This is the TiC example running in both k-point and MPI parallel mode on two nodes, r1i0n0 and r1i0n1 (8 cores/node):

1. stdout (abridged)

MPI: invalid option -machinefile
real 0m0.004s
user 0m0.000s
sys 0m0.000s
...
MPI: invalid option -machinefile
real 0m0.003s
user 0m0.000s
sys 0m0.004s
TiC.scf1up_1: No such file or directory.
LAPW2 - Error. Check file lapw2.error
cp: cannot stat `.in.tmp': No such file or directory
rm: cannot remove `.in.tmp': No such file or directory
rm: cannot remove `.in.tmp1': No such file or directory

2. TiC.dayfile (abridged)

...
start (Sun Oct 31 16:25:06 MDT 2010) with lapw0 (40/99 to go)
cycle 1 (Sun Oct 31 16:25:06 MDT 2010) (40/99 to go)
lapw0 -p (16:25:06) starting parallel lapw0 at Sun Oct 31 16:25:07 MDT 2010
.machine0 : 16 processors
invalid local arg: -machinefile
0.436u 0.412s 0:04.63 18.1% 0+0k 2600+0io 1pf+0w
lapw1 -up -p (16:25:12) starting parallel lapw1 at Sun Oct 31 16:25:12 MDT 2010
- starting parallel LAPW1 jobs at Sun Oct 31 16:25:12 MDT 2010
running LAPW1 in parallel mode (using .machines)
2 number_of_parallel_jobs
r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)
r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
Summary of lapw1para:
r1i0n0 k=0 user=0 wallclock=0
r1i0n1 k=0 user=0 wallclock=0
...
0.116u 0.316s 0:10.48 4.0% 0+0k 0+0io 0pf+0w
lapw2 -up -p (16:25:34) running LAPW2 in parallel mode
** LAPW2 crashed!
0.032u 0.104s 0:01.13 11.5% 0+0k 82304+0io 8pf+0w
error: command /home/xiew/WIEN2k_10/lapw2para -up uplapw2.def failed

3. uplapw2.error

Error in LAPW2
'LAPW2' - can't open unit: 18
'LAPW2' - filename: TiC.vspup
'LAPW2' - status: old   form: formatted
** testerror: Error in Parallel LAPW2

4. .machines

#
1:r1i0n0:8
1:r1i0n1:8
lapw0:r1i0n0:8 r1i0n1:8
granularity:1
extrafine:1

5. compilers, MPI and options

Intel Compilers and MKL 11.1.046
Intel MPI 3.2.0.011
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:LDFLAGS:$(FOPT) -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -lmkl_scalapack_lp64 /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

Best regards,
Wei Xie
Computational Materials Group
University of Wisconsin-Madison
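A quick way to check which launcher is actually being picked up, and to change the host-list flag without recompiling, is a sketch along these lines (assuming a standard WIEN2k 10 installation; the '-hostfile' spelling comes from Maxim's suggestion and should be verified against your own 'mpirun -h' output):

    which mpirun                 # which launcher is first in PATH?
    mpirun -h 2>&1 | head -40    # which host-list flag does it accept?

    # WIEN2k reads the MPI launch command from $WIENROOT/parallel_options,
    # so no recompile is needed to change it:
    grep MPIRUN $WIENROOT/parallel_options
    # edit that setenv WIEN_MPIRUN line, replacing '-machinefile _HOSTS_'
    # with whatever host-list option your 'mpirun -h' reports.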
[Wien] LAPW2 crashed when running in parallel
Hi Maxim,

Thanks for your reply! We tried MPIRUN = mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_, but the problem persists. The only difference is that stdout now reads "MPI: invalid option -hostfile".

Thanks,
Wei

On Oct 31, 2010, at 10:40 PM, Maxim Rakitin wrote:
> It looks like Intel's mpirun doesn't have a '-machinefile' option. Instead
> it has a '-hostfile' option (see
> http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt). Try
> 'mpirun -h' for information about the available options and adjust
> accordingly.
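Since both '-machinefile' and '-hostfile' are rejected, it may help to test the launcher outside WIEN2k entirely; a minimal sketch, assuming password-less ssh between the nodes (the flag spelling is again an assumption to be matched against 'mpirun -h'):

    printf 'r1i0n0\nr1i0n1\n' > hosts
    mpirun -np 2 -hostfile hosts hostname   # should print both node names
    # If even this fails, the mpirun found in PATH belongs to a different
    # MPI than the one WIEN2k was linked against (the RP_LIBS above link
    # -lmkl_blacs_intelmpi_lp64, which expects Intel MPI), and the PATH
    # and compile setup should be reconciled first.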
[Wien] Gd5Si4 calculations freeze
When the EF WARNING occurs during the scf cycle, but not at the end, this is no problem.

However, your lines:

lapw1 (14:40:19) 800.850u 14.192s 54:24.88 24.9% 0+0k 0+0io 0pf+0w
lapw0 (14:37:49) 34.906u 4.344s 2:28.51 26.4% 0+0k 0+0io 0pf+0w

are suspicious. You get only 25% of the CPU. lapw1 should finish after 14 min, but took almost one hour. Either 3 other jobs are running on this single-cpu machine, or you run out of memory (you have very large system times!) and the machine pages. (Check with the top command during execution. If memory is all used, either reduce RKMAX or go to a different machine.)

Volodymyr wrote:

I have tried again to run a regular LDA calculation. Again, as in the LDA+U case, lapw2 takes a very long time (it has been calculating for about 2 days already), and there is an interesting error. dayfile:

***
0.461365296892454 211.90982969 212.16483888
lapw2 (15:34:45) WARNING: EF not accurate, new emin,emax,NE-min,NE-max 0.461365291265349
lapw1 (14:40:19) 800.850u 14.192s 54:24.88 24.9% 0+0k 0+0io 0pf+0w
lapw0 (14:37:49) 34.906u 4.344s 2:28.51 26.4% 0+0k 0+0io 0pf+0w
cycle 1 (Thu Oct 28 14:37:49 EDT 2010) (1/99 to go)
start (Thu Oct 28 14:37:49 EDT 2010) with lapw0 (1/99 to go)
***

I have seen it before for this compound during earlier trials. In general, this compound Gd5Si4 somehow takes maybe 10 times longer to go through the initial lapw0/lapw1 cycles than other transition-metal-containing compounds I was calculating with an about 10 times finer k-mesh, even though the sizes of the unit cells are roughly comparable. I have attached a scf2 file from this cycle.

Gd5Si4_np_2nd.scf2:

***
:GMA : POTENTIAL AND CHARGE CUT-OFF 12.00 Ry**.5
Bandranges (emin - emax) and occupancy:
:BAN00204: 204 0.457469 0.458922 2.
:BAN00205: 205 0.458380 0.459246 2.
:BAN00206: 206 0.458579 0.459997 2.
:BAN00207: 207 0.459122 0.460319 2.
:BAN00208: 208 0.459122 0.460319 2.
:BAN00209: 209 0.459929 0.460891 2.
:BAN00210: 210 0.459944 0.460891 2.
:BAN00211: 211 0.460551 0.461480 1.89644048
:BAN00212: 212 0.460552 0.461646 1.73748489
:BAN00213: 213 0.460552 0.462067 0.30738641
:BAN00214: 214 0.461259 0.462270 0.05869553
:BAN00215: 215 0.461611 0.462536 0.
:BAN00216: 216 0.461611 0.462545 0.
:BAN00217: 217 0.461923 0.462744 0.
:BAN00218: 218 0.461923 0.462895 0.
:BAN00219: 219 0.462510 0.463225 0.
Energy to separate low and high energystates: 0.02312
:NOE : NUMBER OF ELECTRONS = 424.000
:FER : F E R M I - ENERGY(TETRAH.M.)= 0.46137
***

Looking forward to your advice and ideas,
Thank you,
Volodymyr

--
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
+43-1-5880115671
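For the memory check suggested above, a minimal sketch of what to run on the compute machine while lapw1 is executing (generic Linux tools, nothing WIEN2k-specific; the process names assume the usual lapw1/lapw1c executables):

    top -b -n 1 | head -15      # load average, memory and swap summary
    vmstat 5 5                  # non-zero 'si'/'so' columns mean the machine is paging
    ps -C lapw1,lapw1c -o pid,pcpu,pmem,rss,comm   # per-process memory footprint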
[Wien] Gd5Si4 calculations freeze
Dear Prof. Blaha,

Where can I find timings for the parallel running programs? In my calculations I usually get strings like these in case.dayfile:

Summary of lapw1para:
node-09-07 k=0 user=0 wallclock=0
0.692u 1.028s 14:50.98 0.1% 0+0k 0+0io 0pf+0w

But this 'time' command output is for the lapw1para script, not for the actual lapw1c_mpi programs. The situation is the same for lapw0/lapw2.

Thank you.

Best regards,
Maxim Rakitin
Email: rms85 at physics.susu.ac.ru
Web: http://www.susu.ac.ru
[Wien] problem with initso
Hi Prof. Blaha,

I am having some trouble initializing a job with spin-orbit coupling. I am running a spin-polarized job for a 96-atom surface supercell. After converging the calculation without spin-orbit, I tried to initialize the job with initso using both versions 10.1 and 9.2. While it works for some cases, for most cases I get the same error message shown below. I have done spin-orbit calculations before for smaller systems and never had any problem with initso. So I would appreciate it if you could let me know how to fix this problem.

Do you have a spinpolarized case (and want to run symmetso) ? (y/N)y
90.0 90.0 1.57079632679490 T
1.00 0.000E+000 0.000E+000
6.123233995736766E-017 1.00 0.000E+000
6.123233995736766E-017 6.123233995736766E-017 1.00
forrtl: severe (64): input conversion error, unit 21, file /home/eisfh/WIEN2k/Surface/MnSurf/fMn110/so_test/fMn110/fMn110.struct_so

Image       PC             Routine    Line       Source
symmetso    004B65C1       Unknown    Unknown    Unknown
symmetso    004B5595       Unknown    Unknown    Unknown
symmetso    0048427A       Unknown    Unknown    Unknown
symmetso    0047AEF2       Unknown    Unknown    Unknown
symmetso    0047A721       Unknown    Unknown    Unknown
symmetso    0043F396       Unknown    Unknown    Unknown
symmetso    0041AC99       Unknown    Unknown    Unknown
symmetso    00405640       Unknown    Unknown    Unknown
symmetso    0040340C       Unknown    Unknown    Unknown
libc.so.6   0033E881D994   Unknown    Unknown    Unknown
symmetso    00403319       Unknown    Unknown    Unknown

13.330u 6.481s 0:35.97 55.0% 0+0k 0+0io 0pf+0w
error: command /home/eisfh/Wien2k_09.2/symmetso symmetso.def failed

Thanks,
Fhokrul
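The forrtl "input conversion error, unit 21" means symmetso hit a field in fMn110.struct_so that it could not parse as a number, often a fused or overflowing field of the kind the garbled "90.090.0..." line above may hint at. A heuristic sketch for locating such lines, assuming the file is plain ASCII text:

    grep -n '[0-9]\.[0-9]*\.[0-9]' fMn110.struct_so | head   # fused numbers like 90.090.0
    grep -n '[^ -~]' fMn110.struct_so | head                 # stray non-printable characters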
[Wien] Gd5Si4 calculations freeze
case.output1_1 or similar (depending on spin, ...)

On 01.11.2010 09:40, Maxim Rakitin wrote:
> Where can I find timings for the parallel running programs?

--
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
+43-1-5880115671
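So the per-job timings sit at the end of each parallel output file rather than in the dayfile; a sketch for collecting them, assuming the usual case.output1_1, _2, ... numbering (output1up_*/output1dn_* for spin-polarized runs):

    grep -i 'TIME' case.output1_*   # per-job cpu/wall time summary lines
    tail -5 case.output1_1          # the timings are printed at the end of each file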
[Wien] LAPW2 crashed when running in parallel
On 01 Nov 2010 02:56:47 Wei Xie wrote:
> We encountered some problem when running in parallel (K-point, MPI or
> both)--the calculations crashed at LAPW2. Note we had no problem running
> it in serial. This is a TiC example running

Dear Wei,

Isn't the error connected with mixing spin-polarised and spin-UNpolarised cases? TiC is to be calculated unpolarised, as far as I know.

> 1. stdout (abridged)
> ...
> TiC.scf1up_1: No such file or directory.

Was lapw1 really successful?

> 3. uplapw2.error
> Error in LAPW2
> 'LAPW2' - can't open unit: 18
> 'LAPW2' - filename: TiC.vspup
> 'LAPW2' - status: old   form: formatted

It looks like your initialization was done without spin-polarization, but runsp_lapw was run. But in that case lapw1 must also be unsuccessful (?).

Best regards,
Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul. Kirova 132, RUSSIA
Tel.: 7(3412) 442118 (home), 218988 (office), 250614 (fax)
E-mail: lyu at otf.fti.udmurtia.su, lyuka17 at mail.ru, lyu at otf.pti.udm.ru
http://fti.udm.ru/content/view/25/103/lang,english/
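A quick file-level check of this hypothesis, assuming the standard WIEN2k file naming (lapw0 writes case.vspup/case.vspdn only for a spin-polarized setup):

    ls -l TiC.vsp TiC.vspup TiC.vspdn
    # If only TiC.vsp exists, the initialization was non-spin-polarized,
    # and runsp_lapw (whose lapw2 -up step needs TiC.vspup on unit 18)
    # must fail exactly as in the uplapw2.error above.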
[Wien] what should be the best configuration of the system for DFT calculation of ceramics and polymers
Dear Peter Blaha Sir and WIEN2k users,

We want to do DFT calculations with WIEN2k for around 250 atoms/supercell, and we also want to do simulations of polymers with Material Studio. My question is: what would be the best possible configuration for a server (computer) which can serve this purpose? We initially do not want clustering of many nodes; rather, we want a compact server which will contain at least 2 quad-core processors. In terms of configuration we mainly need to know:

(1) How many quad-core processors should we use? (if it is possible to use more than 2 quad-core processors in a compact server!)
(2) If possible, please suggest the name of the processor which can serve our purpose best (example: Intel Xeon Processor W3520, 2.66 GHz).
(3) How much RAM should we opt for?
(4) Are there any other requirements besides the above that we should consider?

Any response will be very helpful for us. Thanks in advance.

with best regards,
--
Shamik Chakrabarti
Research Scholar
Dept. of Physics & Meteorology
Material Processing & Solid State Ionics Lab
IIT Kharagpur
Kharagpur 721302
INDIA
[Wien] LAPW2 crashed when running in parallel
Dear Lyudmila,

On Nov 1, 2010, at 8:36 AM, Lyudmila V. Dobysheva wrote:
> Isn't the error connected with mixing spin-polarised and spin-UNpolarised
> cases? TiC is to be calculated unpolarised, as far as I know.

TiC is non-SP in the first example of the UG, but we did it with SP here just for testing. We can run SP calculations in serial for TiC.

> > TiC.scf1up_1: No such file or directory.
> Was lapw1 really successful?

Thanks for your reminder. We checked and found that lapw1 was actually not successful either--there is no case.output1, case.output2, or case.output? file in the case directory. My guess is that the computing nodes are not communicating well with the head node, so that even if lapw1 finished OK, the output files were not written from the computing nodes back to the head node. We are testing the communication now.

> It looks like your initialization was done without spin-polarization, but
> runsp_lapw was run. But in this case lapw1 must also be unsuccessful (?).

See my answers above. Your possible follow-up is appreciated in advance.

Thanks,
Wei
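One way to test that suspicion directly is to check that the case directory is shared and writable from both compute nodes; a sketch, assuming password-less ssh, with a hypothetical case path and the node names taken from the .machines file above:

    cd /home/xiew/TiC    # hypothetical: substitute your actual case directory
    for h in r1i0n0 r1i0n1; do
      # create a file from each node, then verify it is visible on the head node
      ssh $h "cd $PWD && touch write_test.$h" && ls -l write_test.$h
    done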