[Wien] LAPW2 crashed when running in parallel

2010-11-01 Thread Maxim Rakitin
Hi,

It looks like Intel's mpirun doesn't have a '-machinefile' option. Instead 
it has a '-hostfile' option (from here: 
http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt).

Try 'mpirun -h' for information about the available options and apply the appropriate one.
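
A minimal check along these lines (a sketch only; the parallel_options file and 
the WIEN_MPIRUN variable are where recent WIEN2k versions keep this setting, so 
adjust to your installation):

  which mpirun                                   # confirm whose mpirun is first in PATH
  mpirun -h 2>&1 | grep -i -e machinefile -e hostfile
  # then adapt the template in $WIENROOT/parallel_options accordingly, e.g.
  # setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"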

Best regards,
Maxim Rakitin
email: rms85 at physics.susu.ac.ru
web: http://www.susu.ac.ru


01.11.2010 4:56, Wei Xie wrote:
 Dear all WIEN2k community members:

 We encountered a problem when running in parallel (k-point, MPI or 
 both): the calculations crash at LAPW2. Note that we have no problem 
 running in serial. We have tried to diagnose the problem, recompiled 
 the code with different options, and tested different cases and 
 parameters based on similar problems reported on the mailing list, but 
 the problem persists. So we write here hoping someone can offer us 
 some suggestions. We have attached the related files below for your 
 reference. Your replies are appreciated in advance!

 This is a TiC example running in both k-point and MPI parallel on two 
 nodes r1i0n0 and r1i0n1 (8 cores/node):

 1. stdout (abridged)
 MPI: invalid option -machinefile
 real 0m0.004s
 user 0m0.000s
 sys 0m0.000s
 ...
 MPI: invalid option -machinefile
 real 0m0.003s
 user 0m0.000s
 sys 0m0.004s
 TiC.scf1up_1: No such file or directory.

 LAPW2 - Error. Check file lapw2.error
 cp: cannot stat `.in.tmp': No such file or directory
 rm: cannot remove `.in.tmp': No such file or directory
 rm: cannot remove `.in.tmp1': No such file or directory
 2. TiC.dayfile (abridged)
 ...
 start (Sun Oct 31 16:25:06 MDT 2010) with lapw0 (40/99 to go)
 cycle 1 (Sun Oct 31 16:25:06 MDT 2010) (40/99 to go)

 lapw0 -p (16:25:06) starting parallel lapw0 at Sun Oct 31 16:25:07 MDT 2010
  .machine0 : 16 processors
 invalid local arg: -machinefile

 0.436u 0.412s 0:04.63 18.1% 0+0k 2600+0io 1pf+0w
 lapw1 -up -p (16:25:12) starting parallel lapw1 at Sun Oct 31 16:25:12 MDT 2010
 -  starting parallel LAPW1 jobs at Sun Oct 31 16:25:12 MDT 2010
 running LAPW1 in parallel mode (using .machines)
 2 number_of_parallel_jobs
  r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
  r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)
  r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
 Summary of lapw1para:
 r1i0n0 k=0 user=0 wallclock=0
 r1i0n1 k=0 user=0 wallclock=0
 ...
 0.116u 0.316s 0:10.48 4.0% 0+0k 0+0io 0pf+0w
 lapw2 -up -p (16:25:34) running LAPW2 in parallel mode
 **  LAPW2 crashed!
 0.032u 0.104s 0:01.13 11.5% 0+0k 82304+0io 8pf+0w
 error: command   /home/xiew/WIEN2k_10/lapw2para -up uplapw2.def   failed

 3. uplapw2.error
 Error in LAPW2
  'LAPW2' - can't open unit: 18
  'LAPW2' -filename: TiC.vspup
  'LAPW2' -  status: old  form: formatted
 **  testerror: Error in Parallel LAPW2

 4. .machines
 #
 1:r1i0n0:8
 1:r1i0n1:8
 lapw0:r1i0n0:8 r1i0n1:8
 granularity:1
 extrafine:1

 5. compilers, MPI and options
 Intel Compilers and MKL 11.1.046
 Intel MPI 3.2.0.011

 current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
 current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
 current:LDFLAGS:$(FOPT) 
 -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
 current:DPARALLEL:'-DParallel'
 current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread 
 -lmkl_core -openmp -lpthread -lguide
 current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t 
 -lmkl_scalapack_lp64 
 /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a 
 -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
 -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread 
 -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)
 current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

 Best regards,
 Wei Xie
 Computational Materials Group
 University of Wisconsin-Madison




[Wien] LAPW2 crashed when running in parallel

2010-11-01 Thread Wei Xie
Hi Maxim,

Thanks for your reply! 
We tried MPIRUN=mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_, but the problem 
persists. The only difference is that stdout now says "MPI: invalid option 
-hostfile".
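
Since neither -machinefile nor -hostfile is accepted, it may be worth confirming 
which MPI implementation the mpirun in your PATH actually belongs to. A minimal 
check, assuming a standard Linux shell (the version flag is an assumption; use 
whatever your mpirun supports):

  which mpirun          # which mpirun binary is actually being called?
  mpirun -V 2>&1        # Intel MPI normally identifies itself in the version banner
  echo $PATH            # is the intended MPI's bin directory listed first?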

Thanks,
Wei


On Oct 31, 2010, at 10:40 PM, Maxim Rakitin wrote:

 Hi,
 
 It looks like Intel's mpirun doesn't have a '-machinefile' option. Instead 
 it has a '-hostfile' option (from here: 
 http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt).
 
 Try 'mpirun -h' for information about the available options and apply the appropriate one.
 Best regards,
Maxim Rakitin
email: rms85 at physics.susu.ac.ru
web: http://www.susu.ac.ru
 




[Wien] Gd5Si4 calculations freeze

2010-11-01 Thread Peter Blaha
When the EF WARNING occurs during the scf cycle, but not at the end, 
it is not a problem.

However, your lines:
  lapw1 (14:40:19) 800.850u 14.192s 54:24.88 24.9% 0+0k 0+0io 0pf+0w
  lapw0 (14:37:49) 34.906u 4.344s 2:28.51 26.4% 0+0k 0+0io 0pf+0w

are suspicious. You get only 25% of the CPU. lapw1 should finish after 
14 min, but took almost one hour.
Either 3 other jobs are running on this single-CPU machine, or you run 
out of memory (you have very large system times!) and the machine 
pages. (Check with the top command during execution.)

If memory is all used, either reduce RKMAX or go to a different machine.
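
For reference, a few standard Linux checks one might run while lapw1 is executing 
(a sketch only; process names and output details depend on your system):

  top -b -n 1 | head -25    # how much %CPU and %MEM does the lapw1 process get?
  free -m                   # little free memory plus growing swap usage points to paging
  vmstat 5                  # nonzero si/so columns while lapw1 runs also indicate paging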

 I have tried again to run a regular LDA calculation. As in the
 LDA+U case, lapw2 takes a very long time (it has been running for about 2 days
 already), and there is an interesting error:

 dayfile
 ***

 0.461365296892454 211.90982969 212.16483888
 lapw2 (15:34:45) WARNING: EF not accurate, new emin,emax,NE-min,NE-max
 0.461365291265349
 lapw1 (14:40:19) 800.850u 14.192s 54:24.88 24.9% 0+0k 0+0io 0pf+0w
 lapw0 (14:37:49) 34.906u 4.344s 2:28.51 26.4% 0+0k 0+0io 0pf+0w

 cycle 1 (Thu Oct 28 14:37:49 EDT 2010) (1/99 to go)

 start (Thu Oct 28 14:37:49 EDT 2010) with lapw0 (1/99 to go)

 ***
 I have seen this before for this compound during earlier trials. In
 general, this compound Gd5Si4 somehow takes maybe 10 times longer to go
 through the initial lapw0, lapw1 cycles than other transition-metal-containing
 compounds I was calculating with an about 10 times finer k-mesh!
 And the sizes of the unit cells are roughly comparable.

 I have attached a scf2 file from this cycle.

 Gd5Si4_np_2nd.scf2
 ***
 :GMA : POTENTIAL AND CHARGE CUT-OFF 12.00 Ry**.5
 Bandranges (emin - emax) and occupancy:
 :BAN00204: 204 0.457469 0.458922 2.
 :BAN00205: 205 0.458380 0.459246 2.
 :BAN00206: 206 0.458579 0.459997 2.
 :BAN00207: 207 0.459122 0.460319 2.
 :BAN00208: 208 0.459122 0.460319 2.
 :BAN00209: 209 0.459929 0.460891 2.
 :BAN00210: 210 0.459944 0.460891 2.
 :BAN00211: 211 0.460551 0.461480 1.89644048
 :BAN00212: 212 0.460552 0.461646 1.73748489
 :BAN00213: 213 0.460552 0.462067 0.30738641
 :BAN00214: 214 0.461259 0.462270 0.05869553
 :BAN00215: 215 0.461611 0.462536 0.
 :BAN00216: 216 0.461611 0.462545 0.
 :BAN00217: 217 0.461923 0.462744 0.
 :BAN00218: 218 0.461923 0.462895 0.
 :BAN00219: 219 0.462510 0.463225 0.
 Energy to separate low and high energystates: 0.02312


 :NOE : NUMBER OF ELECTRONS = 424.000

 :FER : F E R M I - ENERGY(TETRAH.M.)= 0.46137
 ***

 Looking forward to your advice and ideas,
 Thank you,
 Volodymyr




-- 
Peter Blaha
Inst.Materials Chemistry
TU Vienna
Getreidemarkt 9
A-1060 Vienna
Austria
+43-1-5880115671


[Wien] Gd5Si4 calculations freeze

2010-11-01 Thread Maxim Rakitin
Dear Prof. Blaha,

Where can I find the timings for programs running in parallel? In my 
calculations I usually get lines like these in case.dayfile:

Summary of lapw1para:
node-09-07 k=0 user=0 wallclock=0
0.692u 1.028s 14:50.98 0.1% 0+0k 0+0io 0pf+0w

But this 'time' command output is for the lapw1para script, not for the actual 
lapw1c_mpi programs. The same applies to lapw0 and lapw2.

Thank you.

Best regards,
Maxim Rakitin
Email: rms85 at physics.susu.ac.ru
Web: http://www.susu.ac.ru




[Wien] problem with initso

2010-11-01 Thread Md. Fhokrul Islam

Hi Prof Blaha,

I am having some trouble initializing a job with spin-orbit coupling. I am 
running a spin-polarized job for a 96-atom surface supercell. After converging 
the calculation without spin-orbit, I tried to initialize the job with initso 
using both versions 10.1 and 9.2. It works for some cases, but for most cases 
I get the same error message shown below. I have done spin-orbit calculations 
before for smaller systems and never had any problem with initso. So I would 
appreciate it if you could let me know how to fix this problem.


Do you have a spinpolarized case (and want to run symmetso) ? (y/N)y
   90.090.01.57079632679490  T
   1.00   0.000E+000  0.000E+000
  6.123233995736766E-017   1.00   0.000E+000
  6.123233995736766E-017  6.123233995736766E-017   1.00 
forrtl: severe (64): input conversion error, unit 21, file 
/home/eisfh/WIEN2k/Surface/MnSurf/fMn110/so_test/fMn110/fMn110.struct_so
Image PC   RoutineLine  
Source 
symmetso   004B65C1  Unknown   Unknown  Unknown
symmetso   004B5595  Unknown   Unknown  Unknown
symmetso   0048427A  Unknown   Unknown  Unknown
symmetso   0047AEF2  Unknown   Unknown  Unknown
symmetso   0047A721  Unknown   Unknown  Unknown
symmetso   0043F396  Unknown   Unknown  Unknown
symmetso   0041AC99  Unknown   Unknown  Unknown
symmetso   00405640  Unknown   Unknown  Unknown
symmetso   0040340C  Unknown   Unknown  Unknown
libc.so.6  0033E881D994  Unknown   Unknown  Unknown
symmetso   00403319  Unknown   Unknown  Unknown
13.330u 6.481s 0:35.97 55.0%0+0k 0+0io 0pf+0w
error: command   /home/eisfh/Wien2k_09.2/symmetso symmetso.def   failed


Thanks,
Fhokrul

  


[Wien] Gd5Si4 calculations freeze

2010-11-01 Thread Peter Blaha
case.output1_1  or similar (depending on spin,...)
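
For example (a sketch only; the exact labels of the timing lines differ between 
WIEN2k versions and between real/complex and spin-polarized cases):

  grep -i time case.output1_1        # per-job lapw1 timings from the first parallel job
  grep -i time case.output1up_*      # spin-polarized, k-point-parallel case: one file per job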

On 01.11.2010 09:40, Maxim Rakitin wrote:
 Dear Prof. Blaha,

 Where can I find timings for parallel running programs? In my
 calculations I usually get such strings in case.dayfile:

 Summary of lapw1para:
 node-09-07 k=0 user=0 wallclock=0
 0.692u 1.028s 14:50.98 0.1% 0+0k 0+0io 0pf+0w

 But this 'time' command output is for lapw1para script, not for actual
 lapw1c_mpi programs. The same situation is for lapw0/2.


-- 
Peter Blaha
Inst.Materials Chemistry
TU Vienna
Getreidemarkt 9
A-1060 Vienna
Austria
+43-1-5880115671


[Wien] LAPW2 crashed when running in parallel

2010-11-01 Thread Lyudmila V. Dobysheva
01 Nov 2010 02:56:47 Wei Xie wrote:
 We encountered some problem when running in parallel (K-point, MPI or
  both)--the calculations crashed at LAPW2. Note we had no problem running
  it in serial.
 This is a TiC example running

Dear Wei,

Isn't the error connected with mixing spin-polarised and spin-unpolarised cases? TiC is 
to be calculated unpolarised, as far as I know.
 1. stdout (abridged)
...
 TiC.scf1up_1: No such file or directory.

Was lapw1 really successful?

 3. uplapw2.error
 Error in LAPW2
  'LAPW2' - can't open unit: 18
  'LAPW2' -filename: TiC.vspup
  'LAPW2' -  status: old  form: formatted

It looks like your initialization was done without spin-polarization, but 
runsp_lapw was run. In that case, though, lapw1 must also have been unsuccessful (?).
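
One quick way to check this, assuming the standard WIEN2k file naming (a sketch; 
clmup/clmdn should be produced by a spin-polarized init_lapw, while vspup/vspdn 
are written by lapw0 during the first runsp_lapw cycle):

  ls -l TiC.clmup TiC.clmdn    # missing => the case was initialized without spin polarization
  ls -l TiC.vspup TiC.vspdn    # written by lapw0 in the first runsp_lapw cycle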

Best regards,
Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 442118 (home), 218988(office), 250614(Fax)
E-mail: lyu at otf.fti.udmurtia.su
lyuka17 at mail.ru
lyu at otf.pti.udm.ru
http://fti.udm.ru/content/view/25/103/lang,english/
--




[Wien] what should be the best configuration of the system for DFT calculation of ceramics and polymers

2010-11-01 Thread shamik chakrabarti
Dear Peter Blaha Sir and WIEN2k users,

We want to do DFT calculations with WIEN2k for around 250 atoms/supercell, and 
we also want to do simulations of polymers with Material Studio. My question is: 
what would be the best possible configuration for a server (computer) that can 
serve this purpose? We initially do not want a cluster of many nodes; rather, we 
want a compact server containing at least 2 quad-core processors. In terms of 
configuration we mainly need to know:

(1) How many quad-core processors should we use? (if it is possible to use
more than 2 quad-core processors in a compact server!)
(2) If possible, please suggest the name of a processor that can serve our
purpose best (example: Intel Xeon Processor W3520, 2.66 GHz).
(3) How much RAM should we opt for?
(4) Are there any other requirements besides the above that we
should consider?

Any response will be very helpful for us. Thanks in advance.

with best regards,

-- 
Shamik Chakrabarti
Research Scholar
Dept. of Physics & Meteorology
Material Processing & Solid State Ionics Lab
IIT Kharagpur
Kharagpur 721302
INDIA




[Wien] LAPW2 crashed when running in parallel

2010-11-01 Thread Wei Xie
Dear Lyudmila,

On Nov 1, 2010, at 8:36 AM, Lyudmila V. Dobysheva wrote:

 01 Nov 2010 02:56:47 Wei Xie wrote:
 We encountered some problem when running in parallel (K-point, MPI or
 both)--the calculations crashed at LAPW2. Note we had no problem running
 it in serial.
 This is a TiC example running
 
 Dear Wei,
 
 Isn't the error connected with mixing spin-polarised and spin-unpolarised
 cases? TiC is to be calculated unpolarised, as far as I know.
TiC is non-spin-polarized in the first example of the UG, but we ran it spin-polarized 
here just for testing. We can run SP calculations for TiC in serial.
 1. stdout (abridged)
 ...
 TiC.scf1up_1: No such file or directory.
 
 Was lapw1 really successful?
Thanks for the reminder. We checked and found that lapw1 was actually not 
successful either--there is no case.output1, case.output2 or other case.output? file in 
the case directory. My guess is that the compute nodes are not communicating 
well with the head node, so that even if lapw1 finished OK, the output files 
are not written back from the compute nodes to the head node. We are testing the 
communication now. 
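
A couple of simple checks along those lines (a sketch only; it assumes passwordless 
ssh to the nodes and that /path/to/TiC stands for your actual case directory on the 
shared filesystem):

  ssh r1i0n0 hostname                              # does passwordless ssh to the node work?
  ssh r1i0n0 "touch /path/to/TiC/ssh_write_test"   # can the node write into the case directory?
  ls -l /path/to/TiC/ssh_write_test                # and is that file visible from the head node?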
 
 3. uplapw2.error
 Error in LAPW2
 'LAPW2' - can't open unit: 18
 'LAPW2' -filename: TiC.vspup
 'LAPW2' -  status: old  form: formatted
 
 It looks like your initialization was done without spin-polarization, but 
 runsp_lapw was run. But in this case lapw1 must also be unsuccessful (?).
See my answers above.

Any follow-up suggestions are appreciated in advance.

Thanks,
Wei
 