Dear WIEN2k developers and users,

We are trying to install WIEN2k 10.1 on a computing cluster and plan to 
calculate some large systems (over 60 atoms/cell) with it. We got no error 
messages during compilation, and serial tests with the three examples (Fccni, 
TiC and TiO2) finished quickly and correctly. However, we failed in the 
parallel (k-point and/or MPI) mode. We are therefore writing to this mailing 
list in the hope that someone can offer some help. Below are the details of our 
system, compilers, libraries, compiler options, linking flags and testing. 

1. System : SUSE Linux Enterprise Server 10 (x86_64), Intel Xeon X5355 quad 
core processors (Intel 64),  2 GB memory per core, DDR 4X InfiniBand, PBS 
Professional queuing system. 

2. Compilers/libraries: ifort and icc of Intel 11.1/046, mpiifort of Intel MPI 
3.2.0.011, BLAS, LAPACK and ScaLAPACK of Intel MKL 10.2, and fftw 2.1.5 
(compiled with the "--enable-mpi" switch at /home/user/fftw-2.1.5).
The environment was configured by sourcing the following scripts in 
bash_profile:
source /usr/local/intel/Compiler/11.1/046/bin/ifortvars.sh intel64    #ifort
source /usr/local/intel/Compiler/11.1/046/mkl/tools/environment/mklvarsem64t.sh    #mkl
source /usr/local/intel/impi/3.2.0.011/bin64/mpivars.sh    #mpi
Their bin, library, and include directories were all set in bash_profile as 
well. 
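
As an illustration only, the following minimal sketch shows one way to confirm 
that the tools listed above are visible after bash_profile has been sourced. 
The function name check_tools is hypothetical and is not part of WIEN2k or the 
Intel tools.

```shell
# Hypothetical helper: report any of the given commands missing from PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found in PATH: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (tool names taken from the list above):
# check_tools ifort icc mpiifort
```

Running the same check through ssh on a compute node would also show whether 
the environment is set for non-interactive logins, which may differ from the 
login node.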

3. Compiler options: 
For serial:
 O   Compiler options:        -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-traceback
 L   Linker Flags:            $(FOPT) 
-L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
 P   Preprocessor flags       '-DParallel'
 R   R_LIB (LAPACK+BLAS):     -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread 
-lmkl_core -openmp -lpthread -lguide

For parallel:
Shared Memory Architecture: no; 
Remote shell: ssh (password-less log-in enabled);
RP  -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -lmkl_scalapack_lp64 
/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a 
-Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
-lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread 
-L/home/user/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)
 FP  FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-traceback
 MP  MPIRUN commando        : mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

Note: We used all WIEN2k recommended options/flags except for RP, for which we 
used those from the Intel MKL Link Line Advisor, specifying dynamic linking, 
32-bit integers (lp64), multi-threading, etc. We are not sure whether these are 
correct (especially the integer length) and would like to hear your 
suggestions. Our processors' specifications are at 
http://ark.intel.com/Product.aspx?id=28035 . 

4. Testing
4.1 Inputs
We used userconfig_lapw to set the user environment (in particular, the scratch 
directory was set to /scratch), and then performed the testing using the Fccni 
example downloaded from the WIEN2k website. 

We first ran a spin-polarized calculation in serial using the parameters 
recommended in the User's Guide for the initialization. The calculation 
finished quickly without problems, and the results matched the downloaded 
outputs well. We then ran save_lapw and clean_lapw so that we could use the 
same set of input files to test parallelization. We wrote a submission script 
that creates the .machines file, calculates the number of processors allocated 
($nprocs) on the fly, and starts the calculation with: mpirun -np $nprocs 
runsp_lapw -p -ec 0.0001 -cc 0.0001. We enabled hybrid parallelization (i.e., 
both k-point and MPI) in this case.
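
As an illustration only (this is not the actual submission script), the 
processor count can be obtained along the following lines. The sketch assumes 
that the PBS node file lists one hostname per allocated core, which depends on 
how the resources are requested; the function name count_procs is hypothetical.

```shell
# Hypothetical helper: count the non-empty lines of a PBS-style node file,
# assumed to list one hostname per allocated core.
count_procs() {
    grep -c . "$1"
}

# Example call inside a PBS job, where $PBS_NODEFILE is set by the scheduler:
# nprocs=$(count_procs "$PBS_NODEFILE")
```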

The .machines file created reads: 
1:r1i0n0:8
1:r1i0n1:8
lapw0: r1i0n0:8 r1i0n1:8 
lapw1: r1i0n0:8 r1i0n1:8 
lapw2: r1i0n0:8 r1i0n1:8 
granularity:1
extrafine:1

In this example we were allocated two nodes (r1i0n0 and r1i0n1) by PBS, each 
with 8 cores (two quad-core CPUs per node). The first two lines are for k-point 
parallelization and the next three for MPI (for lapw0, lapw1 and lapw2, 
respectively). 
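
As an illustration only (again, not the actual submission script), a file with 
this layout can be generated from a PBS-style node file roughly as follows. The 
sketch assumes one hostname per allocated core in the node file; the function 
name make_machines is hypothetical.

```shell
# Hypothetical helper: print a .machines file with the layout shown above
# for the hosts in a PBS-style node file (one hostname per allocated core).
make_machines() {
    # "sort | uniq -c" yields one "count hostname" pair per distinct node.
    sort "$1" | uniq -c | awk '
        {
            printf "1:%s:%s\n", $2, $1        # k-point line for this node
            hosts = hosts " " $2 ":" $1       # collect host:cores for MPI lines
        }
        END {
            printf "lapw0:%s\n", hosts
            printf "lapw1:%s\n", hosts
            printf "lapw2:%s\n", hosts
            print "granularity:1"
            print "extrafine:1"
        }'
}

# Example call inside a PBS job:
# make_machines "$PBS_NODEFILE" > .machines
```

Counting hostnames with uniq -c keeps the per-node core count consistent with 
whatever the scheduler actually allocated, rather than hard-coding 8.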

4.2 Outputs
The job was killed within one minute outputting error messages like:
~ cat aU_SOC.e799326
rm: cannot remove `fccni.vspup': No such file or directory
rm: cannot remove `fccni.vspdn': No such file or directory
rm: cannot remove `fccni.vnsup': No such file or directory
rm: cannot remove `fccni.vnsdn': No such file or directory
/tmp/pbs.799326.service2/sh.piTkRT: No such file or directory.
/tmp/pbs.799326.service2/sh.ygkvzW: No such file or directory.
/tmp/pbs.799326.service2/sh.i4xOi2: No such file or directory.
mv: cannot stat `.tmp': No such file or directory
foreach: No match.
/tmp/pbs.799326.service2/sh.m3zD88: No such file or directory.
/tmp/pbs.799326.service2/sh.xgo6Fb: No such file or directory.
/tmp/pbs.799326.service2/sh.zyICya: No such file or directory.
/tmp/pbs.799326.service2/sh.fI8qUa: No such file or directory.
/tmp/pbs.799326.service2/sh.cghNSa: No such file or directory.
foreach: No match.
mv: cannot stat `.tmp': No such file or directory
rm: No match.
rm: cannot remove `fccni.vns': No such file or directory
rm: cannot remove `fccni.vnsup': No such file or directory
rm: cannot remove `fccni.vnsdn': No such file or directory
rm: cannot remove `fccni.vsp': No such file or directory
rm: cannot remove `fccni.vspdn': No such file or directory
sed: can't read .machinetmp22: No such file or directory
rm: cannot remove `.machinetmp': No such file or directory
machine_i: Subscript out of range.
cut: .machine0: No such file or directory
rm: cannot remove `.machinetmp22': No such file or directory
sed: can't read .machinetmp: No such file or directory
rm: cannot remove `.machinetmp': No such file or directory
mv: cannot stat `.tmp': No such file or directory
 LAPW0 END
 LAPW0 END
@: Expression Syntax.

It seemed that the job stopped when executing LAPW0 because WIEN2k couldn't 
find/move/delete some files. 

We have tried a couple of different compilations (e.g., using exactly what 
WIEN2k recommended for RP), but these errors persist. We have also searched the 
WIEN2k mailing list but did not find any related post. 

Does anyone have any idea about this? Your comments will be highly appreciated! 

Thanks,
Wei 
-------------------------------------------
Computational Materials Group
University of Wisconsin-Madison 
209 MS&E Bldg, 1509 University Ave 
Madison, WI 53706-1595
Office: (608)262-2088 
Email: wxie4 at wisc.edu
Web: http://matmodel.engr.wisc.edu/
