[Wien] Network problem caused by lapw1?

2009-12-23 Thread Peter Blaha
The new iterative diagonalization creates files called case.storeHinv.., where the inverse of H is stored (one triangle of the matrix in single precision).

These files can be quite large (e.g. for matrix size 3 the total size of all Hinv files over the # of processors is 3600MB or 7200MB for real/complex), but on a balanced cluster they should be written/read in 100-200 seconds. They are created only once (in the second scf cycle), but read in all subsequent iterative scf cycles.

Please note that the method is usually so efficient that one can even run a minimization with -it0:
min -j "run_lapw -it0"; i.e. one does not need to create it again!

As with the vector files, you can use the SCRATCH variable to direct these files to a local scratch directory (e.g. with 100 processors, each processor reads/writes only 36MB!).
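
For example (just a sketch; /scratch/$USER stands for whatever node-local disk your cluster provides), in the job script or in ~/.cshrc:

   # keep vector and storeHinv files on a node-local disk instead of NFS
   setenv SCRATCH /scratch/$USER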


> I observe the cluster network dying for about 10 minutes when performing a
> calculation for a relatively large case that involves 256 cores and
> InfiniBand. I use WIEN2k_09.2 (Release 29/9/2009) + ifort 11.0.074 + Intel
> MKL 10.1.0.015 + MVAPICH2 and iterative diagonalization. The network always
> dies at the end of the second scf iteration (most likely at the end of
> lapw1). This did not occur in WIEN2k_08.3 (Release 18/9/2008) for the same
> case and compiler settings. I know that the iterative diagonalization has
> undergone some major changes between these two versions.
> 
> This actually does not interrupt the calculations and there is no sign of
> any error, but it causes the SGE daemon to die on the compute nodes, with
> all the consequences.
> 
> Did anyone experience a similar problem? What is different in the behaviour
> of lapw1 in the 2nd iteration that may cause the problem?
> 
> Thank you in advance and Happy Holidays.
> 
> Oleg Rubel
> 
> --
> Thunder Bay Regional Research Institute
> 290 Munro St, Thunder Bay, ON, P7A 7T1, Canada
> Homepage: http://www.tbrri.com/~orubel/

-- 
-
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-


[Wien] questions on WIENNCM example

2009-12-23 Thread arlonni
Dear all,
two questions:
1. In the UO2-2k example, after ncmsymmtry I don't get 16 symmetry operations, only 8. Which one is right, 16 or 8?
2. If I want to compare the energies of different magnetic anisotropies, do I need to change .inncm in each case? Is this right?

Best regards,
Longhua Li
2009-12-23



[Wien] installation problem

2009-12-23 Thread Lyudmila Dobysheva
On Tuesday 22 December 2009 18:22:42 Dr Aruna Chatterjee wrote:
> I have successfully untarred and unzipped the package. But I am facing a
> problem when I give ./expand_lapw. The message appears as:
> bash: ./expand_lapw: /bin/csh: bad interpreter: No such file or directory

Ask your system administrator to create the link
/bin/csh -> /usr/bin/tcsh
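
For example (just a sketch, assuming tcsh is already installed as /usr/bin/tcsh; the path may differ on your system), as root:

   # expand_lapw uses /bin/csh as its interpreter, which is missing; point it to tcsh
   ln -s /usr/bin/tcsh /bin/csh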

Best regards
Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 442118 (home), 218988(office), 250614(Fax)
E-mail: lyuka17 at mail.ru
lyuka17 at gmail.com
lyu at otf.pti.udm.ru
lyu at otf.fti.udmurtia.su
http://fti.udm.ru/content/view/25/103/lang,english/
--


[Wien] some comments on parallel execution of wien2k

2009-12-23 Thread Sergiu Arapan
Dear wien2k users and developers,

I would like to post a few comments on running the parallel version of wien2k on a distributed-memory cluster. I'm using the most recent version of wien2k (09.2) on a Linux-based cluster with 805 HP ProLiant DL140 G3 nodes, each node consisting of Intel Xeon E5345 quad-core processors at 2.33 GHz with 4 MB Level 2 cache, interconnected by a next-generation InfiniBand interconnect. The operating system is CentOS 5 64-bit Linux and the resource manager is SLURM. I compiled the source code with Intel compilers (ifort 10.1.017) and Intel-built OpenMPI (mpif90 1.2.7), and linked it with MKL (10.0.1.014), FFTW (2.1.5) and the corresponding OpenMPI libs.

My first comment concerns the implementation of MPI fine-grain parallelization. Within the current version of wien2k, the module lapw2_mpi crashes if N_noneq_atoms (the number of nonequivalent atoms in the case.struct file) is not a multiple of N_cpus (the number of processors running lapw2_mpi). This strange behavior was reported in a recent post by Duy Le with the subject "[Wien] MPI problem for LAPW2"
(http://zeus.theochem.tuwien.ac.at/pipermail/wien/2009-September/012042.html).
He noticed that for a system consisting of 21 (nonequivalent) atoms the program runs only on 3 or 7 cpus. He managed to cure the problem by setting lapw2_vector_split:$N_cpus, but without a reasonable explanation. However, one can get a hint by looking at the lapw2 source files and the output of lapw2_mpi. Let's consider, for example, cd16te15sb.struct from $WIENROOT/example_struct_files, which describes a structure with 5 nonequivalent atoms, and let's run it on a compute node with 8 cpus with the following .machines file:
granularity:1
1:n246:8
lapw0:n246:8
extrafine:1

Here are some lines from the resulting case.dayfile:

>   lapw0 -p    (19:41:09) starting parallel lapw0 at Tue Dec 22 19:41:09 CET 2009
.machine0 : 8 processors
mpirun --verbose -np 8 --hostfile .machine0 $WIENROOT/lapw0_mpi lapw0.def
Tue Dec 22 19:41:27 CET 2009 -> all processes done.
[...]

>   lapw1 -c -p (19:41:28) starting parallel lapw1 at Tue Dec 22 19:41:28 CET 2009
->  starting parallel LAPW1 jobs at Tue Dec 22 19:41:28 CET 2009
1 number_of_parallel_jobs
.machine1 : 8 processors : weight 1
mpirun --verbose -np 8 --hostfile .machine1 $WIENROOT/lapw1c_mpi lapw1_1.def
waiting for all processes to complete
Tue Dec 22 19:48:26 CET 2009 -> all processes done.
[...]

>   lapw2 -c -p (19:48:28) running LAPW2 in parallel mode
running parallel lapw2
mpirun --verbose -np 8 --hostfile .machine1 $WIENROOT/lapw2c_mpi lapw2_1.def 1
sleeping for 1 seconds
waiting for processes:
**  LAPW2 crashed!
[...]


The job crashed with the following error message:
[n246:15992] *** An error occurred in MPI_Comm_split
[n246:15992] *** on communicator MPI_COMM_WORLD
[n246:15992] *** MPI_ERR_ARG: invalid argument of some other kind
[n246:15992] *** MPI_ERRORS_ARE_FATAL (goodbye)

Now, if one takes a look at case.output2_1_proc_n (n=1,2,...,7), one sees the following header (here for case.output2_1_proc_1):

init_parallel_2 1 8 1 8 2
MPI run on 8 processors in MPI_COMM_WORLD
8 processors in MPI_vec_COMM (atoms splitting)
1 processors in MPI_atoms_COMM (vector splitting)

myid= 1
myid_atm= 1
myid_vec= 1

time in recpr: 0.820

One can find the following lines in the lapw2.F source file (lines 
129-137):
#ifdef Parallel
write(6,*) 'MPI run on ',npe,' processors in MPI_COMM_WORLD'
write(6,*) ' ',npe_atm,' processors in MPI_vec_COMM (atoms splitting)'
write(6,*) ' ',npe_vec,' processors in MPI_atoms_COMM (vector splitting)'
write(6,*) ' myid= ',myid
write(6,*) ' myid_atm= ',myid
write(6,*) ' myid_vec= ',myid
write(6,*) ' '
#endif
which generates this output.

If I understand correctly, npe is the total number of cpus, npe_atm is the number of cpus used for the parallelization over atoms, and npe_vec is the number of cpus used for the additional parallelization of the density over vectors (I think that MPI_vec_COMM and MPI_atoms_COMM should be swapped).

One can also find the following lines (306-311) in the l2main.F file:
! -
! START LOOP FOR ALL ATOMS
! -

non_equiv_loop: do jatom_pe=1,nat,npe_atm
jatom=jatom_pe+myid_atm

from which I understand that the loop runs over the nonequivalent atoms nat with stride npe_atm.

Now let's make some changes in lapw2para to run lapw2_mpi on 5 cpus, and take a look at the case.dayfile and case.output2_1_proc_1 files.
Here are lines from case.dayfile:

>   lapw0 -p    (20:08:14) starting parallel lapw0 at Tue Dec 22 20:08:14 CET 2009
Tue Dec 22 20:08:14 CET 2009 -> Setting up case Cd16Te15Sb for parallel execution
.machine0 : 8 processors
mpirun --verbose -np 8 --hostfile .mac

[Wien] installation problem

2009-12-23 Thread Tarik Ouahrani
Dear Dr. Arun Kumar Chatterjee,
find on the web the tcsh rpm package to solve your problem and try to install it
with the command
# rpm -ivh tcsh-6.15-1.fc8.x86_64.rpm
best regards
Tarik


[Wien] some comments on parallel execution of wien2k

2009-12-23 Thread Duy Le
ion from developers.
>
> My second comment is that you do not need to connect through ssh to the
> allocated processors on different compute nodes in order to run lapw1(c) or
> lapw2(c) (the case of parallelization over k-points). You can run your
> parallel processes by invoking mpirun.
> First, set up
> setenv WIEN_MPIRUN 'mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_'
> in $WIENROOT/parallel_options.
> Second, in lapw1para, instead of the line (line 406)
> (cd $PWD;$t $exe ${def}_$loop.def;rm -f .lock_$lockfile[$p]) >>.time1_$loop &
> use the following two lines:
> set ttt=(`echo $mpirun | sed -e "s^_NP_^$number_per_job[$p]^" -e "s^_HOSTS_^.machine$p^" -e "s^_EXEC_^$WIENROOT/${exe} ${def}_$loop.def^"`)
> and
> (cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]) >>.time1_$loop &
> 
> similar to the mpi execution.
> In the same fashion, in lapw2para, instead of line 314
> (cd $PWD;$t $exe ${def}_${loop}.def $loop;rm -f .lock_$lockfile[$p]) >>.time2_$loop &
> use the following two lines:
> set ttt=(`echo $mpirun | sed -e "s^_NP_^$number_per_job2[$loop]^" -e "s^_HOSTS_^.machine$mach[$loop]^" -e "s^_EXEC_^$WIENROOT/${exe} ${def}_$loop.def $loop^"`)
> and
> (cd $PWD;$t $ttt $vector_split;rm -f .lock_$lockfile[$p]) >>.time2_$loop &
> 
> I hope you will find these comments useful :) .
> 
> Regards and Merry Christmas,
> Sergiu Arapan
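
To illustrate the quoted recipe (just a sketch; the process count, machine file and executable here are placeholder values, not taken from an actual run), the sed line expands the WIEN_MPIRUN template into a complete command:

   set mpirun = 'mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_'
   set ttt = (`echo $mpirun | sed -e "s^_NP_^4^" -e "s^_HOSTS_^.machine1^" -e "s^_EXEC_^$WIENROOT/lapw1c lapw1_1.def^"`)
   echo $ttt
   # prints something like: mpirun -np 4 --hostfile .machine1 /path-to-wien2k/lapw1c lapw1_1.def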


[Wien] Network problem caused by lapw1?

2009-12-23 Thread Oleg Rubel
Thank you very much for the hint.

I found in my $SCRATCH directory 256 *storeHinv* files of 221MB each. My mistake was to use an nfs-mounted directory on the head node as $SCRATCH. I have now changed it to a local directory.

Thank you once again,

Oleg
