Works with 5.2.0 (Cray PrgEnv-gnu, fortran 4.9, Cray libsci) with -ndiag > 1, 
-ntg > 1. Thanks! Lesson learned…

Kane


> On 21 Oct 2015, at 17:27, Paolo Giannozzi <p.gianno...@gmail.com> wrote:
> 
> Unless you need new developments that are available only in the svn version, 
> please check whether it works with version 5.2.0. We just found a problem 
> (also affecting v.5.2.1) with "task groups" that may lead to strange crashes.
> 
> Paolo
> 
> On Wed, Oct 21, 2015 at 11:04 AM, Kane O'Donnell <kane.odonn...@gmail.com> 
> wrote:
> 
> Hi all,
> 
> Wondering if I can get some help diagnosing a crash. I’m running the latest 
> SVN version on a Cray XC40 (Magnus - 
> https://www.pawsey.org.au/our-systems/magnus-technical-specifications/). 
> Usually there are no problems, but I am having difficulty getting the 
> attached slab calculation to run past the first few Davidson 
> diagonalizations. It’s a 3x3 c-oriented slab of Na3Bi, the minimum I can use 
> to capture a certain adsorbate reconstruction (this is just the bare slab). 
> It’s only a 72-atom cell, but Bi has a lot of electrons (I think there are 
> about 800 electrons and 900 bands). I have spin-orbit coupling switched on 
> (important for this solid), and I have been able to do calculations on the 
> smaller unit cell using the library pseudopotentials listed in the species 
> block. Calculations on systems of this size (e.g. O(1000) electrons and 
> bands) are routine on Magnus, so I think I’m probably just doing something 
> stupid, but I can’t seem to figure it out.
> 
> Typical run conditions are 384 processors (16 nodes with 24 cores each), with 
> -nk 3 -ndiag 100 -ntg 8. Moving down to ~12 nodes leads to an out-of-memory 
> crash just as the code reports that it is allocating random wf’s at the 
> beginning. From 16 nodes upwards, the crashes happen around the 
> diagonalization step. Switching to CG works, but the slowdown is astronomical 
> (~10000 seconds per SCF step, not feasible for a relaxation). A typical 
> output is attached. With -ndiag > 1, the error is "problems computing 
> cholesky"; with -ndiag = 1, the error is "S matrix not positive definite". 
> Both come from cdiaghg. A search of the forums suggests this issue comes up 
> every now and then on wildly different systems and is usually blamed on the 
> user, compiler, LAPACK, BLAS or ScaLAPACK. So, details: QE was compiled by me 
> on Magnus with PrgEnv-gnu (fortran 4.9.0) against the Cray libsci (includes 
> fftw, scalapack, etc.), with:
> 
> ./configure --enable-parallel --with-scalapack=yes FC=ftn CC=cc 
> 
> and all tests pass with no problems.
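
For anyone reproducing the run conditions above, here is a minimal Python sketch that sanity-checks a processor layout before submitting. The function name `check_layout` is hypothetical, and the three rules it encodes are assumptions based on my understanding of how pw.x splits processors (the total is divided evenly among -nk pools, -ndiag is a square number no larger than one pool, and -ntg divides the pool size). It does not diagnose the cdiaghg errors themselves.

```python
import math


def check_layout(nproc, nk, ndiag, ntg):
    """Check a pw.x-style processor layout against three assumed rules.

    Returns (processors per pool, side of the diagonalization grid,
    list of problems found). The rules are assumptions, not taken from
    the QE source code.
    """
    problems = []

    # Assumed rule 1: processors are split evenly among the k-point pools.
    if nproc % nk != 0:
        problems.append("nproc is not divisible by nk")
    per_pool = nproc // nk

    # Assumed rule 2: the linear-algebra group is a square grid of
    # processors that fits inside one pool.
    side = math.isqrt(ndiag)
    if side * side != ndiag:
        problems.append("ndiag is not a perfect square")
    if ndiag > per_pool:
        problems.append("ndiag exceeds the processors in one pool")

    # Assumed rule 3: task groups divide the pool evenly.
    if per_pool % ntg != 0:
        problems.append("processors per pool not divisible by ntg")

    return per_pool, side, problems


# The layout quoted in this message: 384 processors, -nk 3 -ndiag 100 -ntg 8.
print(check_layout(384, 3, 100, 8))  # (128, 10, [])
```

Under these assumptions the quoted layout is self-consistent (128 processors per pool, a 10x10 diagonalization grid, 16 processors per task group), so the crash is unlikely to be a simple mismatch between the flags.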
> 
> Any ideas? Let me know if there is any further information necessary.
> 
> Best regards,
> 
> Kane
> Kane O'Donnell
> Postdoctoral Research Fellow | Department of Physics, Astronomy and Medical 
> Radiation Science
> 
> Curtin University
> Tel | +61 8 9266 1381 
> Fax | +61 8 9266 2377 
> 
> Email | kane.odonn...@curtin.edu.au
> 
> 
> 
> Curtin University is a trademark of Curtin University of Technology
> CRICOS Provider Code 00301J
> 
> &control
>   calculation = 'relax',
>   title = '',
>   outdir = './',
>   prefix = 'Na3Bi_331',
>   pseudo_dir = '/group/partner1197/kodonnell/qe_pseudos/PSEUDOPOTENTIALS/',
>   wf_collect = .true.
> /
> &system
>   ibrav = 0,
>   nat = 72,
>   ntyp = 2,
>   nbnd = 896,
>   ecutwfc = 50,
>   !ecutrho = 280,
>   !tot_charge=+1.0,
>   occupations = 'smearing',
>   smearing = 'mv',
>   degauss = 0.0073,
>   lspinorb = .true.,
>   noncolin = .true.,
>   starting_magnetization(1) = 0.0,
>   starting_magnetization(2) = 0.0
> /
> &electrons
>   conv_thr = 1.0D-7
> /
> &ions
> /
> ATOMIC_SPECIES
>   Bi 1.0 Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
>   Na 1.0 Na.rel-pbe-spn-kjpaw_psl.0.2.UPF
> CELL_PARAMETERS angstrom
>   16.344 0 0
>   -8.172 14.1543 0
>   1.77359e-15 3.07196e-15 28.965
> K_POINTS automatic
>   2 2 1 0 0 0
> ATOMIC_POSITIONS angstrom
> Bi    0    3.146    2.414    0    0    0
> Na    2.724    4.718    2.414    0    0    0
> Na    0    3.146    5.629
> Bi    2.724    1.573    7.241
> Na    2.724    4.718    7.241
> Na    2.724    1.573    0.801    0    0    0
> Na    2.724    1.573    4.026
> Na    0    3.146    8.854
> Bi    -2.724    7.864    2.414    0    0    0
> Na    0    9.436    2.414    0    0    0
> Na    -2.724    7.864    5.629
> Bi    0    6.291    7.241
> Na    0    9.436    7.241
> Na    0    6.291    0.801    0    0    0
> Na    0    6.291    4.026
> Na    -2.724    7.864    8.854
> Bi    -5.448    12.582    2.414    0    0    0
> Na    -2.724    14.154    2.414    0    0    0
> Na    -5.448    12.582    5.629
> Bi    -2.724    11.009    7.241
> Na    -2.724    14.154    7.241
> Na    -2.724    11.009    0.801    0    0    0
> Na    -2.724    11.009    4.026
> Na    -5.448    12.582    8.854
> Bi    5.448    3.146    2.414    0    0    0
> Na    8.172    4.718    2.414    0    0    0
> Na    5.448    3.146    5.629
> Bi    8.172    1.573    7.241
> Na    8.172    4.718    7.241
> Na    8.172    1.573    0.801    0    0    0
> Na    8.172    1.573    4.026
> Na    5.448    3.146    8.854
> Bi    2.724    7.864    2.414    0    0    0
> Na    5.448    9.436    2.414    0    0    0
> Na    2.724    7.864    5.629
> Bi    5.448    6.291    7.241
> Na    5.448    9.436    7.241
> Na    5.448    6.291    0.801    0    0    0
> Na    5.448    6.291    4.026
> Na    2.724    7.864    8.854
> Bi    0    12.582    2.414    0    0    0
> Na    2.724    14.154    2.414    0    0    0
> Na    0    12.582    5.629
> Bi    2.724    11.009    7.241
> Na    2.724    14.154    7.241
> Na    2.724    11.009    0.801    0    0    0
> Na    2.724    11.009    4.026
> Na    0    12.582    8.854
> Bi    10.896    3.146    2.414    0    0    0
> Na    13.62    4.718    2.414    0    0    0
> Na    10.896    3.146    5.629
> Bi    13.62    1.573    7.241
> Na    13.62    4.718    7.241
> Na    13.62    1.573    0.801    0    0    0
> Na    13.62    1.573    4.026
> Na    10.896    3.146    8.854
> Bi    8.172    7.864    2.414    0    0    0
> Na    10.896    9.436    2.414    0    0    0
> Na    8.172    7.864    5.629
> Bi    10.896    6.291    7.241
> Na    10.896    9.436    7.241
> Na    10.896    6.291    0.801    0    0    0
> Na    10.896    6.291    4.026
> Na    8.172    7.864    8.854
> Bi    5.448    12.582    2.414    0    0    0
> Na    8.172    14.154    2.414    0    0    0
> Na    5.448    12.582    5.629
> Bi    8.172    11.009    7.241
> Na    8.172    14.154    7.241
> Na    8.172    11.009    0.801    0    0    0
> Na    8.172    11.009    4.026
> Na    5.448    12.582    8.854
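
A quick consistency check on an input like the one above is to count the atoms per species in ATOMIC_POSITIONS and compare the totals with nat and ntyp. The Python sketch below is illustrative only: the helper names `count_species` and `_is_number` are hypothetical, and the parser assumes the simple layout used here (a species label followed by three coordinates and optional constraint flags), not the full pw.x input grammar.

```python
from collections import Counter


def _is_number(token):
    """Return True if the token parses as a float (accepts Fortran 'D' exponents)."""
    try:
        float(token.replace("D", "E").replace("d", "e"))
        return True
    except ValueError:
        return False


def count_species(pw_input):
    """Count atoms per species label in the ATOMIC_POSITIONS card."""
    counts = Counter()
    in_card = False
    for raw in pw_input.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0].upper() == "ATOMIC_POSITIONS":
            in_card = True
            continue
        if in_card:
            # A position line is a label followed by at least three numbers.
            if len(fields) >= 4 and all(_is_number(f) for f in fields[1:4]):
                counts[fields[0]] += 1
            else:
                in_card = False  # any other line ends the card
    return counts


# A three-line excerpt of the input above, used only as a demonstration.
sample = """\
K_POINTS automatic
  2 2 1 0 0 0
ATOMIC_POSITIONS angstrom
Bi    0    3.146    2.414    0    0    0
Na    2.724    4.718    2.414    0    0    0
Na    0    3.146    5.629
"""
print(sorted(count_species(sample).items()))  # [('Bi', 1), ('Na', 2)]
```

Run on the full input, the counts should come to 18 Bi and 54 Na, which sums to nat = 72 with ntyp = 2 and matches the 3:1 ratio of Na3Bi.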
> 
> 
> 
> 
> _______________________________________________
> Pw_forum mailing list
> Pw_forum@pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
> 
> 
> 
> -- 
> Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
