Dear Antonio

The actual time spent per scf cycle is about 33 minutes.

This is not so bad. :-)

The relevant parameters in the input file are the following:

Some relevant parameters are not shown.

    input_dft= 'pz'
    ecutwfc= 25

Which kind of pseudopotential are you using? You didn't set ecutrho...
What about ibrav and celldm?
I suppose you really want to perform LDA calculations for some reason.
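
For reference, a minimal sketch of how the cutoffs could be set explicitly in &SYSTEM (the ecutrho value is a placeholder, not a recommendation: 4*ecutwfc is the default for norm-conserving pseudopotentials, while ultrasoft/PAW sets typically need 8-12 times ecutwfc):

    ecutwfc = 25
    ecutrho = 100   ! placeholder: 4*ecutwfc, i.e. the norm-conserving default; use ~8-12x for ultrasoft/PAW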

    occupations= 'smearing'
    smearing= 'cold'
    degauss= 0.05 ! I know it's quite large, but necessary to stabilize the SCF at this preliminary stage (no geometry step done yet)
    mixing_beta= 0.4

If you want to stabilize the scf, it is better to use Gaussian smearing and to reduce degauss (to 0.01) and mixing_beta (to 0.1, or even 0.05-0.01). In the case of a relax calculation with a difficult first step, try scf_must_converge=.false. together with a reasonable electron_maxstep (30-50). It often helps when the scf is not going completely astray.
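
A sketch of the corresponding input changes (occupations/smearing/degauss go in &SYSTEM, the rest in &ELECTRONS; the values are simply the ones suggested above):

    occupations = 'smearing'
    smearing = 'gaussian'
    degauss = 0.01
    mixing_beta = 0.1
    electron_maxstep = 50
    scf_must_converge = .false.   ! for relax runs: go on to the next ionic step even if the first scf does not fully converge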

    nbnd= 2010

    diagonalization= 'ppcg'

Davidson diagonalization should be faster.
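
In the input that would simply be (or just drop the diagonalization line, since Davidson is the default):

    diagonalization = 'david'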

And, if possible, also to reduce the number of nodes?

     Estimated total dynamical RAM >    1441.34 GB

You may try 7-8 nodes according to this estimate.
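
As a rough, untested sketch (assuming 8 nodes x 40 cores = 320 MPI tasks, and a modest linear-algebra group as Paolo suggests below):

    mpirun -np 320 pw.x -nk 1 -nd 36 -inp qe.in > qe.out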

HTH
Giuseppe

Quoting Antonio Cammarata via users <users@lists.quantum-espresso.org>:

I did some tests. For 1000 Si atoms, I use 2010 bands because I need to get the band gap value; moreover, since it is a cluster, the surface states of the truncated bonds might close the gap, especially at the first steps of the geometry optimization, so it's better if I use a few empty bands. I managed to run the calculation using 10 nodes and a maximum of 40 cores per node. My question now is: can you suggest optimal command-line options and/or input settings to speed up the calculation? And, if possible, also to reduce the number of nodes? The relevant parameters in the input file are the following:

    input_dft= 'pz'
    ecutwfc= 25
    occupations= 'smearing'
    smearing= 'cold'
    degauss= 0.05 ! I know it's quite large, but necessary to stabilize the SCF at this preliminary stage (no geometry step done yet)
    nbnd= 2010

    diagonalization= 'ppcg'
    mixing_mode= 'plain'
    mixing_beta= 0.4

The actual time spent per scf cycle is about 33 minutes. I use QE v. 7.3 compiled with OpenMPI and ScaLAPACK. I have access to the Intel compilers too, but I did some tests and the difference is just tens of seconds. I have only the Gamma point; here is some info about the grid and the estimated RAM usage:

     Dense  grid: 24616397 G-vectors     FFT dimensions: ( 375, 375, 375)
     Dynamical RAM for                 wfc:     235.91 MB
     Dynamical RAM for     wfc (w. buffer):     235.91 MB
     Dynamical RAM for           str. fact:       0.94 MB
     Dynamical RAM for           local pot:       0.00 MB
     Dynamical RAM for          nlocal pot:    2112.67 MB
     Dynamical RAM for                qrad:       0.80 MB
     Dynamical RAM for          rho,v,vnew:       6.04 MB
     Dynamical RAM for               rhoin:       2.01 MB
     Dynamical RAM for            rho*nmix:      15.03 MB
     Dynamical RAM for           G-vectors:       3.99 MB
     Dynamical RAM for          h,s,v(r/c):       0.46 MB
     Dynamical RAM for          <psi|beta>:     552.06 MB
     Dynamical RAM for      wfcinit/wfcrot:    1305.21 MB
     Estimated static dynamical RAM per process >       2.31 GB
     Estimated max dynamical RAM per process >       3.60 GB
     Estimated total dynamical RAM >    1441.34 GB

Thanks a lot in advance for your kind help.

All the best

Antonio


On 10. 05. 24 12:01, Paolo Giannozzi wrote:
On 5/10/24 08:58, Antonio Cammarata via users wrote:

pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out

Too many processors for linear-algebra parallelization. 1000 Si atoms = 2000 bands (assuming an insulator with no spin polarization). Use a few tens of processors at most.
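
For example (a sketch only, keeping the other options from above; -nd is usually chosen as a perfect square, and as far as I remember pw.x falls back to the largest usable square anyway):

    pw.x -nk 1 -nt 1 -nb 1 -nd 36 -inp qe.in > qe.out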

"some processors have no G-vectors for symmetrization".

which sounds strange to me: with the Gamma point, symmetrization is not even needed


      Dense  grid: 30754065 G-vectors FFT dimensions: ( 400, 400, 400)

This is what a 256-atom Si supercell with 30 Ry cutoff yields:

     Dense  grid:   825897 G-vectors     FFT dimensions: ( 162, 162, 162)

I guess you may reduce the size of your supercell.

Paolo

      Dynamical RAM for                 wfc:     153.50 MB
      Dynamical RAM for     wfc (w. buffer):     153.50 MB
      Dynamical RAM for           str. fact:       0.61 MB
      Dynamical RAM for           local pot:       0.00 MB
      Dynamical RAM for          nlocal pot:    1374.66 MB
      Dynamical RAM for                qrad:       0.87 MB
      Dynamical RAM for          rho,v,vnew:       5.50 MB
      Dynamical RAM for               rhoin:       1.83 MB
      Dynamical RAM for            rho*nmix:       9.78 MB
      Dynamical RAM for           G-vectors:       2.60 MB
      Dynamical RAM for          h,s,v(r/c):       0.25 MB
      Dynamical RAM for          <psi|beta>:     552.06 MB
      Dynamical RAM for      wfcinit/wfcrot:     977.20 MB
      Estimated static dynamical RAM per process >       1.51 GB
      Estimated max dynamical RAM per process >       2.47 GB
      Estimated total dynamical RAM >    1900.41 GB

I managed to run the simulation with 512 atoms, CG diagonalization, and 3 nodes on the same machine with the command line

pw.x -nk 1 -nt 1 -nd 484 -inp qe.in > qe.out

Do you have any suggestions on how to set optimal parallelization parameters to avoid the memory issue and run the calculation? I am also planning to run simulations on nanoclusters with more than 1000 atoms.

Thanks a lot in advance for your kind help.

Antonio



--
_______________________________________________
Antonio Cammarata, PhD in Physics
Associate Professor in Applied Physics
Advanced Materials Group
Department of Control Engineering - KN:G-204
Faculty of Electrical Engineering
Czech Technical University in Prague
Karlovo Náměstí, 13
121 35, Prague 2, Czech Republic
Phone: +420 224 35 5711
Fax:   +420 224 91 8646
ORCID: orcid.org/0000-0002-5691-0682
WoS ResearcherID: A-4883-2014




GIUSEPPE MATTIOLI
CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
Via Salaria Km 29,300 - C.P. 10
I-00015 - Monterotondo Scalo (RM)
Mob (*preferred*) +39 373 7305625
Tel + 39 06 90672342 - Fax +39 06 90672316
E-mail: <giuseppe.matti...@ism.cnr.it>

