Dear Antonio

The actual time spent per scf cycle is about 33 minutes.

This is not so bad. :-)

The relevant parameters in the input file are the following:

Some relevant parameters are not shown.

    input_dft= 'pz'
    ecutwfc= 25

Which kind of pseudopotential are you using? You didn't set ecutrho...
What about ibrav and celldm?
I suppose you really want to perform LDA calculations for some reason.
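
For reference, a minimal sketch of how the cutoffs could be set explicitly in &SYSTEM (the ecutrho value is a placeholder, not a recommendation: 4*ecutwfc is the default for norm-conserving pseudopotentials, while ultrasoft/PAW sets typically need 8-12 times ecutwfc):

    ecutwfc = 25
    ecutrho = 100   ! placeholder: 4*ecutwfc, i.e. the norm-conserving default; use ~8-12x for ultrasoft/PAW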

    occupations= 'smearing'
    smearing= 'cold'
    degauss= 0.05 ! I know it's quite large, but necessary to stabilize the SCF at this preliminary stage (no geometry step done yet)
    mixing_beta= 0.4

If you want to stabilize the scf, it is better to use Gaussian smearing and to reduce degauss (to 0.01) and mixing_beta (to 0.1, or even 0.05-0.01). In the case of a relax calculation with a difficult first step, try scf_must_converge=.false. together with a reasonable electron_maxstep (30-50). It often helps when the scf is not going completely astray.
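
A sketch of the corresponding input changes (occupations/smearing/degauss go in &SYSTEM, the rest in &ELECTRONS; the values are simply the ones suggested above):

    occupations = 'smearing'
    smearing = 'gaussian'
    degauss = 0.01
    mixing_beta = 0.1
    electron_maxstep = 50
    scf_must_converge = .false.   ! for relax runs: go on to the next ionic step even if the first scf does not fully converge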

    nbnd= 2010

    diagonalization= 'ppcg'

Davidson diagonalization should be faster.
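
In the input that would simply be (or just drop the diagonalization line, since Davidson is the default):

    diagonalization = 'david'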

And, if possible, also to reduce the number of nodes?

     Estimated total dynamical RAM >    1441.34 GB

You may try 7-8 nodes according to this estimate.
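
As a rough, untested sketch (assuming 8 nodes x 40 cores = 320 MPI tasks, and a modest linear-algebra group as Paolo suggests below):

    mpirun -np 320 pw.x -nk 1 -nd 36 -inp qe.in > qe.out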

HTH
Giuseppe

Quoting Antonio Cammarata via users <users@lists.quantum-espresso.org>:

I did some tests. For 1000 Si atoms, I use 2010 bands because I need to get the band gap value; moreover, since it is a cluster, the surface states of the truncated bonds might close the gap, especially at the first steps of the geometry optimization, so it's better if I use a few empty bands. I managed to run the calculation using 10 nodes and a maximum of 40 cores per node. My question now is: can you suggest optimal command-line options and/or input settings to speed up the calculation? And, if possible, also to reduce the number of nodes? The relevant parameters in the input file are the following:

    input_dft= 'pz'
    ecutwfc= 25
    occupations= 'smearing'
    smearing= 'cold'
    degauss= 0.05 ! I know it's quite large, but necessary to stabilize the SCF at this preliminary stage (no geometry step done yet)
    nbnd= 2010

    diagonalization= 'ppcg'
    mixing_mode= 'plain'
    mixing_beta= 0.4

The actual time spent per scf cycle is about 33 minutes. I use QE v. 7.3 compiled with OpenMPI and ScaLAPACK. I have access to the Intel compilers too, but I did some tests and the difference is just tens of seconds. I have only the Gamma point; here is some info about the grid and the estimated RAM usage:

     Dense  grid: 24616397 G-vectors     FFT dimensions: ( 375, 375, 375)
     Dynamical RAM for                 wfc:     235.91 MB
     Dynamical RAM for     wfc (w. buffer):     235.91 MB
     Dynamical RAM for           str. fact:       0.94 MB
     Dynamical RAM for           local pot:       0.00 MB
     Dynamical RAM for          nlocal pot:    2112.67 MB
     Dynamical RAM for                qrad:       0.80 MB
     Dynamical RAM for          rho,v,vnew:       6.04 MB
     Dynamical RAM for               rhoin:       2.01 MB
     Dynamical RAM for            rho*nmix:      15.03 MB
     Dynamical RAM for           G-vectors:       3.99 MB
     Dynamical RAM for          h,s,v(r/c):       0.46 MB
     Dynamical RAM for          <psi|beta>:     552.06 MB
     Dynamical RAM for      wfcinit/wfcrot:    1305.21 MB
     Estimated static dynamical RAM per process >       2.31 GB
     Estimated max dynamical RAM per process >       3.60 GB
     Estimated total dynamical RAM >    1441.34 GB

Thanks a lot in advance for your kind help.

All the best

Antonio


On 10. 05. 24 12:01, Paolo Giannozzi wrote:
On 5/10/24 08:58, Antonio Cammarata via users wrote:

pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out

Too many processors for linear-algebra parallelization. 1000 Si atoms = 2000 bands (assuming an insulator with no spin polarization). Use a few tens of processors at most.
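
For example (a sketch only, keeping the other options from above; -nd is usually chosen as a perfect square, and as far as I remember pw.x falls back to the largest usable square anyway):

    pw.x -nk 1 -nt 1 -nb 1 -nd 36 -inp qe.in > qe.out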

"some processors have no G-vectors for symmetrization".

which sounds strange to me: with the Gamma point, symmetrization is not even needed


      Dense  grid: 30754065 G-vectors FFT dimensions: ( 400, 400, 400)

This is what a 256-atom Si supercell with 30 Ry cutoff yields:

     Dense  grid:   825897 G-vectors     FFT dimensions: ( 162, 162, 162)

I guess you may reduce the size of your supercell.

Paolo

      Dynamical RAM for                 wfc:     153.50 MB
      Dynamical RAM for     wfc (w. buffer):     153.50 MB
      Dynamical RAM for           str. fact:       0.61 MB
      Dynamical RAM for           local pot:       0.00 MB
      Dynamical RAM for          nlocal pot:    1374.66 MB
      Dynamical RAM for                qrad:       0.87 MB
      Dynamical RAM for          rho,v,vnew:       5.50 MB
      Dynamical RAM for               rhoin:       1.83 MB
      Dynamical RAM for            rho*nmix:       9.78 MB
      Dynamical RAM for           G-vectors:       2.60 MB
      Dynamical RAM for          h,s,v(r/c):       0.25 MB
      Dynamical RAM for          <psi|beta>:     552.06 MB
      Dynamical RAM for      wfcinit/wfcrot:     977.20 MB
      Estimated static dynamical RAM per process >       1.51 GB
      Estimated max dynamical RAM per process >       2.47 GB
      Estimated total dynamical RAM >    1900.41 GB

I managed to run the simulation with 512 atoms, CG diagonalization, and 3 nodes on the same machine with the command line

pw.x -nk 1 -nt 1 -nd 484 -inp qe.in > qe.out

Do you have any suggestions on how to set optimal parallelization parameters to avoid the memory issue and run the calculation? I am also planning to run simulations on nanoclusters with more than 1000 atoms.

Thanks a lot in advance for your kind help.

Antonio



--
_______________________________________________
Antonio Cammarata, PhD in Physics
Associate Professor in Applied Physics
Advanced Materials Group
Department of Control Engineering - KN:G-204
Faculty of Electrical Engineering
Czech Technical University in Prague
Karlovo Náměstí, 13
121 35, Prague 2, Czech Republic
Phone: +420 224 35 5711
Fax:   +420 224 91 8646
ORCID: orcid.org/0000-0002-5691-0682
WoS ResearcherID: A-4883-2014




GIUSEPPE MATTIOLI
CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
Via Salaria Km 29,300 - C.P. 10
I-00015 - Monterotondo Scalo (RM)
Mob (*preferred*) +39 373 7305625
Tel + 39 06 90672342 - Fax +39 06 90672316
E-mail: <giuseppe.matti...@ism.cnr.it>

