Dear Julien,

I can't offer any useful input on your parallelization question, but I think there is a problem in your input file: using assume_isolated (your output reports that the 2D cutoff is active) requires the system to be centered around z=0.
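
To illustrate what I mean by centering, here is a minimal Python sketch. It assumes that the z coordinates are in alat units as printed in your output, that the slab does not wrap across the cell boundary, and that "centered" means the midpoint between the lowest and the highest atom sits at z=0. The function name center_along_z is hypothetical, and the four sample values (atoms 1, 49, 73 and 117 of your output) are only there to make the example runnable.

```python
def center_along_z(z_coords):
    """Shift z coordinates so the midpoint of the occupied range is at z = 0."""
    midpoint = 0.5 * (min(z_coords) + max(z_coords))
    return [z - midpoint for z in z_coords]


# Sample z values (alat units) copied from the output: atoms 1, 49, 73, 117.
# A real fix would apply the same shift to all 120 atoms.
sample_z = [3.5388726, 5.5170566, 2.0527208, 1.5899412]
centered = center_along_z(sample_z)
print(centered)
```

After the shift, the lowest and the highest atom sit at equal distances below and above z=0; the x and y coordinates are left untouched.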



I am starting to use an HPC cluster at my university, but I am very new to parallel computation.

I ran a first test (test #1) on a very small simulation (relaxation of a GO sheet with 19 atoms, gamma point only). The calculation took 3m20s on 1 proc on my personal computer. On the cluster with 4 procs and default parallel options it took 1m5s, and on 8 procs it took 44s. This seems reasonable, and at least shows that raising the number of procs does reduce computation time in this case (with the obvious limitation that too many procs will not help a job this small).

However, I then tried a somewhat larger test (test #2): an scf calculation with 120 atoms (again gamma point only). In this case, parallelization brings no improvement at all. Although the output file confirms that the code is running on N procs, the performance is similar to a run on 1 proc (sometimes even slightly worse, though probably not significantly, as the timings fluctuate a bit from one run to the next).

I also ran this same input file on my personal computer on both 1 and 2 cores. It takes 10376s to run 10 iterations on 1 core and 6777s on 2 cores, so parallelization seems to work fine on my computer.
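
To put numbers on these timings, here is a minimal Python sketch of the speedup and parallel efficiency they imply. The helper names speedup and efficiency are my own, and efficiency is taken here as the speedup divided by the factor by which the number of procs was increased.

```python
def speedup(t_ref, t_par):
    """Ratio of the reference wall time to the parallel wall time."""
    return t_ref / t_par


def efficiency(t_ref, t_par, n_ref, n_par):
    """Speedup divided by the increase in the number of procs."""
    return speedup(t_ref, t_par) * n_ref / n_par


# Test #2 on the personal computer: 10376 s on 1 core, 6777 s on 2 cores.
s_local = speedup(10376, 6777)
e_local = efficiency(10376, 6777, 1, 2)

# Test #1 on the cluster: 1m5s (65 s) on 4 procs, 44 s on 8 procs.
s_cluster = speedup(65, 44)
e_cluster = efficiency(65, 44, 4, 8)

print(f"local   1 -> 2 cores: speedup {s_local:.2f}, efficiency {e_local:.2f}")
print(f"cluster 4 -> 8 procs: speedup {s_cluster:.2f}, efficiency {e_cluster:.2f}")
```

Both cases come out at a speedup of about 1.5 for a doubling of the procs, which is the kind of scaling I was hoping to see for test #2 on the cluster as well.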

I have tried running with different numbers of cores on the HPC cluster and with different parallelization options (for instance -nb 4), but nothing seems to improve the timing.

Basically, I am stuck with those 2 seemingly conflicting facts:

  * Parallelization seems to have no particular problem on the HPC
    cluster, because test #1 gives good results
  * Parallelization seems to have no particular problem with input
    file #2, because it scales reasonably with the number of procs
    on my personal computer

However, combining the two and running this file in parallel on the HPC cluster does not work correctly…

I have included below the input file and output file of test #2, as well as the slurm script that I use to submit the calculation to the job manager, in case it helps (test2.scf.slurm.txt).

Any suggestion on what is going wrong would be very welcome.




  title = '# Quantum Espresso PWSCF output snapshot # 0'
  pseudo_dir = '/lustre/home/acct-mseyxd/mseyxd/QE/qe-6.3/pseudo/' ,
  calculation = 'scf'

  nat= 120
  ntyp= 7
  ibrav= 0
  ecutwfc= 50, ecutrho=400,
  occupations='smearing', smearing='mv', degauss=1.0d-3

  mixing_beta = 0.5
  conv_thr =  1.0d-7



C   12.011  C.pbesol-n-kjpaw_psl.1.0.0.UPF
N   14.007  N.pbesol-n-kjpaw_psl.0.1.UPF
H    1.008  H.pbesol-kjpaw_psl.0.1.UPF
Pb  207.2   Pb.pbesol-dn-kjpaw_psl.1.0.0.UPF
I   126.9   I.pbesol-n-kjpaw_psl.1.0.0.UPF
O   15.999  O.pbesol-n-kjpaw_psl.1.0.0.UPF
Cl  35.450  Cl.pbesol-n-kjpaw_psl.1.0.0.UPF

      6.40743642        0.00000000        0.00000000
      0.00000000       12.53119000        0.00000000
      0.00000000        0.00000000       39.01263233

K_POINTS gamma



     Program PWSCF v.6.3 starts on 10Apr2019 at 15:35:34

     This program is part of the open-source Quantum ESPRESSO suite
     for quantum simulation of materials; please cite
         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
         "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
     in publications or presentations arising from this work. More details at

     Parallel version (MPI), running on     8 processors

     MPI processes distributed on     1 nodes
     R & G space division:  proc/nbgrp/npool/nimage = 8
     Reading input from /lustre/home/acct-mseyxd/mseyxd/QE/GO-Cl/FAPBI3_bonding/scf/1x2x3_matching/
Warning: card &IONS ignored
Warning: card / ignored
Warning: card &CELL ignored
Warning: card / ignored

     Current dimensions of program PWSCF are:
     Max number of different atomic species (ntypx) = 10
     Max number of k-points (npk) =  40000
     Max angular momentum in pseudopotentials (lmaxx) =  3
               file C.pbesol-n-kjpaw_psl.1.0.0.UPF: wavefunction(s)  2S 2P renormalized
               file N.pbesol-n-kjpaw_psl.0.1.UPF: wavefunction(s)  2P renormalized
               file H.pbesol-kjpaw_psl.0.1.UPF: wavefunction(s) 1S renormalized
               file Pb.pbesol-dn-kjpaw_psl.1.0.0.UPF: wavefunction(s)  6S 6P 5D renormalized
               file I.pbesol-n-kjpaw_psl.1.0.0.UPF: wavefunction(s)  5S renormalized
               file O.pbesol-n-kjpaw_psl.1.0.0.UPF: wavefunction(s)  2S 2P renormalized
               file Cl.pbesol-n-kjpaw_psl.1.0.0.UPF: wavefunction(s)  3S 3P renormalized

     gamma-point specific algorithms are used

     Subspace diagonalization in iterative solution of the eigenvalue problem:
     a serial algorithm will be used

     Parallelization info
     sticks:   dense  smooth     PW     G-vecs:    dense smooth      PW
     Min        1140     570    141               356988 126222   15758
     Max        1142     572    142               357012 126236   15798
     Sum        9123    4565   1135              2856023 1009807  126259

     # Quantum Espresso PWSCF output snapshot # 0

     bravais-lattice index     =            0
     lattice parameter (alat)  =      12.1083  a.u.
     unit-cell volume          =   21138.7101 (a.u.)^3
     number of atoms/cell      =          120
     number of atomic types    =            7
     number of electrons       =       542.00
     number of Kohn-Sham states=          325
     kinetic-energy cutoff     =      50.0000  Ry
     charge density cutoff     =     400.0000  Ry
     convergence threshold     =      1.0E-07
     mixing beta               =       0.5000
     number of iterations used =            8  plain     mixing
     Exchange-correlation      = SLA PW PSX PSC ( 1  4 10  8 0 0)

     celldm(1)=  12.108300  celldm(2)=   0.000000  celldm(3)= 0.000000
     celldm(4)=   0.000000  celldm(5)=   0.000000  celldm(6)= 0.000000

     crystal axes: (cart. coord. in units of alat)
               a(1) = (   1.000000   0.000000   0.000000 )
               a(2) = (   0.000000   1.955726   0.000000 )
               a(3) = (   0.000000   0.000000   6.088649 )

     reciprocal axes: (cart. coord. in units 2 pi/alat)
               b(1) = (  1.000000  0.000000  0.000000 )
               b(2) = (  0.000000  0.511319  0.000000 )
               b(3) = (  0.000000  0.000000  0.164240 )

     PseudoPot. # 1 for C  read from file:
     MD5 check sum: f9b2fe17d1f478429498b05d17159f9e
     Pseudo is Projector augmented-wave + core cor, Zval =  4.0
     Generated using "atomic" code by A. Dal Corso v.6.3
     Shape of augmentation charge: PSQ
     Using radial grid of 1073 points,  4 beta functions with:
                l(1) =   0
                l(2) =   0
                l(3) =   1
                l(4) =   1
     Q(r) pseudized with 0 coefficients

     PseudoPot. # 2 for N  read from file:
     MD5 check sum: 15bd223d5d75e9eda893d0f4e6bdad1b
     Pseudo is Projector augmented-wave + core cor, Zval =  5.0
     Generated using "atomic" code by A. Dal Corso v.6.3
     Shape of augmentation charge: PSQ
     Using radial grid of 1085 points,  4 beta functions with:
                l(1) =   0
                l(2) =   0
                l(3) =   1
                l(4) =   1
     Q(r) pseudized with 0 coefficients

     PseudoPot. # 3 for H  read from file:
     MD5 check sum: 27a6b98f1514c59d399e798f1258b8b7
     Pseudo is Projector augmented-wave, Zval =  1.0
     Generated using "atomic" code by A. Dal Corso v.5.0.2 svn rev. 9415
     Shape of augmentation charge: PSQ
     Using radial grid of  929 points,  2 beta functions with:
                l(1) =   0
                l(2) =   0
     Q(r) pseudized with 0 coefficients

     PseudoPot. # 4 for Pb read from file:
     MD5 check sum: 56da3be0db09ba43f309b470f7bff7d1
     Pseudo is Projector augmented-wave + core cor, Zval = 14.0
     Generated using "atomic" code by A. Dal Corso v.6.3
     Shape of augmentation charge: PSQ
     Using radial grid of 1281 points,  6 beta functions with:
                l(1) =   0
                l(2) =   0
                l(3) =   1
                l(4) =   1
                l(5) =   2
                l(6) =   2
     Q(r) pseudized with 0 coefficients

     PseudoPot. # 5 for I  read from file:
     MD5 check sum: 6038403ff9b03366b27f71806436e734
     Pseudo is Projector augmented-wave + core cor, Zval =  7.0
     Generated using "atomic" code by A. Dal Corso v.6.3
     Shape of augmentation charge: PSQ
     Using radial grid of 1247 points,  6 beta functions with:
                l(1) =   0
                l(2) =   0
                l(3) =   1
                l(4) =   1
                l(5) =   2
                l(6) =   2
     Q(r) pseudized with 0 coefficients

     PseudoPot. # 6 for O  read from file:
     MD5 check sum: cb766521a97cf798d01896eaf7ac9a0a
     Pseudo is Projector augmented-wave + core cor, Zval =  6.0
     Generated using "atomic" code by A. Dal Corso v.6.3
     Shape of augmentation charge: PSQ
     Using radial grid of 1095 points,  4 beta functions with:
                l(1) =   0
                l(2) =   0
                l(3) =   1
                l(4) =   1
     Q(r) pseudized with 0 coefficients

     PseudoPot. # 7 for Cl read from file:
     MD5 check sum: 939a64fc035742408689cdf8470f8314
     Pseudo is Projector augmented-wave + core cor, Zval =  7.0
     Generated using "atomic" code by A. Dal Corso v.6.3
     Shape of augmentation charge: PSQ
     Using radial grid of 1157 points,  6 beta functions with:
                l(1) =   0
                l(2) =   0
                l(3) =   1
                l(4) =   1
                l(5) =   2
                l(6) =   2
     Q(r) pseudized with 0 coefficients

     atomic species   valence    mass     pseudopotential
        C              4.00    12.01100     C ( 1.00)
        N              5.00    14.00700     N ( 1.00)
        H              1.00     1.00800     H ( 1.00)
        Pb            14.00   207.20000     Pb( 1.00)
        I              7.00   126.90000     I ( 1.00)
        O              6.00    15.99900     O ( 1.00)
        Cl             7.00    35.45000     Cl( 1.00)

     No symmetry found

   Cartesian axes

     site n.     atom                  positions (alat units)
         1           C   tau(   1) = (   0.5000029   0.5092449 3.5388726  )
[... 120 atomic positions total ...]
        14           N   tau(  14) = (   0.6817550   1.3942919 3.5388726  )
        15           N   tau(  15) = (   0.3182468   1.3942919 3.5388726  )
        16           H   tau(  16) = (   0.5000020   1.6582770 3.5388726  )
        17           H   tau(  17) = (   0.8118699   1.4870024 3.5388726  )
        18           H   tau(  18) = (   0.7019875   1.2355616 3.5388726  )
        19           H   tau(  19) = (   0.2980163   1.2355616 3.5388726  )
        20           H   tau(  20) = (   0.1881339   1.4870024 3.5388726  )
        21           Pb  tau(  21) = (   1.0000038   1.9217796 3.0443246  )
        22           I   tau(  22) = (   0.5000020   1.9401372 3.0443246  )
        23           I   tau(  23) = (   1.0000038   1.4303819 3.0443246  )
        24           I   tau(  24) = (   0.0000002   1.8772407 3.5388726  )
        25           C   tau(  25) = (   0.5000029   0.5092449 4.5279646  )
        26           N   tau(  26) = (   0.6817550   0.4164289 4.5279646  )
        27           N   tau(  27) = (   0.3182468   0.4164289 4.5279646  )
        28           H   tau(  28) = (   0.5000020   0.6804140 4.5279646  )
        29           H   tau(  29) = (   0.8118699   0.5091394 4.5279646  )
        30           H   tau(  30) = (   0.7019875   0.2576986 4.5279646  )
        31           H   tau(  31) = (   0.2980163   0.2576986 4.5279646  )
        32           H   tau(  32) = (   0.1881339   0.5091394 4.5279646  )
        33           Pb  tau(  33) = (   1.0000038   0.9439166 4.0334166  )
        34           I   tau(  34) = (   0.5000020   0.9622742 4.0334166  )
        35           I   tau(  35) = (   1.0000038   0.4525189 4.0334166  )
        36           I   tau(  36) = (   0.0000002   0.8993777 4.5279646  )
        37           C   tau(  37) = (   0.5000029   1.4871079 4.5279646  )
        38           N   tau(  38) = (   0.6817550   1.3942919 4.5279646  )
        39           N   tau(  39) = (   0.3182468   1.3942919 4.5279646  )
        40           H   tau(  40) = (   0.5000020   1.6582770 4.5279646  )
        41           H   tau(  41) = (   0.8118699   1.4870024 4.5279646  )
        42           H   tau(  42) = (   0.7019875   1.2355616 4.5279646  )
        43           H   tau(  43) = (   0.2980163   1.2355616 4.5279646  )
        44           H   tau(  44) = (   0.1881339   1.4870024 4.5279646  )
        45           Pb  tau(  45) = (   1.0000038   1.9217796 4.0334166  )
        46           I   tau(  46) = (   0.5000020   1.9401372 4.0334166  )
        47           I   tau(  47) = (   1.0000038   1.4303819 4.0334166  )
        48           I   tau(  48) = (   0.0000002   1.8772407 4.5279646  )
        49           C   tau(  49) = (   0.5000029   0.5092449 5.5170566  )
        50           N   tau(  50) = (   0.6817550   0.4164289 5.5170566  )
        51           N   tau(  51) = (   0.3182468   0.4164289 5.5170566  )
        52           H   tau(  52) = (   0.5000020   0.6804140 5.5170566  )
        53           H   tau(  53) = (   0.8118699   0.5091394 5.5170566  )
        54           H   tau(  54) = (   0.7019875   0.2576986 5.5170566  )
        55           H   tau(  55) = (   0.2980163   0.2576986 5.5170566  )
        56           H   tau(  56) = (   0.1881339   0.5091394 5.5170566  )
        57           Pb  tau(  57) = (   1.0000038   0.9439166 5.0225086  )
        58           I   tau(  58) = (   0.5000020   0.9622742 5.0225086  )
        59           I   tau(  59) = (   1.0000038   0.4525189 5.0225086  )
        60           I   tau(  60) = (   0.0000002   0.8993777 5.5170566  )
        61           C   tau(  61) = (   0.5000029   1.4871079 5.5170566  )
        62           N   tau(  62) = (   0.6817550   1.3942919 5.5170566  )
        63           N   tau(  63) = (   0.3182468   1.3942919 5.5170566  )
        64           H   tau(  64) = (   0.5000020   1.6582770 5.5170566  )
        65           H   tau(  65) = (   0.8118699   1.4870024 5.5170566  )
        66           H   tau(  66) = (   0.7019875   1.2355616 5.5170566  )
        67           H   tau(  67) = (   0.2980163   1.2355616 5.5170566  )
        68           H   tau(  68) = (   0.1881339   1.4870024 5.5170566  )
        69           Pb  tau(  69) = (   1.0000038   1.9217796 5.0225086  )
        70           I   tau(  70) = (   0.5000020   1.9401372 5.0225086  )
        71           I   tau(  71) = (   1.0000038   1.4303819 5.0225086  )
        72           I   tau(  72) = (   0.0000002   1.8772407 5.5170566  )
        73           C   tau(  73) = (  -0.4150218   0.1603553 2.0527208  )
        74           C   tau(  74) = (  -0.2451558   0.4319819 2.2087050  )
        75           C   tau(  75) = (  -0.2422954   0.2237748 2.1733265  )
        76           C   tau(  76) = (  -0.4318084   0.5359346 2.1551211  )
        77           C   tau(  77) = (  -0.0804884   0.0920926 2.2271668  )
        78           C   tau(  78) = (   0.0704299   0.4023002 2.3979625  )
        79           C   tau(  79) = (   0.0843280   0.1777180 2.3246433  )
        80           C   tau(  80) = (  -0.0965417   0.5401709 2.3216025  )
        81           C   tau(  81) = (   0.2744480   0.0706378 2.2513170  )
        82           C   tau(  82) = (   0.3931012   0.4257914 2.2255358  )
        83           C   tau(  83) = (   0.3972373   0.2291930 2.1552281  )
        84           C   tau(  84) = (   0.2639893   0.5338504 2.3796795  )
        85           C   tau(  85) = (  -0.4439138   0.7386909 2.1459684  )
        86           C   tau(  86) = (  -0.2797555   1.0615097 2.1569667  )
        87           C   tau(  87) = (  -0.2677453   0.8523788 2.1882228  )
        88           C   tau(  88) = (  -0.4363551   1.2314011 2.1710336  )
        89           C   tau(  89) = (  -0.1048337   0.7570551 2.2983573  )
        90           C   tau(  90) = (   0.0660168   1.1069478 2.2676475  )
        91           C   tau(  91) = (   0.0490337   0.8974047 2.3909856  )
        92           C   tau(  92) = (  -0.0855608   1.1516729 2.1257576  )
        93           C   tau(  93) = (   0.2473718   0.7708248 2.3647407  )
        94           C   tau(  94) = (   0.3053839   0.9967849 2.0243396  )
        95           C   tau(  95) = (   0.3742542   0.8566514 2.1988956  )
        96           C   tau(  96) = (   0.3418069   1.2210682 2.0863099  )
        97           C   tau(  97) = (  -0.4041108   1.5188867 1.7359880  )
        98           C   tau(  98) = (  -0.2399343   1.7936370 1.9719763  )
        99           C   tau(  99) = (  -0.2247255   1.6006251 1.8503348  )
       100           C   tau( 100) = (  -0.3842603   1.9640672 1.9155257  )
       101           C   tau( 101) = (  -0.0936461   1.4591817 1.9376295  )
       102           C   tau( 102) = (   0.0960162   1.7026737 2.1361737  )
       103           C   tau( 103) = (   0.0884943   1.5176961 2.0381967  )
       104           C   tau( 104) = (  -0.0894460   1.8328508 2.1492063  )
       105           C   tau( 105) = (   0.2712143   1.3993061 1.9429809  )
       106           C   tau( 106) = (   0.3808697   1.7588934 1.9404992  )
       107           C   tau( 107) = (   0.4154868   1.5615071 1.8748814  )
       108           C   tau( 108) = (   0.2862768   1.8203568 2.1475771  )
       109           Cl  tau( 109) = (  -0.0000028   0.9438992 2.6646597  )
       110           Cl  tau( 110) = (   0.1953439   1.3113248 1.6804758  )
       111           O   tau( 111) = (  -0.2795590   1.7373513 2.1919488  )
       112           O   tau( 112) = (   0.4484580   1.9035780 1.7956676  )
       113           O   tau( 113) = (   0.4160721   0.9016165 2.4211389  )
       114           O   tau( 114) = (   0.2625021   0.9147286 1.8595108  )
       115           O   tau( 115) = (   0.3809809   1.8566143 2.3515614  )
       116           O   tau( 116) = (   0.6071370   1.3829932 2.2760915  )
       117           O   tau( 117) = (  -0.3880864   1.3984041 1.5899412  )
       118           O   tau( 118) = (  -0.1162457   1.2479688 1.9337466  )
       119           O   tau( 119) = (   0.2357952   1.2312528 2.2884430  )
       120           O   tau( 120) = (   0.2012385   0.4461897 2.5730642  )

     number of k points=     1  Marzari-Vanderbilt smearing, width (Ry)=  0.0010
                       cart. coord. in units 2pi/alat
        k(    1) = (   0.0000000   0.0000000   0.0000000), wk =   2.0000000

     Dense  grid:  1428012 G-vectors     FFT dimensions: (  80, 160, 480)

     Smooth grid:   504904 G-vectors     FFT dimensions: (  60, 108, 360)

     Estimated max dynamical RAM per process >     965.66 MB

     Estimated total dynamical RAM >       7.54 GB
  The code is running with the 2D cutoff
  Please refer to:
  Sohier, T., Calandra, M., & Mauri, F. (2017),
  Density functional perturbation theory for gated two-dimensional heterostructures:
  Theoretical developments and application to flexural phonons in graphene.
  Physical Review B, 96(7), 75448.

     Check: negative/imaginary core charge=   -0.000002 0.000000

     Initial potential from superposition of free atoms
     Check: negative starting charge=   -0.001132

     starting charge  541.98383, renormalised to  542.00000

     negative rho (up, down):  1.132E-03 0.000E+00
     Starting wfcs are  420 randomized atomic wfcs
     Checking if some PAW data can be deallocated...

     total cpu time spent up to now is      125.6 secs

     Self-consistent Calculation

     iteration #  1     ecut=    50.00 Ry     beta= 0.50
     Davidson diagonalization with overlap
     c_bands:  3 eigenvalues not converged
     ethr =  1.00E-02,  avg # of iterations = 40.0

     negative rho (up, down):  1.031E-05 0.000E+00

     total cpu time spent up to now is     2094.5 secs

     total energy              =   82142.85683667 Ry
     Harris-Foulkes estimate   =  -53335.51769720 Ry
     estimated scf accuracy    <  111068.31785845 Ry

     End of self-consistent calculation

     convergence NOT achieved after   1 iterations: stopping

     Writing output data file

     init_run     :    119.18s CPU    120.33s WALL (       1 calls)
     electrons    :   1961.71s CPU   1969.12s WALL (       1 calls)

     Called by init_run:
     wfcinit      :     52.26s CPU     52.44s WALL (       1 calls)
     potinit      :     19.26s CPU     19.33s WALL (       1 calls)
     hinit0       :     36.63s CPU     36.68s WALL (       1 calls)

     Called by electrons:
     c_bands      :   1919.78s CPU   1923.97s WALL (       1 calls)
     sum_band     :     28.22s CPU     30.08s WALL (       1 calls)
     v_of_rho     :      2.26s CPU      2.35s WALL (       2 calls)
     newd         :     20.58s CPU     22.50s WALL (       2 calls)
     PAW_pot      :      4.00s CPU      4.00s WALL (       2 calls)
     mix_rho      :      0.23s CPU      0.24s WALL (       1 calls)

     Called by c_bands:
     init_us_2    :      0.22s CPU      0.27s WALL (       3 calls)
     regterg      :   1919.41s CPU   1923.60s WALL (       2 calls)

     Called by sum_band:
     sum_band:bec :      0.00s CPU      0.00s WALL (       1 calls)
     addusdens    :     16.57s CPU     17.94s WALL (       1 calls)

     Called by *egterg:
     h_psi        :    680.38s CPU    682.69s WALL (      43 calls)
     s_psi        :    259.57s CPU    259.75s WALL (      43 calls)
     g_psi        :      0.93s CPU      0.94s WALL (      40 calls)
     rdiaghg      :     52.76s CPU     52.86s WALL (      41 calls)

     Called by h_psi:
     h_psi:pot    :    679.62s CPU    681.90s WALL (      43 calls)
     h_psi:calbec :    255.27s CPU    255.54s WALL (      43 calls)
     vloc_psi     :    164.42s CPU    166.01s WALL (      43 calls)
     add_vuspsi   :    259.93s CPU    260.35s WALL (      43 calls)

     General routines
     calbec       :    263.20s CPU    263.88s WALL (      44 calls)
     fft          :      2.33s CPU      2.43s WALL (      23 calls)
     ffts         :      0.09s CPU      0.09s WALL (       3 calls)
     fftw         :    128.50s CPU    130.07s WALL (   10237 calls)
     interpolate  :      0.25s CPU      0.26s WALL (       2 calls)
     davcio       :      0.00s CPU      0.10s WALL (       3 calls)

     Parallel routines
     fft_scatt_xy :     23.50s CPU     23.55s WALL (   10263 calls)
     fft_scatt_yz :     10.98s CPU     12.22s WALL (   10263 calls)

     PWSCF        : 34m45.53s CPU    34m55.12s WALL

   This run was terminated on:  16:10:30  10Apr2019
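
For reading the timing report above, it may help to express the WALL times as fractions of the total run. This is a minimal Python sketch with the values copied by hand from the report; the helper name to_seconds and the dictionary layout are my own and not part of QE. Since the routines are nested (c_bands runs inside electrons, h_psi inside c_bands, and so on), the fractions overlap and do not add up to 100 %.

```python
def to_seconds(stamp):
    """Convert a string such as '34m55.12s' into seconds."""
    minutes, _, rest = stamp.partition("m")
    return 60.0 * float(minutes) + float(rest.rstrip("s"))


# WALL times in seconds, copied from the timing report of test #2.
wall = {
    "init_run": 120.33,
    "electrons": 1969.12,
    "c_bands": 1923.97,
    "h_psi": 682.69,
    "s_psi": 259.75,
    "calbec": 263.88,
    "fftw": 130.07,
}

total = to_seconds("34m55.12s")  # the "PWSCF ... WALL" line
fractions = {name: t / total for name, t in wall.items()}

# Print the routines from most to least expensive.
for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {100 * frac:5.1f} %")
```

With these numbers, c_bands accounts for roughly 92 % of the total wall time, so the single scf iteration is dominated by the Davidson diagonalization.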


*-----------------------------------------------------SLURM command-------------------------------------*



#SBATCH --job-name=QE_GO-Cl_bonding_scf
#SBATCH --partition=cpu
#SBATCH --mail-type=end
#SBATCH --output=bonding.scf.slurm.out
#SBATCH --error=bonding.scf.slurm.err
#SBATCH -p cpu
#SBATCH -n 8
#SBATCH --ntasks-per-node=8

ulimit -l unlimited
ulimit -s unlimited


srun --mpi=pmi2 $EXEC -in $INPUT

