Dear Wannier90 users,

I am trying to wannierize the electronic structure of ZnMgHf. The structure is large: it contains 174 atoms, and I calculate 1400 spin-unpolarized bands. I run the calculations on a computational cluster. The problem appears when I run pw2wannier90.x: the job is terminated, and the joberr file states that the system is out of memory. I even tried 64 nodes with 180 GB of RAM each and still received this message. Is there a proper way to run pw2wannier90.x on a cluster?

I run the program with the command: mpirun -np 3072 pw2wannier90.x -in pw2wan_ZnMgHf_m1.in > pw2wan_ZnMgHf_m1.out, where pw2wan_ZnMgHf_m1.in is the input file. Details of the Slurm commands are in the run_qe_wannier.sh file. I had no problems with the scf and nscf calculations and never needed such a large number of nodes for them. I usually use just 4 nodes for this system, and that amount of memory is enough.
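For reference, here is a rough per-rank memory estimate from the numbers above. It is only a sketch: it assumes the 3072 MPI ranks are spread evenly over the 64 nodes and that the full 180 GB of each node is available to the job. Neither assumption is taken from the job script, and the function name is made up for illustration.

```python
# Rough per-rank memory budget for the pw2wannier90.x run described above.
# Assumptions (not taken from the job script): ranks are distributed evenly
# over the nodes, and the whole 180 GB of each node is usable by the job.

def memory_per_rank_gb(total_ranks: int, nodes: int, ram_per_node_gb: float) -> float:
    """Return the RAM available to each MPI rank, in GB."""
    if total_ranks % nodes != 0:
        raise ValueError("ranks do not divide evenly over nodes")
    ranks_per_node = total_ranks // nodes
    return ram_per_node_gb / ranks_per_node

ranks_per_node = 3072 // 64
per_rank = memory_per_rank_gb(3072, 64, 180.0)
print(f"{ranks_per_node} ranks per node, {per_rank:.2f} GB per rank")
# prints: 48 ranks per node, 3.75 GB per rank
```

Under these assumptions each rank has under 4 GB. If any large array is held in full by every rank rather than distributed, it is counted 48 times per node, so adding nodes at the same rank density would not raise the per-rank budget.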
I include the .win and pw2wannier90 input files and the run script for the cluster. The output files are also included. I used Quantum ESPRESSO 6.7 and Wannier90 3.1.0. I would appreciate any help.

Best regards,
Ireneusz Buganski
AGH University of Science and Technology, Krakow, Poland
Attachments: run_qe_wannier.sh, pw2wan_ZnMgHf_m1.out, pw2wan_ZnMgHf_m1.in
gcccore/10.3.0 loaded.
zlib/1.2.11-gcccore-10.3.0 loaded.
binutils/2.36.1-gcccore-10.3.0 loaded.
intel-compilers/2021.2.0 loaded.
numactl/2.0.14-gcccore-10.3.0 loaded.
ucx/1.10.0-gcccore-10.3.0 loaded.
impi/2021.2.0-intel-compilers-2021.2.0 loaded.
iimpi/2021a loaded.
imkl/2021.2.0-iimpi-2021a loaded.
intel/2021a loaded.
szip/2.1.1-gcccore-10.3.0 loaded.
hdf5/1.10.7-iimpi-2021a loaded.
elpa/2021.05.001-intel-2021a loaded.
libxc/5.1.5-intel-compilers-2021.2.0 loaded.
quantumespresso/6.7-intel-2021a loaded.
intel/2021a unloaded.
gcccore/10.3.0 unloaded.
gcccore/11.2.0 loaded.
zlib/1.2.11-gcccore-10.3.0 unloaded.
binutils/2.36.1-gcccore-10.3.0 unloaded.
zlib/1.2.11-gcccore-11.2.0 loaded.
binutils/2.37-gcccore-11.2.0 loaded.
intel-compilers/2021.2.0 unloaded.
intel-compilers/2021.4.0 loaded.
impi/2021.2.0-intel-compilers-2021.2.0 unloaded.
ucx/1.10.0-gcccore-10.3.0 unloaded.
numactl/2.0.14-gcccore-10.3.0 unloaded.
numactl/2.0.14-gcccore-11.2.0 loaded.
ucx/1.11.2-gcccore-11.2.0 loaded.
impi/2021.4.0-intel-compilers-2021.4.0 loaded.
imkl/2021.2.0-iimpi-2021a unloaded.
imkl/2021.4.0 loaded.
iimpi/2021a unloaded.
iimpi/2021b loaded.
imkl-fftw/2021.4.0-iimpi-2021b loaded.
intel/2021b loaded.
wannier90/3.1.0-intel-2021b loaded.
The following have been reloaded with a version change:
1) binutils/2.36.1-gcccore-10.3.0 => binutils/2.37-gcccore-11.2.0
2) gcccore/10.3.0 => gcccore/11.2.0
3) iimpi/2021a => iimpi/2021b
4) imkl/2021.2.0-iimpi-2021a => imkl/2021.4.0
5) impi/2021.2.0-intel-compilers-2021.2.0 => impi/2021.4.0-intel-compilers-2021.4.0
6) intel-compilers/2021.2.0 => intel-compilers/2021.4.0
7) intel/2021a => intel/2021b
8) numactl/2.0.14-gcccore-10.3.0 => numactl/2.0.14-gcccore-11.2.0
9) ucx/1.10.0-gcccore-10.3.0 => ucx/1.11.2-gcccore-11.2.0
10) zlib/1.2.11-gcccore-10.3.0 => zlib/1.2.11-gcccore-11.2.0
slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.9. Some of the step tasks have been OOM Killed.
srun: error: ac0519: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.6. Some of the step tasks have been OOM Killed.
srun: error: ac0074: task 1: Out Of Memory
[proxy:0:40@ac0517] main (../../../../../src/pm/i_hydra/proxy/proxy.c:1189): assert (proxy_params.immediate.proxy.pid_hash == NULL) failed
srun: error: ac0517: task 10: Exited with exit code 5
[proxy:0:4@ac0043] main (../../../../../src/pm/i_hydra/proxy/proxy.c:1189): assert (proxy_params.immediate.proxy.pid_hash == NULL) failed
srun: error: ac0043: task 1: Exited with exit code 5
srun: error: ac0662: task 14: Broken pipe
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
srun: error: ac0705: task 15: Broken pipe
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
srun: error: ac0620: task 0: Out Of Memory
srun: error: ac0599: task 12: Exited with exit code 5
slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.5. Some of the step tasks have been OOM Killed.
[proxy:0:48@ac0599] main (../../../../../src/pm/i_hydra/proxy/proxy.c:1189): assert (proxy_params.immediate.proxy.pid_hash == NULL) failed
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
srun: error: ac0416: task 5: Out Of Memory
slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.0. Some of the step tasks have been OOM Killed.
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
MPI startup(): PMI server not found. Please set I_MPI_PMI_LIBRARY variable if it is not a singleton case.
_______________________________________________
Wannier mailing list
https://lists.quantum-espresso.org/mailman/listinfo/wannier
