Dear WIEN2k users,

I have run into the following problem when running WIEN2k in parallel with MPI. The WIEN2k version is 19.1, and the patches provided by Gavin Abo are installed. ELPA, FFTW3 and ScaLAPACK are used, compiled with gcc/gfortran and mpicc/mpif90. Compiling WIEN2k produces no errors.

k-point parallelization works fine. WIEN2k is installed on an NFS share on a small self-built cluster (currently only 4 nodes, with more to come once everything runs).

The problem looks like an Open MPI issue; however, simple example mpif90 programs work fine when run in parallel. Something goes wrong with lapw1para.

----------------------------------------------------------------------------------------------------------------------------------
run_lapw -p
STOP LAPW0 END
[1] Done /usr/lib64/openmpi/bin/mpirun -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine0 /home/mpiuser/WIEN2k-19.1/lapw0_mpi lapw0.def >> .time00
[node0:1423512:0:1423512] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[node0:1423513:0:1423513] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
0 /usr/lib64/libucs.so.0(+0x1b25f) [0x1462b91ad25f]
1 /usr/lib64/libucs.so.0(+0x1b42a) [0x1462b91ad42a]
2 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x4482df]
3 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x40d1c5]
4 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x42dd6e]
5 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404ded]
6 /usr/lib64/libc.so.6(__libc_start_main+0xf3) [0x1462ba7bb1a3]
7 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404e1e]
===================
0 /usr/lib64/libucs.so.0(+0x1b25f) [0x14b734f3725f]
1 /usr/lib64/libucs.so.0(+0x1b42a) [0x14b734f3742a]
2 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x4482df]
3 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x40d1c5]
4 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x42dd6e]
5 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404ded]
6 /usr/lib64/libc.so.6(__libc_start_main+0xf3) [0x14b7365451a3]
7 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404e1e]
===================
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 1 with PID 0 on node node0 exited on signal 11 (Segmentation fault).
[1] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
--------------------------------------------------------------------------------------------------------------------------------

Dayfile of the case:

Calculating Testsession in /home/mpiuser/WIEN2k/Testsession
on node0 with PID 1423240
using WIEN2k_19.1 (Release 25/6/2019) in /home/mpiuser/WIEN2k-19.1

start (Mon 20 Apr 2020 01:52:09 PM CEST) with lapw0 (40/99 to go)
cycle 1 (Mon 20 Apr 2020 01:52:09 PM CEST) (40/99 to go)
> lapw0 -p (13:52:09) starting parallel lapw0 at Mon 20 Apr 2020 01:52:09 PM CEST
-------- .machine0 : 2 processors
1.028u 0.157s 0:02.41 48.5% 0+0k 0+496io 0pf+0w
> lapw1 -p (13:52:11) starting parallel lapw1 at Mon 20 Apr 2020 01:52:11 PM CEST
-> starting parallel LAPW1 jobs at Mon 20 Apr 2020 01:52:11 PM CEST
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
node0 node1(72) 0.100u 0.089s 0:01.03 17.4% 0+0k 0+8io 0pf+0w
Summary of lapw1para:
node0 k=0 user=72 wallclock=5.34
** LAPW1 crashed!
0.178u 0.148s 0:02.21 14.0% 0+0k 0+136io 0pf+0w
error: command /home/mpiuser/WIEN2k-19.1/lapw1para lapw1.def failed
> stop error

Parallel_Options:

setenv TASKSET "no"
if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv DELAY 0.1
setenv SLEEPY 1
setenv WIEN_MPIRUN "/usr/lib64/openmpi/bin/mpirun -x LD_LIBRARY_PATH -x PATH -np _NP_ -machinefile _HOSTS_ _EXEC_"
setenv CORES_PER_NODE 1

.machines file:

1:node0:1 node1:1
lapw0:node0:1 node1:1
granularity:1

Help would be greatly appreciated.

Best regards,
André Deyerling
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html