Dear developers and users,
I tried to run the GPU version of QE for an electron-phonon coupling
calculation on an A100 card. The structure relaxation and the self-consistent
calculation both completed successfully. However, when I ran the phonon
calculation, my job crashed with a strange error:
##################
[m005:65520:0:65520] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffffffffffc)
/fs08/home/js_luqing/src/qe-7.2/PHonon/PH/phq_setup.f90: [ phq_setup_() ]
...
322 ! nat_todo, atomo, comp_irr
323
324 DO irr=0,nirr
==> 325 comp_irr(irr)=comp_irr_iq(irr,current_iq)
326 IF (elph .AND. irr>0) comp_elph(irr)=comp_irr(irr)
327 ENDDO
328 !
==== backtrace (tid: 65520) ====
0 0x00000000004a2780 phq_setup_() /fs08/home/js_luqing/src/qe-7.2/PHonon/PH/phq_setup.f90:325
1 0x00000000004700e1 initialize_ph_() /fs08/home/js_luqing/src/qe-7.2/PHonon/PH/initialize_ph.f90:79
2 0x000000000041a811 do_phonon_() /fs08/home/js_luqing/src/qe-7.2/PHonon/PH/do_phonon.f90:100
3 0x0000000000413d25 MAIN_() /fs08/home/js_luqing/src/qe-7.2/PHonon/PH/phonon.f90:78
4 0x0000000000413c71 main() ???:0
5 0x0000000000022555 __libc_start_main() ???:0
6 0x000000000040cd8d _start() ???:0
=================================
[m005:65520] *** Process received signal ***
[m005:65520] Signal: Segmentation fault (11)
[m005:65520] Signal code: (-6)
[m005:65520] Failing at address: 0x6a80000fff0
[m005:65520] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2ac4edf09630]
[m005:65520] [ 1] /fs08/home/js_luqing/src/qe-7.2/bin/ph.x[0x4a2780]
[m005:65520] [ 2] /fs08/home/js_luqing/src/qe-7.2/bin/ph.x[0x4700e1]
[m005:65520] [ 3] /fs08/home/js_luqing/src/qe-7.2/bin/ph.x[0x41a811]
[m005:65520] [ 4] /fs08/home/js_luqing/src/qe-7.2/bin/ph.x[0x413d25]
[m005:65520] [ 5] /fs08/home/js_luqing/src/qe-7.2/bin/ph.x[0x413c71]
[m005:65520] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ac4ee9e7555]
[m005:65520] [ 7] /fs08/home/js_luqing/src/qe-7.2/bin/ph.x[0x40cd8d]
[m005:65520] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node m005 exited on signal 11 (Segmentation fault).
#########################
I am not an expert in coding, but the backtrace points to line 325 of
phq_setup.f90 (the assignment to comp_irr(irr)), which seems fairly strange to
me. I don't know how to solve this problem, and I would be grateful if anyone
could help me.
Yours,
Qing Lu
lq1998
1148330...@qq.com
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list users@lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users