Just to check, have you already applied the following fix to your
gfortran-built WIEN2k 23.2 lapw0?
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg23272.html
The gfortran compiler on Linux needed the patch, but Intel ifort
seemed to work fine without it. However, I don't know about
gfortran on macOS, as I don't have a Mac system.
It's unfortunate that only the disk images (.dmg) for the older Lion 10.7
to Sierra 10.12 releases can be downloaded with a web browser from [1],
since a disk image might have made it possible to run your particular
macOS version in a virtual machine on Windows or Linux. The macOS
version you are using is probably among the newer ones, which appear to
be downloadable only from the App Store using a compatible Mac. That
means a .dmg is not available to us non-Mac users for helping to
troubleshoot the issue.
[1] https://support.apple.com/en-us/102662
Kind Regards,
Gavin
WIEN2k user
On 6/23/2024 10:24 PM, Yichen Zhang wrote:
Dear Laurence and Peter,
1) No, I did not run with omp. All the runs discussed earlier in this thread
were in sequential mode (no -p). However, I have also tested dstart and lapw0
in parallel mode: lapw0 hangs just as it does in serial mode, while dstart
runs fine in parallel. Just in case, below is one version of my .machines
file, from a run with dstart in sequential mode and lapw0 in parallel mode on
2 processors:
***********
#dstart:localhost localhost
speed:localhost localhost
lapw0:localhost localhost
1:localhost
1:localhost
granularity:1
extrafine:1
omp_global:16
***********
And of course, I never made it to lapw1, due to the lapw0 hanging issue.
2) By inserting a series of PRINT *, "BREAKPOINT1,2,3,…" statements, I have
determined the exact line where the programme hangs. In the output of
"time lapw0 lapw0.def", it hangs exactly at CALL XCPOT1(luse2,LM,…). The
context in lapw0.F is:
***********
if (.not.xcpot1qq) then
PRINT *, "BREAKPOINT13"
CALL XCPOT1(luse2,LM,…)
PRINT *, "BREAKPOINT14"
***********
BREAKPOINT13 is the last one printed; BREAKPOINT14 is not printed.
Importantly, none of the BREAKPOINTs inside the subroutine XCPOT1 is printed
either. The first BREAKPOINT in XCPOT1 sits at the earliest permissible
position, right after all the USE, IMPLICIT NONE, and parameter declarations,
and it does not get printed. That suggests XCPOT1 is called but never starts
executing, so the code hangs after BREAKPOINT13 and prints neither the
BREAKPOINTs in XCPOT1 nor BREAKPOINT14.
I don’t understand why, since the XCPOT1 subroutine looks valid and compiles
fine...
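One caveat with this kind of print-based tracing: depending on the compiler
and on whether the output goes to a terminal or a file, printed lines may be
buffered, so a missing line is not by itself proof that the statement was
never reached. A minimal sketch of a flushing breakpoint helper is shown
below, assuming a Fortran 2003 compiler. The names trace and work are
hypothetical and not part of WIEN2k; work only stands in for the
CALL XCPOT1(...) line.

```fortran
program breakpoint_demo
  ! Illustrative only: "trace" and "work" are hypothetical names,
  ! not routines from lapw0.F or XCPOT1.
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none

  call trace('BREAKPOINT13')
  call work()                  ! stands in for CALL XCPOT1(...)
  call trace('BREAKPOINT14')

contains

  subroutine trace(label)
    ! Write the label, then force it out of any I/O buffer so it is
    ! visible even if the program hangs immediately afterwards.
    character(len=*), intent(in) :: label
    write(output_unit, '(a)') label
    flush(output_unit)
  end subroutine trace

  subroutine work()
    ! First executable statement of the routine under suspicion.
    call trace('BREAKPOINT_IN_WORK')
  end subroutine work

end program breakpoint_demo
```

If the flushed BREAKPOINT inside the subroutine still does not appear, the
conclusion that execution stops at the call itself is on firmer ground.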
3) As a last resort, I asked ChatGPT why a subroutine can hang; it suggested
7 possibilities, ranging from the programming level to the system level.
Below are my guesses and questions on each of them.
a) Infinite loops. I have checked all DO loops in XCPOT1.f, and all of them
are closed; if any were not, the compiler should have caught it. So NO.
b) Large memory allocation. XCPOT1 contains three dynamic allocations, but
none of them is a large array. So NOT likely.
c) Recursion without proper termination. NO. XCPOT1 is not a recursive
subroutine.
d) Blocking I/O operations. NO. It was not waiting for user input or reading
from a slow device.
e) Incorrect use of pointers. NO. I didn’t find pointers in XCPOT1.
f) Stack overflow. NO. Again, I didn’t see any recursion or large arrays. The
three allocatable arrays seem small.
g) Deadlocks. I am not sure whether this could happen here, but my guess is
no. Even though I run lapw0 in sequential mode, could a circular dependency
between tasks still occur when the programme runs on an Apple silicon Mac
system?
This is where things stand at the moment, unfortunately.
Best regards
Yichen
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html