Just to check: has your WIEN2k 23.2 lapw0, built with gfortran, already been patched with the fix at:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg23272.html

The gfortran compiler on Linux needed the patch, but the Intel ifort seemed to work fine without it.  However, I don't know about gfortran on a Mac, as I don't have a macOS system.

It's unfortunate that only the disk images (.dmg) for the older versions, Lion 10.7 to Sierra 10.12, can be downloaded with a web browser from [1], since running your particular macOS version in a virtual machine on Windows or Linux might have been possible with a disk image.  The macOS version you are using is probably among the newer ones, which appear to be downloadable only from the App Store on a compatible Mac.  That means a .dmg is not available to us non-Mac users for trying to help troubleshoot the issue.

[1] https://support.apple.com/en-us/102662

Kind Regards,
Gavin
WIEN2k user

On 6/23/2024 10:24 PM, Yichen Zhang wrote:
Dear Laurence and Peter,

1) No, I did not run with omp. Everything discussed above in this thread was 
run in sequential mode (no -p). However, I have also tested dstart and lapw0 in 
parallel mode: lapw0 hangs just as it does in serial mode, while dstart runs 
fine in parallel mode. Just in case, below is one version of my .machines file, 
from when I ran dstart in sequential mode but lapw0 in parallel mode with 
2 processors:
***********
#dstart:localhost localhost
speed:localhost localhost
lapw0:localhost localhost

1:localhost
1:localhost
granularity:1
extrafine:1

omp_global:16
***********
And of course, I never made it to lapw1, due to the lapw0 hanging issue.

2) By inserting a series of PRINT *, “BREAKPOINT1,2,3,…” statements, I have 
determined the exact line where the programme hangs. In the output of “time 
lapw0 lapw0.def”, it hangs exactly at CALL XCPOT1(luse2,LM,…). The context in 
lapw0.F is:
***********
if (.not.xcpot1qq) then
   PRINT *, "BREAKPOINT13"
   CALL XCPOT1(luse2,LM,…)
   PRINT *, "BREAKPOINT14"
***********
BREAKPOINT13 is the last one printed; BREAKPOINT14 is not printed. Importantly, 
none of the BREAKPOINTs inside the subroutine XCPOT1 is printed either. The 
first “BREAKPOINT” in XCPOT1 sits at the earliest legal position, after all the 
USE, IMPLICIT NONE, and parameter declarations, and it does not get printed. 
That seems to indicate that XCPOT1 is called but never starts executing, so the 
code hangs after “BREAKPOINT13” and never prints the BREAKPOINTs in XCPOT1 or 
BREAKPOINT14. 
I don’t understand why, since the XCPOT1 subroutine looks legitimate and 
compiled fine...
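
The breakpoint technique described in 2) can be sketched as below. This is only a minimal, self-contained illustration and not the WIEN2k code: the program name, the checkpoint helper and the work subroutine (a stand-in for CALL XCPOT1(...)) are all hypothetical. It writes each marker to the standard error unit and flushes it immediately, on the assumption that buffered output could otherwise make the last visible marker lag behind the actual hang position when the output is redirected to a file.

```fortran
program breakpoint_demo
   ! Hypothetical stand-alone sketch; none of these names exist in WIEN2k.
   use, intrinsic :: iso_fortran_env, only: error_unit
   implicit none

   call checkpoint(13)
   call work()            ! stand-in for CALL XCPOT1(...)
   call checkpoint(14)

contains

   subroutine checkpoint(n)
      ! Write a numbered marker and flush it, so the marker is visible
      ! even if the program hangs right afterwards.
      integer, intent(in) :: n
      write (error_unit, '(a,i0)') 'BREAKPOINT', n
      flush (error_unit)
   end subroutine checkpoint

   subroutine work()
      ! First executable statement, placed after all declarations.
      call checkpoint(101)
   end subroutine work

end program breakpoint_demo
```

If the marker inside the called routine still never appears with flushed output, that would support the observation that the hang occurs at entry to the subroutine, before its first executable statement.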

3) As a last resort, I asked ChatGPT why a subroutine can hang; it suggested 7 
possibilities, from the programming level to the system level. Below are my 
guesses and questions on each of them.
  a) Infinite loops. I have checked all DO loops in XCPOT1.f, and all of them 
are closed. If any were not, the compiler should have caught it. So NO.
  b) Large memory allocation. XCPOT1 has three dynamic allocations, but none of 
them is a large array. So NOT likely.
  c) Recursion without proper termination. NO. XCPOT1 is not a recursive 
subroutine.
  d) Blocking I/O operations. NO. It was not waiting for user input or reading 
from a slow device.
  e) Incorrect use of pointers. NO. I didn’t find pointers in XCPOT1.
  f) Stack overflow. NO. Again, I didn’t see any recursion or large arrays. The 
three allocatable arrays seem small.
  g) Deadlocks. This is the one I am not sure could happen, but my guess is no. 
Even though I run lapw0 in sequential mode, could a circular dependency between 
tasks still arise when the programme runs on an Apple silicon Mac?

This is where the problem is stuck at the moment, unfortunately.


Best regards
Yichen
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
