Dear Nick,

I've tried Divide-and-Conquer, Expert, and QR; they all fail with the same backtrace.
I couldn't get MRRR to compile; I think my ScaLAPACK is missing some routines.
But, following your idea about a bug in ScaLAPACK, I recompiled Siesta with the MKL libraries from the Debian-9 repo. They are from 2019, so not that old. Divide-and-Conquer also failed there, but MRRR, Expert, and QR work fine. That's enough for my purposes, so I'll use MRRR. I still don't know what is wrong with D&C, though. I attach the output of the D&C run with MKL; maybe you'll find it useful.
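For reference, switching solvers is a single line in control.fdf. This is the line for the run I'm keeping; the exact accepted spellings of the other values are my reading of the 4.1 manual, so take them with a grain of salt:

**********************************************************************
# control.fdf: choose the ScaLAPACK eigensolver
Diag.Algorithm MRRR                     # works for me with MKL
# Other values I tried, one at a time:
#   Diag.Algorithm divide-and-conquer   # segfaults with both libraries
#   Diag.Algorithm expert
#   Diag.Algorithm qr
**********************************************************************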
Thank you for your help!

Best,
Karen

On 6/30/21 11:25 AM, Nick Papior wrote:
I have now tried to rerun it with 4.1, and I get no error, even in debug mode.

My bet is that the ScaLAPACK library is an old and buggy one, but I could be wrong.

Could you rerun with the different possibilities for Diag.Algorithm? I.e. try them all, see which ones work and which don't, then report back.

On Wed, 30 Jun 2021 at 11:16, Karen Fidanyan <karen.fidan...@mpsd.mpg.de> wrote:

Dear Nick,

thanks for helping! I redid it with the -Og flag. The input, the *.psf files, and the output are attached. I also attach the debug.* files obtained with -DDEBUG. I run it as
`mpirun -np 2 ~/soft/siesta-4.1/Obj-dbg-Og/siesta control.fdf 2>&1 | tee siesta.out`

Sincerely,
Karen Fidanyan

On 6/28/21 10:22 PM, Nick Papior wrote:

I can't rerun without the psf files. Could you try to compile with -Og -g -fbacktrace (without fcheck=all), then try again? :)

On Mon, 28 Jun 2021 at 22:01, Karen Fidanyan <karen.fidan...@mpsd.mpg.de> wrote:

Dear Siesta users,

I'm having a hard time trying to run SIESTA on my Debian-9 laptop. I have:

GNU Fortran (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
OpenMPI 2.0.2-2
libblas 3.7.0-2, liblapack 3.7.0-2
libscalapack-openmpi1 1.8.0-13

My arch.make is the following:
**********************************************************************
.SUFFIXES:
.SUFFIXES: .f .F .o .a .f90 .F90 .c

SIESTA_ARCH = gfortran_openMPI

FPP = $(FC) -E -P -x c
FC = mpifort
FC_SERIAL = gfortran
FFLAGS = -O0 -g -fbacktrace -fcheck=all #-Wall
FFLAGS_DEBUG = -g -O0

PP = gcc -E -P -C
CC = gcc
CFLAGS = -O0 -g -Wall

AR = ar
RANLIB = ranlib
SYS = nag

LDFLAGS = -static-libgcc -ldl

BLASLAPACK_LIBS = -llapack -lblas \
    -lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi \
    -lblacsCinit-openmpi \
    -lpthread -lm

MPI_INTERFACE = libmpi_f90.a
MPI_INCLUDE = .
FPPFLAGS_MPI = -DMPI -DMPI_TIMING -D_DIAG_WORK

FPPFLAGS = $(DEFS_PREFIX) -DFC_HAVE_FLUSH -DFC_HAVE_ABORT $(FPPFLAGS_MPI)
INCFLAGS = $(MPI_INCLUDE)
LIBS = $(BLASLAPACK_LIBS) $(MPI_LIBS)

atom.o: atom.F
	$(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<
.c.o:
	$(CC) -c $(CFLAGS) $(INCFLAGS) $(CPPFLAGS) $<
.F.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<
.F90.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_free_F90) $<
.f.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_fixed_f) $<
.f90.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_free_f90) $<
**********************************************************************

The code compiles without errors. With Diag.ParallelOverK True, I can run on multiple cores with no errors. With Diag.ParallelOverK False, I can run `mpirun -np 1` without errors, but if I try to use >= 2 cores, it fails with:

**********************************************************************
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2ba6eb754d1d in ???
#1  0x2ba6eb753f7d in ???
#2  0x2ba6ec95405f in ???
#3  0x2ba70ec1cd8c in ???
#4  0x2ba6eab438a4 in ???
#5  0x2ba6eab44336 in ???
#6  0x563b3f1cfead in __m_diag_MOD_diag_c at /home/fidanyan/soft/siesta-4.1/Src/diag.F90:709
#7  0x563b3f1d2ef9 in cdiag_ at /home/fidanyan/soft/siesta-4.1/Src/diag.F90:2253
#8  0x563b3ebc7c8d in diagk_ at /home/fidanyan/soft/siesta-4.1/Src/diagk.F:195
#9  0x563b3eb9d714 in __m_diagon_MOD_diagon at /home/fidanyan/soft/siesta-4.1/Src/diagon.F:265
#10 0x563b3ed897cb in __m_compute_dm_MOD_compute_dm at /home/fidanyan/soft/siesta-4.1/Src/compute_dm.F:172
#11 0x563b3edbfaa5 in __m_siesta_forces_MOD_siesta_forces at /home/fidanyan/soft/siesta-4.1/Src/siesta_forces.F:315
#12 0x563b3f9a4005 in siesta at /home/fidanyan/soft/siesta-4.1/Src/siesta.F:73
#13 0x563b3f9a408a in main at /home/fidanyan/soft/siesta-4.1/Src/siesta.F:10
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node fenugreek exited on signal 11 (Segmentation fault).
**********************************************************************

I ran it with `mpirun -np 2 ~/soft/siesta-4.1/Obj-debug-O0/siesta control.fdf | tee siesta.out`.

The header of the broken calculation:
--------------------------------------------------------------------------------------------
Siesta Version  : v4.1.5-1-g384057250
Architecture    : gfortran_openMPI
Compiler version: GNU Fortran (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Compiler flags  : mpifort -O0 -g -fbacktrace -fcheck=all
PP flags        : -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DMPI -DMPI_TIMING -D_DIAG_WORK
Libraries       : -llapack -lblas -lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi -lblacsCinit-openmpi -lpthread -lm
PARALLEL version

* Running on 2 nodes in parallel
--------------------------------------------------------------------------------------------

I also attach the fdf file and the full output with the error. Do you have an idea what is wrong?

Sincerely,
Karen Fidanyan
PhD student
Max Planck Institute for the Structure and Dynamics of Matter
Hamburg, Germany

--
Kind regards Nick

--
Kind regards Nick
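P.S. For anyone finding this in the archives: the MKL switch amounts to replacing BLASLAPACK_LIBS in arch.make. Below is a hypothetical sketch, not my literal line; the library names come from Intel's link-line advisor for gfortran + Open MPI, and the MKLROOT path is illustrative, so adjust both to your MKL package:

**********************************************************************
# arch.make sketch: link MKL's ScaLAPACK/BLACS instead of the Debian
# scalapack-openmpi stack (path and exact names may differ per system)
MKLROOT ?= /opt/intel/mkl
BLASLAPACK_LIBS = -L$(MKLROOT)/lib/intel64 \
    -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 \
    -lmkl_gf_lp64 -lmkl_sequential -lmkl_core \
    -lpthread -lm -ldl
**********************************************************************

You can check which ScaLAPACK a binary actually picked up with `ldd siesta | grep -i -E 'scalapack|mkl'`.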
run-mkl-DivConq.tgz
Description: application/compressed-tar
-- SIESTA is supported by the Spanish Research Agency (AEI) and by the European H2020 MaX Centre of Excellence (http://www.max-centre.eu/)