Hmm... That's quite annoying... :(

Thanks for reporting back!

Den ons. 30. jun. 2021 kl. 15.39 skrev Karen Fidanyan <
karen.fidan...@mpsd.mpg.de>:

> Dear Nick,
>
> I've tried Divide-and-Conquer, Expert and QR; they all fail with the same
> backtrace.
> I couldn't compile with MRRR; I think my ScaLAPACK is missing some routines.
>
> But, following your idea about a bug in ScaLAPACK, I recompiled Siesta
> with the MKL libraries from the Debian-9 repo. They are from 2019, so not
> that old.
> Divide-and-Conquer still failed, but MRRR, Expert and QR work fine.
> That is enough for my purposes, so I'll use MRRR, but I don't know what is
> wrong with D&C. I attach the output of the D&C run with MKL, in case you
> find it useful.
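>
> For reference, the workaround amounts to the following fdf lines (just a
> sketch; the option names are as I read them in the 4.1 manual, so please
> correct me if the spelling is off):
>
>     # Use the ScaLAPACK MRRR solver instead of the default
>     # divide-and-conquer, which segfaults here on >1 MPI process.
>     Diag.Algorithm      MRRR
>     Diag.ParallelOverK  false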
>
> Thank you for your help!
>
> Best,
> Karen
> On 6/30/21 11:25 AM, Nick Papior wrote:
>
> I have now tried to rerun it with 4.1.
> And I get no error, even in debug mode.
>
> My bet is that the scalapack library is an old and buggy one. But I could
> be wrong.
>
> Could you rerun with the different possibilities for Diag.Algorithm?
> I.e. try them all, see which ones work and which don't, and then report
> back.
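>
> Something like the following in the fdf, one value at a time (value names
> as I recall them from the 4.1 manual, so treat this as a sketch):
>
>     Diag.Algorithm  divide-and-conquer   # the default
>     Diag.Algorithm  MRRR
>     Diag.Algorithm  expert
>     Diag.Algorithm  noexpert             # the QR-based solver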
>
> Den ons. 30. jun. 2021 kl. 11.16 skrev Karen Fidanyan <
> karen.fidan...@mpsd.mpg.de>:
>
>> Dear Nick,
>>
>> thanks for helping!
>>
>> I redid it with the -Og flag. The input, the *.psf files and the output are
>> attached. I also attach the debug.* files obtained with -DDEBUG.
>> I ran it as `mpirun -np 2 ~/soft/siesta-4.1/Obj-dbg-Og/siesta control.fdf
>> 2>&1 | tee siesta.out`.
>>
>> Sincerely,
>> Karen Fidanyan
>> On 6/28/21 10:22 PM, Nick Papior wrote:
>>
>> I can't rerun without psf files.
>>
>> Could you try to compile with -Og -g -fbacktrace (without -fcheck=all)?
>>
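>> I.e. in the arch.make something like this (just a sketch):
>>
>>     # keep debug info and backtraces, drop the expensive -fcheck=all
>>     FFLAGS = -Og -g -fbacktrace
>>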
>> Then try again. :)
>>
>> Den man. 28. jun. 2021 kl. 22.01 skrev Karen Fidanyan <
>> karen.fidan...@mpsd.mpg.de>:
>>
>>> Dear Siesta users,
>>>
>>> I'm having a hard time trying to run SIESTA on my Debian-9 laptop.
>>> I have:
>>>
>>> GNU Fortran (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
>>> OpenMPI-2.0.2-2
>>> libblas 3.7.0-2, liblapack 3.7.0-2
>>> libscalapack-openmpi1 1.8.0-13
>>>
>>> My arch.make is the following:
>>> **********************************************************************
>>> .SUFFIXES:
>>> .SUFFIXES: .f .F .o .a .f90 .F90 .c
>>>
>>> SIESTA_ARCH = gfortran_openMPI
>>>
>>> FPP = $(FC) -E -P -x c
>>> FC = mpifort
>>> FC_SERIAL = gfortran
>>> FFLAGS = -O0 -g -fbacktrace -fcheck=all #-Wall
>>> FFLAGS_DEBUG = -g -O0
>>>
>>> PP = gcc -E -P -C
>>> CC = gcc
>>> CFLAGS = -O0 -g -Wall
>>>
>>> AR = ar
>>> RANLIB = ranlib
>>> SYS = nag
>>>
>>> LDFLAGS = -static-libgcc -ldl
>>>
>>> BLASLAPACK_LIBS = -llapack -lblas \
>>>                   -lscalapack-openmpi -lblacs-openmpi \
>>>                   -lblacsF77init-openmpi \
>>>                   -lblacsCinit-openmpi \
>>>                   -lpthread -lm
>>>
>>> MPI_INTERFACE = libmpi_f90.a
>>> MPI_INCLUDE   = .
>>>
>>> FPPFLAGS_MPI = -DMPI -DMPI_TIMING -D_DIAG_WORK
>>> FPPFLAGS = $(DEFS_PREFIX) -DFC_HAVE_FLUSH -DFC_HAVE_ABORT $(FPPFLAGS_MPI)
>>>
>>> INCFLAGS = $(MPI_INCLUDE)
>>>
>>> LIBS = $(BLASLAPACK_LIBS) $(MPI_LIBS)
>>>
>>> atom.o: atom.F
>>>      $(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<
>>>
>>>
>>> .c.o:
>>>      $(CC) -c $(CFLAGS) $(INCFLAGS) $(CPPFLAGS) $<
>>> .F.o:
>>>      $(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F)  $<
>>> .F90.o:
>>>      $(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_free_F90) $<
>>> .f.o:
>>>      $(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_fixed_f)  $<
>>> .f90.o:
>>>      $(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_free_f90)  $<
>>> **********************************************************************
>>>
>>> The code compiles without errors.
>>> With Diag.ParallelOverK  True, I can run on multiple cores without
>>> errors.
>>> With Diag.ParallelOverK  False, I can run with `mpirun -np 1` without
>>> errors, but as soon as I use >=2 cores it fails with:
>>> **********************************************************************
>>> Program received signal SIGSEGV: Segmentation fault - invalid memory
>>> reference.
>>>
>>> Backtrace for this error:
>>> #0  0x2ba6eb754d1d in ???
>>> #1  0x2ba6eb753f7d in ???
>>> #2  0x2ba6ec95405f in ???
>>> #3  0x2ba70ec1cd8c in ???
>>> #4  0x2ba6eab438a4 in ???
>>> #5  0x2ba6eab44336 in ???
>>> #6  0x563b3f1cfead in __m_diag_MOD_diag_c
>>>      at /home/fidanyan/soft/siesta-4.1/Src/diag.F90:709
>>> #7  0x563b3f1d2ef9 in cdiag_
>>>      at /home/fidanyan/soft/siesta-4.1/Src/diag.F90:2253
>>> #8  0x563b3ebc7c8d in diagk_
>>>      at /home/fidanyan/soft/siesta-4.1/Src/diagk.F:195
>>> #9  0x563b3eb9d714 in __m_diagon_MOD_diagon
>>>      at /home/fidanyan/soft/siesta-4.1/Src/diagon.F:265
>>> #10  0x563b3ed897cb in __m_compute_dm_MOD_compute_dm
>>>      at /home/fidanyan/soft/siesta-4.1/Src/compute_dm.F:172
>>> #11  0x563b3edbfaa5 in __m_siesta_forces_MOD_siesta_forces
>>>      at /home/fidanyan/soft/siesta-4.1/Src/siesta_forces.F:315
>>> #12  0x563b3f9a4005 in siesta
>>>      at /home/fidanyan/soft/siesta-4.1/Src/siesta.F:73
>>> #13  0x563b3f9a408a in main
>>>      at /home/fidanyan/soft/siesta-4.1/Src/siesta.F:10
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 0 on node fenugreek exited
>>> on signal 11 (Segmentation fault).
>>> **********************************************************************
>>>
>>> I ran it with
>>> `mpirun -np 2 ~/soft/siesta-4.1/Obj-debug-O0/siesta control.fdf | tee
>>> siesta.out`
>>>
>>> The header of the broken calculation:
>>> --------------------------------------------------------------------------------------------
>>>
>>>
>>> Siesta Version  : v4.1.5-1-g384057250
>>> Architecture    : gfortran_openMPI
>>> Compiler version: GNU Fortran (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
>>> Compiler flags  : mpifort -O0 -g -fbacktrace -fcheck=all
>>> PP flags        :  -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DMPI -DMPI_TIMING -D_DIAG_WORK
>>> Libraries       :  -llapack -lblas -lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi -lblacsCinit-openmpi -lpthread -lm
>>> PARALLEL version
>>>
>>> * Running on 2 nodes in parallel
>>> --------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>> I also attach the fdf file and the full output with the error.
>>> Do you have any idea what is wrong?
>>>
>>> Sincerely,
>>> Karen Fidanyan
>>> PhD student
>>> Max Planck Institute for the Structure and Dynamics of Matter
>>> Hamburg, Germany
>>>
>>>
>>>
>>
>>
>> --
>> Kind regards Nick
>>
>>
>
> --
> Kind regards Nick
>
>

-- 
Kind regards Nick
-- 
SIESTA is supported by the Spanish Research Agency (AEI) and by the European 
H2020 MaX Centre of Excellence (http://www.max-centre.eu/)
