Dear Nick,

I've tried Divide-and-Conquer, Expert, and QR; they all fail with the same backtrace.
I couldn't compile with MRRR; I think my ScaLAPACK is missing some of the routines it needs.

Following your idea that the ScaLAPACK library is buggy, I recompiled Siesta against the MKL libraries from the Debian 9 repo. They are from 2019, so not that old. Divide-and-Conquer still fails, but MRRR, Expert, and QR work fine. That is enough for my purposes, so I'll use MRRR, but I still don't know what is wrong with D&C. I attach the D&C output obtained with MKL, in case you find it useful.
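
For reference, this is the input setting in question, roughly as I will use it (a minimal sketch; the exact keyword spellings are how I understand them from the manual, so please correct me if they differ):

Diag.Algorithm MRRR
# alternatives I tried: divide-and-conquer (fails here), expert, QR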

Thank you for your help!

Best,
Karen

On 6/30/21 11:25 AM, Nick Papior wrote:
I have now tried to rerun it with 4.1.
And I get no error, even in debug mode.

My bet is that the ScaLAPACK library is an old and buggy one, but I could be wrong.

Could you rerun with the different possibilities for Diag.Algorithm?
That is, try them all, see which ones work and which don't, and report back.

On Wed, 30 Jun 2021 at 11:16, Karen Fidanyan <karen.fidan...@mpsd.mpg.de> wrote:

    Dear Nick,

    thanks for helping!

    I redid it with the -Og flag. The input, the *.psf files, and the output
    are attached. I also attach the debug.* files obtained with -DDEBUG.
    I ran it as
    `mpirun -np 2 ~/soft/siesta-4.1/Obj-dbg-Og/siesta control.fdf 2>&1 | tee siesta.out`.

    Sincerely,
    Karen Fidanyan

    On 6/28/21 10:22 PM, Nick Papior wrote:
    I can't rerun it without the psf files.

    Could you try to compile with -Og -g -fbacktrace (without -fcheck=all)?

    Then try again. :)
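
    In your arch.make that would be roughly the following (just a sketch;
    keep whatever else you need in FFLAGS):

    FFLAGS = -Og -g -fbacktrace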

    On Mon, 28 Jun 2021 at 22:01, Karen Fidanyan <karen.fidan...@mpsd.mpg.de> wrote:

        Dear Siesta users,

        I'm having a hard time trying to run SIESTA on my Debian-9
        laptop.
        I have:

        GNU Fortran (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
        OpenMPI-2.0.2-2
        libblas 3.7.0-2, liblapack 3.7.0-2
        libscalapack-openmpi1 1.8.0-13

        My arch.make is the following:
        **********************************************************************
        .SUFFIXES:
        .SUFFIXES: .f .F .o .a .f90 .F90 .c

        SIESTA_ARCH = gfortran_openMPI

        FPP = $(FC) -E -P -x c
        FC = mpifort
        FC_SERIAL = gfortran
        FFLAGS = -O0 -g -fbacktrace -fcheck=all #-Wall
        FFLAGS_DEBUG = -g -O0

        PP = gcc -E -P -C
        CC = gcc
        CFLAGS = -O0 -g -Wall

        AR = ar
        RANLIB = ranlib
        SYS = nag

        LDFLAGS = -static-libgcc -ldl

        BLASLAPACK_LIBS = -llapack -lblas \
                             -lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi \
                             -lblacsCinit-openmpi \
                             -lpthread -lm

        MPI_INTERFACE = libmpi_f90.a
        MPI_INCLUDE   = .

        FPPFLAGS_MPI = -DMPI -DMPI_TIMING -D_DIAG_WORK
        FPPFLAGS = $(DEFS_PREFIX) -DFC_HAVE_FLUSH -DFC_HAVE_ABORT $(FPPFLAGS_MPI)

        INCFLAGS = $(MPI_INCLUDE)

        LIBS = $(BLASLAPACK_LIBS) $(MPI_LIBS)

        atom.o: atom.F
             $(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<


        .c.o:
             $(CC) -c $(CFLAGS) $(INCFLAGS) $(CPPFLAGS) $<
        .F.o:
             $(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F)  $<
        .F90.o:
             $(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_free_F90) $<
        .f.o:
             $(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_fixed_f)  $<
        .f90.o:
             $(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_free_f90)  $<
        **********************************************************************
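
        For completeness, I build in a separate object directory in the
        usual way, roughly like this (a sketch; it assumes the standard
        Src/obj_setup.sh script and uses my local paths):

        cd ~/soft/siesta-4.1/Obj-debug-O0
        sh ../Src/obj_setup.sh      # populate the object directory
        cp <path-to>/arch.make .    # the arch.make shown above
        make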

        The code compiles without errors.
        With Diag.ParallelOverK True, I can run on multiple cores with no errors.
        With Diag.ParallelOverK False, I can run `mpirun -np 1` without errors,
        but if I try to use >=2 cores, it fails with:
        **********************************************************************
        Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

        Backtrace for this error:
        #0  0x2ba6eb754d1d in ???
        #1  0x2ba6eb753f7d in ???
        #2  0x2ba6ec95405f in ???
        #3  0x2ba70ec1cd8c in ???
        #4  0x2ba6eab438a4 in ???
        #5  0x2ba6eab44336 in ???
        #6  0x563b3f1cfead in __m_diag_MOD_diag_c
             at /home/fidanyan/soft/siesta-4.1/Src/diag.F90:709
        #7  0x563b3f1d2ef9 in cdiag_
             at /home/fidanyan/soft/siesta-4.1/Src/diag.F90:2253
        #8  0x563b3ebc7c8d in diagk_
             at /home/fidanyan/soft/siesta-4.1/Src/diagk.F:195
        #9  0x563b3eb9d714 in __m_diagon_MOD_diagon
             at /home/fidanyan/soft/siesta-4.1/Src/diagon.F:265
        #10  0x563b3ed897cb in __m_compute_dm_MOD_compute_dm
             at /home/fidanyan/soft/siesta-4.1/Src/compute_dm.F:172
        #11  0x563b3edbfaa5 in __m_siesta_forces_MOD_siesta_forces
             at /home/fidanyan/soft/siesta-4.1/Src/siesta_forces.F:315
        #12  0x563b3f9a4005 in siesta
             at /home/fidanyan/soft/siesta-4.1/Src/siesta.F:73
        #13  0x563b3f9a408a in main
             at /home/fidanyan/soft/siesta-4.1/Src/siesta.F:10
        
        --------------------------------------------------------------------------
        mpirun noticed that process rank 0 with PID 0 on node fenugreek exited
        on signal 11 (Segmentation fault).
        **********************************************************************

        I ran it with
        `mpirun -np 2 ~/soft/siesta-4.1/Obj-debug-O0/siesta control.fdf | tee siesta.out`

        The header of the broken calculation:
        
        --------------------------------------------------------------------------------------------
        Siesta Version  : v4.1.5-1-g384057250
        Architecture    : gfortran_openMPI
        Compiler version: GNU Fortran (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
        Compiler flags  : mpifort -O0 -g -fbacktrace -fcheck=all
        PP flags        :  -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DMPI -DMPI_TIMING -D_DIAG_WORK
        Libraries       :  -llapack -lblas -lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi -lblacsCinit-openmpi -lpthread -lm
        PARALLEL version

        * Running on 2 nodes in parallel
        --------------------------------------------------------------------------------------------

        I also attach the fdf file and the full output with the error.
        Do you have any idea what is wrong?

        Sincerely,
        Karen Fidanyan
        PhD student
        Max Planck Institute for the Structure and Dynamics of Matter
        Hamburg, Germany





--
Kind regards Nick

Attachment: run-mkl-DivConq.tgz

-- 
SIESTA is supported by the Spanish Research Agency (AEI) and by the European 
H2020 MaX Centre of Excellence (http://www.max-centre.eu/)
