Dear All,

I am calculating a slab of FASnI3 (972 atoms), 45 Ang thick plus 50 Ang of
vacuum, with this SIESTA version:

>Siesta Version  : v4.1-b4
>Architecture    : unknown
>Compiler version: GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
>Compiler flags  : mpif90 -O2 -fPIC -ftree-vectorize
>PP flags        : -DFC_HAVE_ABORT -DMPI -DSIESTA__DIAG_2STAGE
>Libraries       : libsiestaLAPACK.a libsiestaBLAS.a
/opt/share/scalapack_gnu/libscalapack.a
>PARALLEL version

>* Running on 24 nodes in parallel


This is the initial .fdf I use, which, mutatis mutandis, works perfectly for
the bulk calculation (12 atoms):

SystemName     surf010
SystemLabel      surf010
NumberOfSpecies      5
NumberOfAtoms       972
XC.functional  VDW
XC.authors     VV

%block ChemicalSpeciesLabel
1   50  Sn
2   53  IGGAtm2
3    6  C
4    7  N
5    1  H
%endblock ChemicalSpeciesLabel

LatticeConstant      1.00 Ang
%block LatticeVectors
      18.633         0.              0.
      0.              94.5837        0.
      0.              0.              25.8042
%endblock LatticeVectors

MeshCutoff             300  Ry
ElectronicTemperature  1500 K
UseSaveData       .true.
MD.UseSaveXV    .true.

DM.MixingWeight      0.02    ! 0.2
DM.NumberPulay       9       ! 3
DM.Tolerance         1.d-3   # 1.d-5

SolutionMethod       diagon

WriteDM           .true.
WriteEigenvalues  .true.
MaxSCFIterations     1000

PAO.BasisSize TZP
PAO.EnergyShift  0.01 Ry

MD.TypeOfRun      CG

MD.VariableCell   .false.
MD.RelaxCellOnly  .false.
MD.MaxForceTol     0.04 eV/Ang  ! 0.02 eV/Ang
MD.MaxStressTol    0.1 GPa
MD.NumCGsteps      3000
MD.FCDispl         0.03 Bohr

WriteCoorXmol      .true.
WriteCoorMDXmol    .true.
WriteMDXmol     .true.

%block kgrid_Monkhorst_Pack
  1  0  0    0.
  0  1  0    0.
  0  0  1    0.
%endblock kgrid_Monkhorst_Pack


AtomicCoordinatesFormat Ang

WriteCoorCerius .true.


%block AtomicCoordinatesAndAtomicSpecies
...
...
...
%endblock AtomicCoordinatesAndAtomicSpecies

And this is the error I get immediately after launching the job:

Setting up quadratic distribution...
ExtMesh (bp) on 0 =   154 x   136 x   151 =     3162544
PhiOnMesh: Number of (b)points on node 0 =               194568
PhiOnMesh: nlist on node 0 =              7689058
        1637
rdiag: Error in Cholesky factorisation
Stopping Program from Node:   23
        1637
rdiag: Error in Cholesky factorisation
Stopping Program from Node:    1
        1637
rdiag: Error in Cholesky factorisation
Stopping Program from Node:    2
        1637
rdiag: Error in Cholesky factorisation
Stopping Program from Node:    3
        1637
rdiag: Error in Cholesky factorisation
Stopping Program from Node:    4
        1637
rdiag: Error in Cholesky factorisation
...
...
Stopping Program from Node:   10
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 18 in communicator MPI COMMUNICATOR 3 CREATE FROM 0
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[node02:146174] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
[node02:146174] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
...
[node02:146174] 23 more processes have sent help message help-mpi-api.txt / mpi-abort
[node02:146174] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


I have tried all the possible solutions I am aware of (Divide&Conquer,
ParallelOverY; see the sketch below for what I mean). The geometry is OK and
triple-checked. As said, everything worked perfectly for the bulk case.
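
By "Divide&Conquer" and "ParallelOverY" I mean toggling the diagonalization /
processor-distribution options in the fdf. The flag names below are only my
best guess at the options involved, and the values are purely illustrative:

Diag.DivideAndConquer   .false.    ! also tried .true.
ProcessorY              4          ! varied the processor distribution along Y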

My sysadmin recommended that I contact you before proceeding, since you can
probably better pinpoint the origin of the issue.

Thanks in advance for your help/hints.

Best regards,
Giacomo Giorgi

P.S.: One further comment. I have noticed that in this version "UseSaveData
.true." is not sufficient by itself to restart from a previous XV file (it
only reads the .DM file), and that "MD.UseSaveXV .true." must be specified
as well to make the geometry restart from the previous XV file.
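
For clarity, this is the combination of flags that makes the restart work for
me (both values taken straight from the input above):

UseSaveData     .true.    ! on its own, only the .DM is read back
MD.UseSaveXV    .true.    ! needed as well to read the geometry from the .XV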


-- 

"Oltre le illusioni di Timbuctù e le gambe lunghe di Babalù c'era questa
strada...Questa strada zitta che vola via come una farfalla, una nostalgia,
nostalgia al gusto di curaçao...Forse un giorno meglio mi spiegherò"

(Paolo Conte, "Hemingway")