Dear siesta users,
I'm running some calculations using a parallel version of siesta-psml,
and I'm finding that some of them are crashing during the SCF loop.
Things seem to go fine, and then I get a message like
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[48394,1],3]
Exit code: 174
--------------------------------------------------------------------------
but it isn't apparent from the output file why the crash occurred.
Usually I just re-run the jobs and eventually they finish without
crashing, but I'm working on a large set of calculations and I'd rather
not have to babysit them all. Has anyone else had a similar experience,
or does anyone have any suggestions on how I could find what is causing
the problem? I suspect it could be related to memory, mainly because
there are no other signs of anything going wrong.
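To make the "re-run until it finishes" workaround concrete, here is a minimal sketch of a retry wrapper in Python. It is only an illustration: the function name, its parameters and the per-attempt file naming are hypothetical, not part of siesta or of my actual setup. It records the exit code of every attempt and keeps a separate output file per attempt, so the output of a crashed run is not overwritten by the next one.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, wait_seconds=0.0,
                     stdin_path=None, stdout_path=None):
    """Run cmd until it exits with code 0 or max_attempts is reached.

    Hypothetical helper for illustration only.
    Returns the list of exit codes seen, one per attempt.
    """
    exit_codes = []
    for attempt in range(1, max_attempts + 1):
        # Open the (optional) input file and a per-attempt output file.
        stdin = open(stdin_path, "r") if stdin_path else None
        stdout = open(f"{stdout_path}.attempt{attempt}", "w") if stdout_path else None
        try:
            result = subprocess.run(cmd, stdin=stdin, stdout=stdout,
                                    stderr=subprocess.STDOUT if stdout else None)
        finally:
            if stdin:
                stdin.close()
            if stdout:
                stdout.close()
        exit_codes.append(result.returncode)
        print(f"attempt {attempt}: exit code {result.returncode}", file=sys.stderr)
        if result.returncode == 0:
            break  # finished cleanly, no need to retry
        if attempt < max_attempts:
            time.sleep(wait_seconds)  # brief pause before the next attempt
    return exit_codes
```

A call might look like run_with_retries(["mpirun", "-np", "4", "siesta"], stdin_path="input.fdf", stdout_path="output.out"), where the process count, the executable name and the file names are placeholders. This only automates the re-running; it does not explain the crash.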
Thanks in advance,
Danny Bennett