Dear SIESTA users,

I'm running calculations with a parallel build of siesta-psml, and some of them crash during the SCF loop. Everything appears to run normally, and then I get a message like

-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[48394,1],3]
  Exit code:    174
--------------------------------------------------------------------------

but the output file gives no indication of why the crash occurred. Usually I just re-run the jobs and they eventually finish without crashing, but I'm working on a large set of calculations and I'd rather not have to babysit them all. Has anyone had a similar experience, or does anyone have suggestions for how to find out what is causing the problem? I suspect it is memory-related, mainly because there are no signs of anything else going wrong.

Thanks in advance,

Danny Bennett
-- 
SIESTA is supported by the Spanish Research Agency (AEI) and by the European 
H2020 MaX Centre of Excellence (http://www.max-centre.eu/)
