Hi.
On 10/15/2015 09:15 AM, Nick Papior wrote:
This may or may not be due to your local computer killing your
application for various reasons. We simply do not know.
As this hasn't been reported by anyone else, I suspect one of the following:
1) You are going beyond your time limit (is it a cluster with an admin?
Have you asked him/her first?)
2) You are calling your executable erroneously
3) Something else we cannot deduce from the information you supplied.
The fact that this hasn't been reported made me look closer at the
memory consumption and it looks like siesta needs much more memory when
running on two nodes.
I attached two images of the memory consumption, both created via
sampling (same input). The first run, on a single node, needs only 2.5 GB
of memory, while the run on two nodes allocates more and more memory
(> 120 GB) until the job gets canceled by the system.
Could this be a memory leak? Do you have an idea how to further
investigate this?
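One way to narrow this down would be to sample the resident memory of each MPI rank separately and see whether all ranks grow or only some. What follows is only a minimal sketch, assuming a Linux node where /proc/<pid>/status is readable; the helper names rss_kib_from_status and sample_rss are hypothetical and not part of siesta or any MPI tool.

```python
import time

def rss_kib_from_status(text):
    """Return the VmRSS value (in kB as reported by the kernel) found in the
    text of /proc/<pid>/status, or None if the field is missing."""
    for line in text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     2621440 kB"
            return int(line.split()[1])
    return None

def sample_rss(pid, interval_s=1.0, count=5):
    """Collect up to `count` (timestamp, rss) samples for one process.
    Stops early if the process disappears (assumption: Linux /proc)."""
    samples = []
    for _ in range(count):
        try:
            with open(f"/proc/{pid}/status") as fh:
                samples.append((time.time(), rss_kib_from_status(fh.read())))
        except OSError:
            break  # process ended or /proc entry is not accessible
        time.sleep(interval_s)
    return samples
```

Running sample_rss on every rank's PID on both nodes and comparing the curves should show whether the growth is spread evenly or concentrated in a few processes.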
Thanks
Matthias
Lastly, I think you should add the MKL lapack and blas libraries to the
MKL_LIBS line (-lmkl_lapack95_lp64 -lmkl_blas95_lp64)
2015-10-15 8:41 GMT+02:00 Matthias Neuer <[email protected]>:
Hi.
I compiled siesta with MPI support and it passed all the tests, in
particular with sih.fdf as input running on two nodes.
But now with a larger input file the calculation crashes in the iscf
part on step 35 when running on two nodes. On a single node the
calculation succeeds.
Here are the last lines from the output file:
siesta: 33 -38796.3045 -38469.9677 -38470.1779 0.9433 -4.6989
siesta: 34 -38688.6177 -38470.2924 -38470.4769 0.6623 -4.7340
siesta: 35 -38681.8110 -38470.5367 -38470.6830 0.5750 -4.7752
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
And here are the corresponding lines from a successful run on a
single node:
siesta: 33 -38796.3054 -38469.9677 -38470.1779 0.9433 -4.6989
siesta: 34 -38688.6172 -38470.2924 -38470.4769 0.6623 -4.7340
siesta: 35 -38681.8106 -38470.5368 -38470.6830 0.5750 -4.7752
So not much of a difference here.
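To put a number on "not much of a difference", the two excerpts can be compared column by column. This is only a small sketch: parse_scf and max_abs_diff are hypothetical helper names, and the only assumption about the format is that each line reads "siesta: <step> <values...>", as in the lines quoted above.

```python
def parse_scf(lines):
    """Map SCF step number -> list of floats from 'siesta: <step> ...' lines."""
    steps = {}
    for line in lines:
        parts = line.split()
        if len(parts) > 2 and parts[0] == "siesta:" and parts[1].isdigit():
            steps[int(parts[1])] = [float(x) for x in parts[2:]]
    return steps

def max_abs_diff(a, b):
    """Largest absolute per-column difference over steps present in both runs."""
    return max(abs(x - y)
               for step in a.keys() & b.keys()
               for x, y in zip(a[step], b[step]))

# Lines copied from the two output files quoted above.
two_nodes = parse_scf("""\
siesta: 33 -38796.3045 -38469.9677 -38470.1779 0.9433 -4.6989
siesta: 34 -38688.6177 -38470.2924 -38470.4769 0.6623 -4.7340
siesta: 35 -38681.8110 -38470.5367 -38470.6830 0.5750 -4.7752
""".splitlines())
one_node = parse_scf("""\
siesta: 33 -38796.3054 -38469.9677 -38470.1779 0.9433 -4.6989
siesta: 34 -38688.6172 -38470.2924 -38470.4769 0.6623 -4.7340
siesta: 35 -38681.8106 -38470.5368 -38470.6830 0.5750 -4.7752
""".splitlines())

print(f"largest difference: {max_abs_diff(two_nodes, one_node):.4f}")
```

For these three steps the largest deviation is in the first value column of step 33, so the two runs agree numerically right up to the point where the job is killed.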
I tested this behavior with versions 3.1 and 3.2p5 and with a development
snapshot, and the crash always happens on step 35.
I compiled siesta with
Intel Compiler 13.1
Intel MPI 4.1.1
Intel MKL 11.0.5
but I also tried different versions, unfortunately with the same
outcome.
I attach my arch.make; maybe the error is there.
Thank you for your help
Matthias
--
Matthias Neuer
Universität Ulm
kiz / Abteilung Infrastruktur
--
Kind regards Nick