One more comment: in the post at
https://arc.liv.ac.uk/pipermail/gridengine-users/2010-October/032729.html
you can see that they had an error of the form:
error: commlib error: got select error (Connection reset by peer)
error: executing task of job x failed: failed sending task to
execd@hostname: can't find connection
It looks like they might have tracked down the problem to the master
daemon (qmaster), as seen in the post at:
https://arc.liv.ac.uk/pipermail/gridengine-users/2010-October/032758.html
So the error may be caused by a daemon problem (on the
tachyon1478 node).
On 7/10/2015 5:01 AM, Laurence Marks wrote:
From a brief Google search, this is an MPI error.
How did you compile? It is easy to use the wrong BLACS combination.
Have you run simpler cases, such as TiC, first?
---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
http://www.numis.northwestern.edu
Corrosion in 4D http://MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Györgyi
On Jul 10, 2015 03:05, "Imran Khan" <imrankhanswat...@gmail.com> wrote:
Dear WIEN2k experts and users,
I am using WIEN2k version 14.2 on a queuing system (SGE), with the
Intel compiler 11.1, the MPI library mpi/openmpi-1.6.3, and the math
library fftw-3.3.4. With these options, WIEN2k installed without
any compile-time errors.
The purpose of my calculation is to find the stable site for
different substituents in NdFeB intermetallics.
I am running the case.struct given in the attachment, using 200 (6
6 4) k-points. My RKmax value is 7, Gmax is 12, and I am using the
LDA+U method.
I am using the following command:
runsp_lapw -p -orb -i 80 -ec 0.0001 -cc 0.001
Every time I submit my job, it is terminated after a few SCF cycles
with the following error in the job's error file:
error: commlib error: got select error (Connection reset by peer)
error: executing task of job 2424636 failed: failed sending task
to execd@tachyon1478: can't find connection
.
.
.
LAPW2 END
LAPW2 END
LAPW2 END
LAPW2 END
real 0m53.638s
forrtl: No such file or directory
forrtl: severe (29): file not found, unit 21, file
/home01/x1030imr/khan/Wien2K/Neomagnet/Pr-doped/f-site/AFM/Pr-Af/Pr-Af.scf2up_31
Image PC Routine Line Source
sumpara 00000000004A671D Unknown Unknown Unknown
sumpara 00000000004A5225 Unknown Unknown Unknown
sumpara 0000000000456259 Unknown Unknown Unknown
sumpara 0000000000416A5A Unknown Unknown Unknown
sumpara 0000000000416250 Unknown Unknown Unknown
sumpara 0000000000421E3D Unknown Unknown Unknown
sumpara 0000000000410771 scfsum_ 126 scfsum.f
sumpara 000000000040EE82 MAIN__ 219 sumpara.f
sumpara 00000000004033DC Unknown Unknown Unknown
libc.so.6 00000035AA81D974 Unknown Unknown Unknown
sumpara 00000000004032E9 Unknown Unknown Unknown
cp: cannot stat `.in.tmp': No such file or directory
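The forrtl message says sumpara could not open Pr-Af.scf2up_31, one of the per-process lapw2 output files. A quick way to see how many of these files are actually missing after a crash is a small script like the one below. This is a minimal, hypothetical helper (not part of WIEN2k); it assumes the files follow the pattern <case>.scf2<spin>_<n> seen in the error message and that the parallel jobs are numbered 1..N, where N is the number of parallel jobs you run.

```python
import sys
from pathlib import Path


def missing_scf2_files(case_dir, case, nproc, spin="up"):
    """Return the names of per-process scf2 files that do not exist.

    Hypothetical helper. Assumes the naming pattern <case>.scf2<spin>_<n>
    for n = 1..nproc, as in the file named in the forrtl error
    (e.g. Pr-Af.scf2up_31).
    """
    directory = Path(case_dir)
    missing = []
    for n in range(1, nproc + 1):
        path = directory / f"{case}.scf2{spin}_{n}"
        if not path.is_file():
            missing.append(path.name)
    return missing


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python check_scf2.py <case_dir> <case> <nproc>
    case_dir, case, nproc = sys.argv[1], sys.argv[2], int(sys.argv[3])
    for name in missing_scf2_files(case_dir, case, nproc):
        print("missing:", name)
```

Run it in the case directory right after a failed cycle, for example `python check_scf2.py . Pr-Af <nproc>`, with the actual number of parallel jobs substituted for `<nproc>`. If only the files belonging to one node are missing, that would be consistent with the "can't find connection" error, meaning the task was never started on that node.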
I have discussed this error with the engineers of the queuing
system (tachyon) and have also searched the mailing list, but
could not find a solution.
Your guidance in solving this issue would be greatly appreciated.
Best regards
Imran.
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html