As an additional comment, see the post at:

https://arc.liv.ac.uk/pipermail/gridengine-users/2010-October/032729.html

That post reports an error of the form:

error: commlib error: got select error (Connection reset by peer)
error: executing task of job x failed: failed sending task to execd@hostname: can't find connection

They appear to have traced the problem to the master daemon (qmaster), as described in the post at:

https://arc.liv.ac.uk/pipermail/gridengine-users/2010-October/032758.html

So the error may be caused by a daemon problem involving the tachyon1478 node.
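
To check whether the failures always involve the same node, something like the following could be used to tally the execd hostnames found in the job's error output. This is only a rough sketch: the function name count_failed_hosts is made up for illustration, and the pattern assumes the message has the form quoted above ("failed sending task to execd@hostname").

```python
import re
from collections import Counter

# Pattern based on the SGE message quoted above:
#   "... failed sending task to execd@<hostname>: can't find connection"
# Whitespace (including line breaks) is allowed between the words,
# since the message is sometimes wrapped across lines.
EXECD_FAILURE = re.compile(r"failed\s+sending\s+task\s+to\s+execd@([\w.-]+)")


def count_failed_hosts(log_text):
    """Return a Counter mapping execd hostname -> number of failed task sends."""
    return Counter(EXECD_FAILURE.findall(log_text))
```

Passing the full text of the error file to count_failed_hosts would show whether tachyon1478 is the only node affected or whether several nodes lose their connection.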

On 7/10/2015 5:01 AM, Laurence Marks wrote:

From a brief Google search, this is an MPI error.

How did you compile? It is easy to use the wrong BLACS combination.

Have you run simpler cases such as TiC first?

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
http://www.numis.northwestern.edu
Corrosion in 4D http://MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody else has thought"
Albert Szent-Gyorgi

On Jul 10, 2015 03:05, "Imran Khan" <imrankhanswat...@gmail.com> wrote:

    Dear wien2k experts and users,
    I am using WIEN2k version 14.2 on a queuing system (SGE), with
    Intel compiler 11.1, MPI libraries mpi/openmpi-1.6.3 and math
    libraries fftw-3.3.4. With these options, WIEN2k installed without
    any compile-time errors.
    The purpose of my calculation is to find the stable site for
    different substituents in NdFeB intermetallics.
    I am running the case.struct given in the attachment, using 200
    (6 6 4) k-points. My RKmax value is 7 and Gmax is 12, and I am
    using the LDA+U method.
    I am using the following command:
        runsp_lapw -p -orb -i 80 -ec 0.0001 -cc 0.001
    Every time I submit my job, it is terminated after a few SCF
    cycles with the following error in the error tag file.

    error: commlib error: got select error (Connection reset by peer)
    error: executing task of job 2424636 failed: failed sending task to execd@tachyon1478: can't find connection
        .
        .
        .
     LAPW2 END
     LAPW2 END
     LAPW2 END
     LAPW2 END
    real    0m53.638s
    forrtl: No such file or directory
    forrtl: severe (29): file not found, unit 21, file /home01/x1030imr/khan/Wien2K/Neomagnet/Pr-doped/f-site/AFM/Pr-Af/Pr-Af.scf2up_31
    Image              PC                Routine      Line        Source
    sumpara            00000000004A671D  Unknown         Unknown  Unknown
    sumpara            00000000004A5225  Unknown         Unknown  Unknown
    sumpara            0000000000456259  Unknown         Unknown  Unknown
    sumpara            0000000000416A5A  Unknown         Unknown  Unknown
    sumpara            0000000000416250  Unknown         Unknown  Unknown
    sumpara            0000000000421E3D  Unknown         Unknown  Unknown
    sumpara            0000000000410771  scfsum_             126  scfsum.f
    sumpara            000000000040EE82  MAIN__            219  sumpara.f
    sumpara            00000000004033DC  Unknown         Unknown  Unknown
    libc.so.6          00000035AA81D974  Unknown         Unknown  Unknown
    sumpara            00000000004032E9  Unknown         Unknown  Unknown
    cp: cannot stat `.in.tmp': No such file or directory

    I have discussed this error with the engineers of that queuing
    system (tachyon), and I have searched the mailing list as well but
    could not find any solutions.
    Your guidance in solving this issue would be greatly appreciated.
    Best regards
    Imran.

_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
