Of note, if the XSEDE system in your subject line uses Slurm, as in the documentation here:


You likely need the SLURM_JOB_NODELIST variable, as given in your Slurm documentation.  For example, the Slurm documentation (currently version 19.05) here:


On 11/4/2019 8:24 PM, Gavin Abo wrote:


Edison does look retired [1].

Based on the usage of hostname in Bushra's job file (below), it looks like it is configured for a shared memory supercomputer.

However, if the supercomputer is not a shared memory (single node) system but a distributed memory (multiple node) system [2], the use of hostname is potentially problematic.

That is because on a distributed memory system the head node typically is not a compute node [3].

One bad thing that can happen is that head node calculations can break the cluster login, for example [4]:

/Do NOT use the login nodes for work. If everyone does this, the login nodes will crash keeping 700+ HPC users from being able to login to the cluster./

It depends on local policy, but most clusters I have seen have a policy that the system administrators can permanently take away a user's access to the cluster if a calculation is executed on the head node, for example [5]:

/CHTC staff reserve the right to kill any long-running or problematic processes on the head nodes and/or disable user accounts that violate this policy, and users may not be notified of account deactivation./

Instead of hostname, the job file usually needs a node list that it gets from the queuing system's job scheduler.  That could come from a script like gen.machines [6] or Machines2W [7].  Or it could come from an environment variable, whose name depends on the queuing system, for example the PBS_NODEFILE variable for PBS [8,9].
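
As a rough illustration of taking the node list from the scheduler rather than from hostname, here is a minimal sketch that assumes Slurm. The helper name write_machines is hypothetical (it is not part of WIEN2k), and the sketch assumes that "scontrol show hostnames" is available inside the job and that SLURM_NTASKS_PER_NODE is set because the job requested --ntasks-per-node. It only reproduces the k-point parallel layout of the job file quoted below; a site-provided script such as gen.machines [6] may well be the better choice.

```shell
#!/bin/bash
# Sketch only: build a WIEN2k .machines file from the Slurm node list.
# write_machines is a hypothetical helper, not a WIEN2k or Slurm command.
# It reads host names (one per line) on stdin and prints one k-point
# parallel line per task on each host, followed by the two options
# used in the original job file.
write_machines() {
    local tasks_per_node=$1
    local host i
    echo '#'
    while read -r host; do
        # Skip empty lines in the host list.
        [ -z "$host" ] && continue
        for ((i = 0; i < tasks_per_node; i++)); do
            echo "1:${host}:1"
        done
    done
    echo 'granularity:1'
    echo 'extrafine:1'
}

# Inside a Slurm job, SLURM_JOB_NODELIST holds a compact list such as
# node[01-02]; "scontrol show hostnames" expands it to one name per line.
# Assumption: SLURM_NTASKS_PER_NODE is set (falls back to 1 if it is not).
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    scontrol show hostnames "$SLURM_JOB_NODELIST" \
        | write_machines "${SLURM_NTASKS_PER_NODE:-1}" > .machines
fi
```

Under PBS, the same helper could presumably be fed from the file named by PBS_NODEFILE instead, although how many lines that file holds per node depends on the local configuration, so the tasks-per-node argument would need to be checked.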

[1] https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2019/edison-supercomputer-to-retire-after-five-years-of-service/
[2] https://www.researchgate.net/figure/Shared-vs-Distributed-memory_fig3_323108484
[3] https://zhanglab.ccmb.med.umich.edu/docs/node9.html
[4] https://hpc.oit.uci.edu/running-jobs
[5] http://chtc.cs.wisc.edu/HPCuseguide.shtml
[6] https://docs.nersc.gov/applications/wien2k/
[7] SRC_mpiutil: http://susi.theochem.tuwien.ac.at/reg_user/unsupported/
[8] Script for "pbs": http://susi.theochem.tuwien.ac.at/reg_user/faq/pbs.html
[9] http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm

On 11/4/2019 6:37 AM, Dr. K. C. Bhamu wrote:
Dear Bushra,

I hope you are using the same cluster you were using before (NERSC: cori/edison). From your job file it seems that you want to submit the job on edison (28 cores). Please make sure that edison is still working. According to the information available to me, edison has now been retired. Please confirm with the system admin. I would suggest that you submit the job on cori. A job file is available on the NERSC web page.

Anyway, please send the details as Prof. Peter has requested so that he can help you.


On Mon, Nov 4, 2019 at 1:14 PM Peter Blaha <pbl...@theochem.tuwien.ac.at> wrote:

    What does "does not work" mean?

    We need details.

    On 11/3/19 10:48 PM, BUSHRA SABIR wrote:
    > Hi experts,
    > I am working on a supercomputer with WIEN2K/19.1 and using the
    > job file, but this job file is not working for parallel run of
    > Need help to improve this job file.
    > #!/bin/bash
    > #SBATCH -N 1
    > #SBATCH -p RM
    > #SBATCH --ntasks-per-node 28
    > #SBATCH -t 2:0:00
    > # echo commands to stdout
    > # set -x
    > module load mpi
    > module load intel
    > export SCRATCH="./"
    > #rm .machines
    > #write .machines file
    > echo '#' .machines
    > # example for an MPI parallel lapw0
    > #echo 'lapw0:'`hostname`'  :'$nproc >> .machines
    > # k-point and mpi parallel lapw1/2
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo '1:'`hostname`':1' >> .machines
    > echo 'granularity:1' >>.machines
    > echo 'extrafine:1' >>.machines
    > export SCRATCH=./
    > runsp_lapw -p -ec 0.000001 -cc 0.0001 -i 40 -fc 1.0
    >   Bushra
    > _______________________________________________
    > Wien mailing list
    > Wien@zeus.theochem.tuwien.ac.at
    > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

    Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
    Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
    Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
    WWW: http://www.imc.tuwien.ac.at/TC_Blaha
