Hi,

WIEN2k has a users guide in which the different parallelization modes are described extensively.

On a cluster with a queuing system (like SLURM) it should not even be possible to access nodes (other than the frontend) via ssh without going through SLURM; on our SLURM machine, ssh is possible only to nodes that have been assigned to the user by salloc or an sbatch job. This prevents the overloading you describe.
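As a side note for the administrators: this ssh restriction is typically enforced with SLURM's pam_slurm_adopt PAM module on the compute nodes. A minimal sketch of the relevant line in /etc/pam.d/sshd, assuming the module is installed (the exact setup depends on your distribution and SLURM version):

account    required     pam_slurm_adopt.so

With this in place, an ssh login is adopted into an existing allocation of that user (and rejected if there is none), so stray processes stay under SLURM's control.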

We ALWAYS run our jobs using SLURM and typically submit them using

sbatch slurm.job

Now you mentioned correctly that WIEN2k needs a ".machines" file, and thus "slurm.job" has to create it on the fly.

I've provided an example script (which you may need to adapt for user or resource specifications) at

http://www.wien2k.at/reg_user/faq/pbs.html
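To give you the idea, here is a minimal sketch of such a slurm.job for pure k-point parallelism. It is not a drop-in replacement for the example above; node and core counts are placeholders you must adapt:

#!/bin/bash
#SBATCH --job-name=wien2k
#SBATCH --nodes=2                # adapt to your resources
#SBATCH --ntasks-per-node=16     # adapt to your resources
#SBATCH --time=24:00:00

# Build .machines on the fly from the nodes SLURM assigned to this job:
# one "1:hostname" line per core means one k-point job per core.
rm -f .machines
for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    for i in $(seq 1 "$SLURM_NTASKS_PER_NODE"); do
        echo "1:$host" >> .machines
    done
done
echo "granularity:1" >> .machines
echo "extrafine:1"   >> .machines

run_lapw -p                      # start the parallel SCF cycle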

One more hint: in the file $WIENROOT/parallel_options one specifies whether k-point parallel (USE_REMOTE) and MPI-parallel (MPI_REMOTE) jobs are started using ssh (1) or not (0):

setenv USE_REMOTE 1   # 1: start k-point jobs via ssh; 0: start them locally
setenv MPI_REMOTE 0   # 0: let mpirun place the MPI processes itself

With modern MPI versions, always use MPI_REMOTE=0.

Usually k-point parallelism is meaningful only for up to 8 (then set OMP_NUM_THREADS=2) or 16 cores; beyond that the overhead becomes too big. In such cases one would use only ONE node, and the job runs on it as on a "shared-memory machine" (USE_REMOTE=0, without mpi at all). For medium-sized cases with a few k-points and larger matrices, a "mixed" k-parallel and mpi-parallel setup is best; this is what the slurm.job example above uses.
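For illustration, a .machines file for such a mixed setup on two 16-core nodes could look like this (hostnames and core counts are placeholders). Each "1:" line defines one k-point job that itself runs MPI-parallel on the listed cores, and the lapw0 line lets lapw0 run MPI-parallel across all of them:

1:node1:16
1:node2:16
lapw0:node1:16 node2:16
granularity:1
extrafine:1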


PS: I'm sending this also to our WIEN2k mailing list, because it is of general interest and I don't want to write the same email over and over again.

PPS: Please use the mailing list in general (see www.wien2k.at), as I normally do not answer questions sent directly to me.

Best regards
Peter Blaha

On 11/11/2015 07:08 PM, Robb III, George B. wrote:
Hi Dr. Schwartz / Dr. Blaha-

We have noticed on our SLURM <http://schedmd.com/#index> based research
cluster that WIEN2k suite commands use a .machines file to spawn ssh
sessions to individual nodes instead of going through the scheduler.

We have SLURM configured to control cluster resource allocations and
have collisions of resources when ssh processes are called from the
WIEN2k suite.

e.g. SLURM controls node2, which has 95% of its resources allocated to
jobs, but when a WIEN2k process is launched from the head node it will
ssh to node2 (because of the 5% free resources) and spawn additional
un-SLURM-managed processes there.  Node2 is now oversubscribed.

Does the WIEN2k suite have an administrative guide or documentation?
Does the WIEN2k suite have a recommended cluster manager (e.g. PBS, SLURM,
LSF, etc.)?

Thanks again for any assistance; we are looking forward to having the labs
on our campus use the WIEN2k suite.

Thanks,

George B. Robb III
Systems Administrator
Research Computing Support Services - (RCSS)
University of Missouri System

--

                                      P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------
