I'm having a hard time figuring out the distribution of jobs between two
clusters in a Slurm multi-cluster environment. The documentation says that
each job is submitted to the cluster that provides the earliest start time,
and once a job is submitted to a cluster, it can't be re-distributed to
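For reference, this is what the multi-cluster submission the documentation describes looks like; sbatch picks whichever listed cluster offers the earliest start time (the cluster names here are made up):

```shell
# Submit to whichever of the two clusters can start the job first:
sbatch --clusters=cluster1,cluster2 job.sh

# The job then exists only on the chosen cluster; query both to find it:
squeue --clusters=cluster1,cluster2
```

These commands need a live multi-cluster Slurm setup, so treat them as a command fragment rather than something runnable standalone.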
I have no experience with this, but based on my understanding of the doc, the
shutdown command should be something like "ssh ${node} systemctl poweroff", and
the resume something like "ipmitool -I lan -H ${node}-bmc -U ${user} -f password_file.txt
chassis power on".
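To make that concrete, here is a minimal sketch of the SuspendProgram/ResumeProgram pair those commands would live in (the paths, the IPMI username variable, and the ${node}-bmc naming scheme are all assumptions, not anything Slurm mandates):

```shell
#!/bin/bash
# suspend.sh -- candidate SuspendProgram for slurm.conf (sketch, untested).
# Slurm passes a hostlist expression like "node[01-04]" as $1;
# "scontrol show hostnames" expands it to one hostname per line.
for node in $(scontrol show hostnames "$1"); do
    ssh "$node" systemctl poweroff
done
```

```shell
#!/bin/bash
# resume.sh -- candidate ResumeProgram for slurm.conf (sketch, untested).
for node in $(scontrol show hostnames "$1"); do
    ipmitool -I lan -H "${node}-bmc" -U "$IPMI_USER" -f password_file.txt \
        chassis power on
done
```

Both would then be wired up in slurm.conf via SuspendProgram= and ResumeProgram=, with SuspendTime controlling when idle nodes get powered down. These scripts need a real cluster and BMCs to exercise, so they are offered as a fragment only.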
If you use libvirt for your virtual cluster, you
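For a libvirt-based virtual cluster, the same hooks can drive virsh instead of IPMI. A sketch, assuming each libvirt domain is named after its Slurm node (that naming match is an assumption):

```shell
#!/bin/bash
# resume.sh for a libvirt virtual cluster (sketch, untested).
for node in $(scontrol show hostnames "$1"); do
    virsh start "$node"        # boot the VM backing this node
done
# The matching SuspendProgram would run: virsh shutdown "$node"
```

Again a cluster-bound fragment, not something runnable in isolation.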
On 7/28/22 18:49, Djamil Lakhdar-Hamina wrote:
I am helping set up a 16-node cluster computing system. I am not a
system admin, but I work for a small firm and unfortunately have to pick up
needed skills fast in things I have little experience in. I am running
Rocky Linux 8 on Intel Xeon Knights Landing nodes donated by the TACC
center. We are
Hello Slurm Users,
I am experimenting with the new --prefer soft constraint option in 22.05.
The option behaves as described, but is somewhat inefficient if many jobs
with different --prefer options are submitted. Here is the scenario:
1. Submit an array of 100 tasks preferring feature A, each
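For anyone reproducing the scenario, the submissions presumably look like the following (the job script name is hypothetical; features A and B are from the example):

```shell
# Slurm 22.05+: --prefer is a soft constraint; the scheduler tries
# nodes with the feature first, then falls back to any eligible node.
sbatch --array=1-100 --prefer=A job.sh
sbatch --array=1-100 --prefer=B job.sh
```

These need a 22.05 cluster with the features defined, so they are shown as a fragment.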
Dear all,
I have copied the user file from Windows and did not convert it using
dos2unix, and I am using a shell script to add the users and accounts to Slurm,
but I am facing a problem; the output of the sshare command is below -
[root@master01]# sshare -a
Account User
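The likely culprit is the CRLF line endings: each account/user name picks up a trailing carriage return, which then shows up (invisibly) in sshare. A quick way to check and fix, with a hypothetical users.txt:

```shell
# Simulate a file copied from Windows (CRLF line endings):
printf 'alice\r\nbob\r\n' > users.txt

# cat -A would show the stray ^M at each line end.
# Strip the carriage returns (dos2unix users.txt does the same in place):
tr -d '\r' < users.txt > users_unix.txt
```

After that, re-running the add-user loop against the converted file should make the sshare output line up.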
On Friday, 30 July 2021 11:21:19 AM PDT Soichi Hayashi wrote:
> I am running slurm-wlm 17.11.2
You are on a truly ancient version of Slurm there, I'm afraid (there have been
4 major releases and over 13,000 commits since that was tagged in January 2018).
I would strongly recommend you try to get
Hello. I need help with troubleshooting our Slurm cluster.
I am running slurm-wlm 17.11.2 on Ubuntu 20 on a public cloud
infrastructure (Jetstream) using an elastic computing mechanism (
https://slurm.schedmd.com/elastic_computing.html). Our cluster works for
the most part, but for some reason,
Forgot the link to the Wiki: https://wiki.fysik.dtu.dk/niflheim/SLURM
On 12/8/19 9:18 PM, Ole Holm Nielsen wrote:
Hi Dean,
You may want to look at the links in my Slurm Wiki page. Both the
official Slurm documentation and other resources are listed. I think
most of your requirements and questions are described in these pages.
My Wiki gives detailed deployment information for a CentOS 7 cluster,
but
Hi,
I'm adding a bunch of memory to two of our nodes that are part of a blade
chassis. So two compute nodes will be upgraded to 1 TB RAM and the rest have
192 GB. All of the nodes belong to several partitions and can be used by our
paid members given the partition below. I'm looking for ways to figure out
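As a starting point, the mixed memory sizes just show up as different RealMemory values in slurm.conf, and jobs can be steered to the upgraded blades by their memory request. Node names and exact values below are illustrative; `slurmd -C` on each node prints the real ones:

```
# slurm.conf fragment (illustrative values)
NodeName=blade[01-02] RealMemory=1020000   # the two upgraded 1 TB nodes
NodeName=blade[03-16] RealMemory=190000    # remaining 192 GB nodes
```

A job that asks for, say, --mem=500G can then only be scheduled on the two big nodes.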
Hey, folks. I have a relatively simple queueing setup on Slurm 17.02 with a
1000 CPU-day AssocGrpCPURunMinutesLimit set. When the cluster is less busy
than typical, I may still have users run up against the 1000 CPU-day limit,
even though some nodes are idle.
What’s the easiest way to force a job
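If the goal is to let jobs through when the cluster is idle, one blunt option is adjusting the association limit itself with sacctmgr (the account name is a placeholder; GrpTRESRunMins is the TRES-era spelling of this limit):

```shell
# Show the current running-minutes limit on the association:
sacctmgr show assoc where account=myacct \
    format=Account,User,GrpTRESRunMins

# Raise it (minutes of CPU time; 1,440,000 min = 1000 CPU-days):
sacctmgr modify account myacct set GrpTRESRunMins=cpu=2880000

# Or clear the limit entirely (-1 removes a limit):
sacctmgr modify account myacct set GrpTRESRunMins=cpu=-1
```

These require a running slurmdbd, so they are shown as a fragment rather than something runnable here.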