Re: [slurm-users] Running Containerized Slurmctld and Slurmdb in Production?

2023-03-15 · Hanby, Mike
…-03-15T01:21:21 juju-65df3d-2 Mike
From: slurm-users on behalf of Hanby, Mike
Date: Wednesday, February 15, 2023 at 1:51 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Running Containerized Slurmctld and Slurmdb in Production?
Howdy, Just wondering if any sites are running con…

[slurm-users] Running Containerized Slurmctld and Slurmdb in Production?

2023-02-15 · Hanby, Mike
Howdy, Just wondering if any sites are running containerized slurmctld and slurmdbd in production? We are planning a migration from a single host running slurmctld, slurmdbd, and MySQL (and other HPC services) to separate OpenStack VMs. Our site averages less than 1000’s…

Re: [slurm-users] How to Make AvailableFeatures Persist after Slurmctld Restart

2022-06-02 · Hanby, Mike
…Restart
In slurm.conf, we just add the Features to the node description. Is that what you were looking for?
NodeName=compute-4-4 … Weight=15 Feature=gen10
Jeff
UH IT - HPC
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Hanby, Mike
Sent: Thursday, June 2, 2022 2…

[slurm-users] How to Make AvailableFeatures Persist after Slurmctld Restart

2022-06-02 · Hanby, Mike
Howdy, I can’t seem to find a solution in ‘man slurm.conf’ for this. How can I make the following persist across a slurmctld restart?
scontrol update NodeName="c001" AvailableFeatures=hi_mem,data,scratch
NodeName=c001 Arch=x86_64 CoresPerSocket=12 CPUAlloc=2 CPUTot=48 CPULoad=6.08…
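Following the reply in this thread, the persistent home for node features is the node's NodeName line in slurm.conf; the scontrol man page likewise advises updating slurm.conf for changes that should survive a slurmctld restart. A minimal sketch, assuming you want to carry features currently set with `scontrol update` over into slurm.conf: it reads `scontrol show node` output and builds a `Features=` fragment to merge by hand into that node's existing definition. The helper names are hypothetical, and the sample input is the post's output plus an `AvailableFeatures=` line added for illustration.

```python
import re
from typing import Dict, Optional


def parse_scontrol_node(text: str) -> Dict[str, str]:
    """Collect Key=Value tokens from `scontrol show node` output.

    Values that contain spaces (e.g. OS=, Reason=) are cut at the first
    space; that does not affect the fields used below.
    """
    return dict(re.findall(r"(\w+)=(\S*)", text))


def features_fragment(text: str) -> Optional[str]:
    """Return 'NodeName=<node> Features=<list>', or None if no features.

    Hypothetical helper: merge the Features= part into the node's existing
    NodeName line in slurm.conf rather than adding a second definition.
    """
    fields = parse_scontrol_node(text)
    node = fields.get("NodeName")
    features = fields.get("AvailableFeatures", "")
    # scontrol shows "(null)" when a node has no features set.
    if not node or features in ("", "(null)"):
        return None
    return f"NodeName={node} Features={features}"


if __name__ == "__main__":
    # Post's output, plus an AvailableFeatures= line added for illustration.
    sample = (
        "NodeName=c001 Arch=x86_64 CoresPerSocket=12\n"
        "   CPUAlloc=2 CPUTot=48 CPULoad=6.08\n"
        "   AvailableFeatures=hi_mem,data,scratch\n"
    )
    print(features_fragment(sample))  # NodeName=c001 Features=hi_mem,data,scratch
```

Swap the sample string for the real `scontrol show node c001` output; the script only prints the fragment and leaves editing slurm.conf to you.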

Re: [slurm-users] Cancel "reboot ASAP" for a node

2020-08-07 · Hanby, Mike
…eboot"
scontrol cancel_reboot c01
From: "Hanby, Mike"
Date: Friday, August 7, 2020 at 11:43 AM
To: Slurm User Community List
Subject: Cancel "reboot ASAP" for a node
Howdy, (Slurm 18.08) We have a bunch of nodes that we've updated to "scontrol reboot ASAP"…

[slurm-users] Cancel "reboot ASAP" for a node

2020-08-07 · Hanby, Mike
Howdy, (Slurm 18.08) We have a bunch of nodes that we've updated to "scontrol reboot ASAP". We'd like to cancel a few of those. The man page suggests that either of the following should work; however, both report the same error, "slurm_update error: Invalid node state specified":…

[slurm-users] Limit Number of Jobs per User in Queue?

2020-03-18 · Hanby, Mike
Howdy, We are running Slurm 18.08. We have a user who has twice submitted over 15,000 jobs to the cluster (the queue normally holds a couple thousand jobs at any given time). This makes Slurm unresponsive to user requests and job submissions. I suspect the scheduler is getting…
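Slurm's usual tool for this is an association or QOS limit set with sacctmgr. For example, `MaxSubmitJobs` caps how many jobs (running plus pending) a user's association may hold at once, and submissions past the cap are rejected. A minimal sketch, assuming you first want to see who is flooding the queue: it counts jobs per user from `squeue -h -o %u` output and prints, without running, a `sacctmgr` command for each user above a threshold. The function names, threshold, cap, and usernames are hypothetical.

```python
from collections import Counter
from typing import List


def jobs_per_user(squeue_output: str) -> Counter:
    """Count jobs per user from `squeue -h -o %u` output (one user per line)."""
    return Counter(line.strip() for line in squeue_output.splitlines() if line.strip())


def limit_commands(squeue_output: str, threshold: int, cap: int) -> List[str]:
    """Return (not run) sacctmgr commands capping users above `threshold` jobs.

    MaxSubmitJobs limits running + pending jobs for the user's associations;
    Slurm enforces it only if AccountingStorageEnforce includes "limits".
    `threshold` and `cap` are hypothetical site-chosen values.
    """
    counts = jobs_per_user(squeue_output)
    return [
        f"sacctmgr -i modify user where name={user} set MaxSubmitJobs={cap}"
        for user, n in sorted(counts.items())
        if n > threshold
    ]


if __name__ == "__main__":
    # Stand-in for `squeue -h -o %u`; usernames are made up.
    queue = "alice\nbob\nalice\nalice\ncarol\n"
    for cmd in limit_commands(queue, threshold=2, cap=1000):
        print(cmd)
```

Printing rather than executing keeps a human in the loop. A site-wide cap is often simpler to set once through a QOS limit such as `MaxSubmitJobsPerUser` than user by user.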

[slurm-users] Advanced Reservation 'Tues, Thus' Instead of 'weekday'?

2019-09-04 · Hanby, Mike
Howdy, Running Slurm 18.08.8. We have a request to create a 2-node reservation for a class that will meet every Tues and Thurs this semester from 8 AM to 9:15 AM. Is there a way to create a reservation that matches that, or is the closest we can do to create a weekday reservation for that time frame,…
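As far as I know, Slurm's recurring-reservation flags (`DAILY`, `WEEKDAY`, `WEEKEND`, `WEEKLY`) have no option for specific weekdays, so one workaround is two `WEEKLY` reservations: one starting on a Tuesday and one on a Thursday, each from 08:00 to 09:15. The sketch below computes the first Tuesday and Thursday on or after a given date and prints the matching `scontrol create reservation` commands. The reservation names, the `Users=` value, and the helper names are illustrative assumptions.

```python
from datetime import date, datetime, time, timedelta
from typing import List

TUESDAY, THURSDAY = 1, 3  # date.weekday(): Monday == 0


def next_weekday(start: date, weekday: int) -> date:
    """First date on or after `start` that falls on `weekday`."""
    return start + timedelta(days=(weekday - start.weekday()) % 7)


def class_reservations(semester_start: date, users: str, node_cnt: int = 2) -> List[str]:
    """Build one WEEKLY reservation command per class day, 08:00 to 09:15."""
    cmds = []
    for label, wd in (("tue", TUESDAY), ("thu", THURSDAY)):
        start = datetime.combine(next_weekday(semester_start, wd), time(8, 0))
        cmds.append(
            "scontrol create reservation "
            f"ReservationName=class_{label} "  # hypothetical names
            f"StartTime={start:%Y-%m-%dT%H:%M:%S} Duration=01:15:00 "
            f"NodeCnt={node_cnt} Users={users} Flags=WEEKLY"
        )
    return cmds


if __name__ == "__main__":
    # Hypothetical user; the start date is the thread's post date.
    for cmd in class_reservations(date(2019, 9, 4), users="instructor"):
        print(cmd)
```

A `WEEKLY` reservation keeps recurring until it is removed (for example `scontrol delete ReservationName=class_tue`). With `NodeCnt=` each reservation may be given different nodes; listing `Nodes=` explicitly keeps the class on the same two.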

Re: [slurm-users] Restore Last JOBID After Reinstall of Slurm Master Node?

2018-12-24 · Hanby, Mike
…slurm.conf with FirstJobId
-b
On 12/24/2018 1:09 AM, Sean Caron wrote:
On Mon, Dec 24, 2018 at 12:13 AM Hanby, Mike <mha...@uab.edu> wrote:
Howdy, We installed a new server to take over the duties of the Slurm master. I imported our accounting database into MySQL, copied config files etc…

[slurm-users] Restore Last JOBID After Reinstall of Slurm Master Node?

2018-12-23 · Hanby, Mike
Howdy, We installed a new server to take over the duties of the Slurm master. I imported our accounting database into MySQL, copied config files, etc. Apparently I missed the “file” that contains the last (or is it the next?) JOBID to assign to the next job. The first job submitted to the new…
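The reply points to `FirstJobId` in slurm.conf. A minimal sketch for choosing its value, assuming the imported slurmdbd database is intact: slurmdbd keeps jobs in a per-cluster table named `<clustername>_job_table` whose job ID column is `id_job`, so the next ID is the largest `id_job` plus one. The function name and the cluster name `mycluster` are hypothetical, and SQLite stands in for MySQL so the example runs on its own. The “file” in the question is most likely slurmctld's saved job state under `StateSaveLocation`, which tracks the job ID sequence; carrying that directory over from the old master is the other way to keep the numbering.

```python
import re
import sqlite3


def next_first_job_id(conn, cluster: str) -> int:
    """Return MAX(id_job) + 1 from <cluster>_job_table for slurm.conf FirstJobId.

    `conn` is any DB-API connection; in production this would be a MySQL
    connection to the slurmdbd database (driver choice is an assumption).
    """
    # Table names cannot be bound as query parameters, so validate the name.
    if not re.fullmatch(r"[A-Za-z0-9_]+", cluster):
        raise ValueError(f"unexpected cluster name: {cluster!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT MAX(id_job) FROM {cluster}_job_table")
    (max_id,) = cur.fetchone()
    # MAX() over an empty table is NULL, so fall back to starting at 1.
    return (max_id or 0) + 1


if __name__ == "__main__":
    # In-memory SQLite stand-in for the slurmdbd schema; job IDs are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mycluster_job_table (id_job INTEGER)")
    conn.executemany(
        "INSERT INTO mycluster_job_table VALUES (?)", [(101,), (4570,), (812,)]
    )
    print(f"FirstJobId={next_first_job_id(conn, 'mycluster')}")  # FirstJobId=4571
```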