[slurm-users] Recommended amount of memory for the database server

2022-09-25 Thread byron
Hi, does anyone know the recommended amount of memory to give slurm's MariaDB database server? I seem to remember reading a simple estimate based on the size of certain tables (or something along those lines), but I can't find it now. Thanks

[slurm-users] what's the simplest way to set the site_factor for all jobs?

2022-09-13 Thread byron
I want to allow users to lower the priority of their jobs so that other people's jobs can go first, and am thinking the easiest way would be for them to use the sbatch nice option. However, all of our jobs currently run with a priority of 1 as all of the priority weights are set to zero meaning

Re: [slurm-users] unable to ssh onto compute nodes on which I have running jobs

2022-08-03 Thread byron
Thanks for everyone's help. All I needed to do was compile a new version of pam_slurm.so. I'm aware there's a newer pam_slurm_adopt but everything was already set up for pam_slurm.so so I just went with that. Regards Lloyd On Wed, Jul 27, 2022 at 9:45 PM Bernd Melchers wrote: > >This

Re: [slurm-users] slurmctld hanging

2022-07-29 Thread byron
that. And it was, until I did the upgrade. On Fri, Jul 29, 2022 at 7:00 AM Loris Bennett wrote: > Hi Byron, > > byron writes: > > > Hi Loris - about a second > > What is the use-case for that? Are these individual jobs or it a job > array. Either way it sounds to me li

Re: [slurm-users] slurmctld hanging

2022-07-28 Thread byron
Hi Loris - about a second On Thu, Jul 28, 2022 at 2:47 PM Loris Bennett wrote: > Hi Byron, > > byron writes: > > > Hi > > > > We recently upgraded slurm from 19.05.7 to 20.11.9 and now we > occasionally (3 times in 2 months) have slurmctld hanging so we get the

[slurm-users] slurmctld hanging

2022-07-28 Thread byron
Hi We recently upgraded slurm from 19.05.7 to 20.11.9 and now we occasionally (3 times in 2 months) have slurmctld hanging so we get the following message when running sinfo “slurm_load_jobs error: Socket timed out on send/recv operation” It only seems to happen when one of our users runs a job

Re: [slurm-users] unable to ssh onto compute nodes on which I have running jobs

2022-07-27 Thread byron
rian Andrus wrote: > >> Verify that their uid on the node is the same as the uid your master sees >> >> Brian Andrus >> >> >> On 7/27/2022 8:53 AM, byron wrote: >> > Hi >> > >> > When a user tries to login into a compute node on

[slurm-users] unable to ssh onto compute nodes on which I have running jobs

2022-07-27 Thread byron
Hi When a user tries to log in to a compute node on which they have a running job, they get the error: Access denied: user blahblah (uid=) has no active jobs on this node. Authentication failed. I recently upgraded slurm to 20.11.9 and was under the impression that prior to the upgrade they

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread byron
y 30, 2022 at 8:18 AM Ole Holm Nielsen wrote: > Hi Byron, > > Adding to Stephan's note, it's strongly recommended to make a database > dry-run upgrade test before upgrading the production slurmdbd. Many > details are in > https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upg

[slurm-users] Rolling upgrade of compute nodes

2022-05-29 Thread byron
Hi I'm currently doing an upgrade from 19.05 to 20.11. All of our compute nodes have the same install of slurm NFS mounted. The system has been set up so that all the start scripts and configuration files point to the default installation, which is a soft link to the most recent installation of

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread byron
> As for old versions of slurm I think at this point you would need to > contact SchedMD. I'm sure they have past releases they can hand out if you > are bootstrapping to a newer release. > > -Paul Edmon- > On 5/17/22 11:42 AM, byron wrote: > > Thanks Brian for the speedy resp

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread byron
they started at 18.x) with no > issues. Running jobs were not impacted and users didn't even notice. > > Brian Andrus > > > On 5/17/2022 7:35 AM, byron wrote: > > Hi > > > > I'm looking at upgrading our install of slurm from 19.05 to 20.11 in > > response to th

[slurm-users] upgrading slurm to 20.11

2022-05-17 Thread byron
Hi I'm looking at upgrading our install of slurm from 19.05 to 20.11 in response to the recently announced security vulnerabilities. I've been through the documentation / forums and have managed to find the answers to most of my questions but am still unclear about the following - In upgrading

[slurm-users] incorrectly added account and now get "AssocGrpCPUMinutesLimit" when trying to run job

2021-11-29 Thread byron
I’m trying to replicate the setup of a new account where there is a new “grouping” of accounts and a new account that will actually be used, so something like this when you run sacctmgr show assoc tree mycluster account1. (which is just being used to group accounts and so has no

[slurm-users] Strigger, why not always use "--flags=perm" rather than running the command again each time?

2021-10-11 Thread byron
Hi I've been looking at using strigger for some simple cases such as when a node drains or goes down. Most of the examples I've seen use the format whereby it calls a script which reruns the strigger command for the next event. However there is also the "--flags=perm" approach, is there any

Re: [slurm-users] is there a way to temporarily freeze an account?

2021-10-08 Thread byron
Thanks for all the feedback, am going with Juergen's MaxSubmitJobs approach. On Thu, Oct 7, 2021 at 2:55 AM Chris Samuel wrote: > On 6/10/21 6:21 am, byron wrote: > > > We have some accounts that we would like to suspend / freeze for the > > time being that have unused hours as

[slurm-users] is there a way to temporarily freeze an account?

2021-10-06 Thread byron
We have some accounts with unused hours associated with them that we would like to suspend / freeze for the time being. Is there any way of doing this without removing the users associated with the accounts or zeroing their hours? We are using slurm version 19.05.7 Thanks

Re: [slurm-users] job stuck as pending - reason "PartitionConfig"

2021-09-30 Thread byron
as that. Thanks for your help. On Wed, Sep 29, 2021 at 7:49 PM Paul Brunk wrote: > Hello Byron: > > > > I’m guessing that your job is asking for more HW than the highmem_p > > has in it, or more cores or RAM within a node than any of the nodes > > have, or something like that.

[slurm-users] job stuck as pending - reason "PartitionConfig"

2021-09-29 Thread byron
is the job that is stuck in state pending
JOBID     PARTITION  NAME      USER   ST  TIME  NODES  NODELIST(REASON)
10860160  highmem    MooseBen  byron  PD  0:00  16     (PartitionConfig)
$ sinfo -p highmem
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
highmem    up     infinite

Re: [slurm-users] using sacctmgr to change the parent of an account

2021-09-09 Thread byron
o say yes, else it doesn't make the change. > > Brian Andrus > > On 9/8/2021 3:41 AM, byron wrote: > > Hi > > > > I've added a new account using sbank and have now discovered it should > > have been added with the parent set. We've already accumulated a > > co

[slurm-users] using sacctmgr to change the parent of an account

2021-09-08 Thread byron
Hi I've added a new account using sbank and have now discovered it should have been added with the parent set. We've already accumulated a couple of months of user data so I don't just want to delete it and recreate it in the correct location. I've had a read of the sacctmgr command and think I