[slurm-users] Re: Upgrade node while jobs running

2024-08-02 Thread Sid Young via slurm-users
if it goes wrong?  > > > > Regards, > > > > Tim > > -- > > *Tim Cutts* > > Scientific Computing Platform Lead > > AstraZeneca > > > > Find out more about R IT Data, Analytics & AI and how we can support you > by visiting our Service Cata

[slurm-users] Upgrade node while jobs running

2024-07-31 Thread Sid Young via slurm-users
G'day all, I've been waiting for nodes to become idle before upgrading them; however, some jobs take a long time. If I try to remove all the packages, I assume that kills the slurmstepd program and with it the job. Sid -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send

[slurm-users] Re: Slurm management of dual-node server trays?

2024-02-23 Thread Sid Young via slurm-users
That's a very interesting design. Looking at the SD665 V3 documentation, am I correct that each node has dual 25 Gb/s SFP28 interfaces? If so, then despite the dual nodes in a 1U configuration, you actually have 2 separate servers? Sid On Fri, 23 Feb 2024, 22:40 Ole Holm Nielsen via slurm-users, <

[slurm-users] Upgrade from 20.11.0 to Slurm version 22.05.6 ?

2022-11-10 Thread Sid Young
Is there a direct upgrade path from 20.11.0 to 22.05.6 or is it in multiple steps? Sid Young On Fri, Nov 11, 2022 at 7:53 AM Marshall Garey wrote: > We are pleased to announce the availability of Slurm version 22.05.6. > > This includes a fix to core selection for steps which cou

Re: [slurm-users] Stopping new jobs but letting old ones end

2022-01-31 Thread Sid Young
Brian / Christopher, that looks like a good process, thanks guys. I will do some testing and let you know. If I mark a partition down and it has running jobs, what happens to those jobs? Do they keep running? Sid Young W: https://off-grid-engineering.com W: (personal) https://sidyoung.com/ W

Re: [slurm-users] Stopping new jobs but letting old ones end

2022-01-31 Thread Sid Young
Sid Young W: https://off-grid-engineering.com W: (personal) https://sidyoung.com/ W: (personal) https://z900collector.wordpress.com/ On Tue, Feb 1, 2022 at 3:02 PM Christopher Samuel wrote: > On 1/31/22 4:41 pm, Sid Young wrote: > > > I need to replace a faulty DIMM chip in our log

[slurm-users] Stopping new jobs but letting old ones end

2022-01-31 Thread Sid Young
minutes, scheduler is a separate node and I could email back any users who try to SSH while the node is down. Sid Young W: https://off-grid-engineering.com W: (personal) https://sidyoung.com/ W: (personal) https://z900collector.wordpress.com/

Re: [slurm-users] Submitting jobs via SystemD

2021-09-15 Thread Sid Young
What's wrong with just using the tools as they are? Sid Young On Thu, Sep 16, 2021 at 5:54 AM Ondrej Valousek wrote: > Hi list, > I am wondering if there is a plugin that allows submitting jobs via SystemD > (i.e. using systemd-run) on exec nodes. > > I have actually modified SGE s

Re: [slurm-users] [External] Node utilization for 24 hours

2021-09-07 Thread Sid Young
00%|100.00% #trihpc|energy|0.00%|0.00%|0.00%|0.00%|0.00%|0.00% #trihpc|billing|14.62%|4.78%|0.00%|80.60%|0.00%|100.00% #trihpc|fs/disk|0.00%|0.00%|0.00%|0.00%|0.00%|0.00% #trihpc|vmem|0.00%|0.00%|0.00%|0.00%|0.00%|0.00% #trihpc|pages|0.00%|0.00%|0.00%|0.00%|0.00%|0.00% Sid Young W: https://off-grid-engin

Re: [slurm-users] Regarding multiple slurm server on one machine

2021-07-27 Thread Sid Young
Why not spin them up as virtual machines? Then you could build real (separate) clusters. Sid Young W: https://off-grid-engineering.com W: (personal) https://sidyoung.com/ W: (personal) https://z900collector.wordpress.com/ On Wed, Jul 28, 2021 at 12:07 AM Brian Andrus wrote: > You can

Re: [slurm-users] Exposing only requested CPUs to a job on a given node.

2021-07-01 Thread Sid Young
Hi Luis, I have exactly the same issue with a user who needs the reported cores to reflect the requested cores. If you find a solution that works please share. :) Thanks Sid Young Translational Research Institute Sid Young W: https://off-grid-engineering.com W: (personal) https

Re: [slurm-users] [External] incorrect number of cpu's being reported in srun job

2021-06-22 Thread Sid Young
Thanks for the reply... I will look into how to configure it. Sid Young Translational Research Institute On Wed, Jun 23, 2021 at 7:06 AM Prentice Bisbal wrote: > Yes, > > You need to use the cgroups plugin. > > > On Fri, Jun 18, 2021, 12:29 AM Sid Young wrote: > >>

[slurm-users] incorrect number of cpu's being reported in srun job

2021-06-17 Thread Sid Young
SelectType=select/cons_res SelectTypeParameters=CR_CPU_Memory ReturnToService=1 CpuFreqGovernors=OnDemand,Performance,UserSpace CpuFreqDef=Performance Sid Young Translational Research Institute

[slurm-users] Slurm stats in JSON format

2021-06-07 Thread Sid Young
G'Day all, Is there a tool that will extract the job counts in JSON format, such as #running, #pending, #onhold, etc.? I am trying to build some custom dashboards for our new cluster, and this would be a really useful set of metrics to gather and display. Sid Young W: https://off-grid
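A minimal sketch of one way to get such counts, assuming `squeue` is on the PATH of the machine running the script. This is not a tool from the thread; the function names `count_job_states` and `job_counts_json` are made up for illustration. It relies only on `squeue -h -o "%T"`, which prints one job state per line with no header. Held jobs normally show up as PENDING in this output, so separating them out would need extra handling that is not shown here.

```python
import json
import subprocess
from collections import Counter


def count_job_states(squeue_output: str) -> dict:
    """Count jobs per state from text holding one state name per line."""
    states = (line.strip() for line in squeue_output.splitlines())
    # Skip blank lines so an empty queue yields an empty dict.
    return dict(Counter(s for s in states if s))


def job_counts_json() -> str:
    """Run squeue and return the per-state job counts as a JSON string."""
    # -h drops the header; -o "%T" prints only the job state
    # (for example RUNNING or PENDING), one job per line.
    out = subprocess.run(
        ["squeue", "-h", "-o", "%T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.dumps(count_job_states(out), sort_keys=True)


if __name__ == "__main__":
    print(job_counts_json())
```

Keeping the parsing in `count_job_states` separate from the `squeue` call means the counting logic can be checked against sample text on a machine without Slurm installed.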

[slurm-users] slurmrestd

2021-06-06 Thread Sid Young
Hi all, I'm interested in using slurmrestd, but it does not appear to be built when you do an rpmbuild. Reading through the docs does not indicate a switch needed to include it (unless I missed that)... any ideas on how the RPM is built? Sid Young W: https://off-grid-engineering.com W

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-13 Thread Sid Young
Yes, on reflection I should have said utilization rather than usage! I've been researching which combination of metrics would most likely give me an overall utilization of the HPC. Sadly it's not as clear cut as I would have hoped. Does anyone have any ideas? Sid Young On Fri, May 14, 2021

[slurm-users] Determining Cluster Usage Rate

2021-05-13 Thread Sid Young
Hi All, Is there a way to define an effective "usage rate" of an HPC cluster using the data captured in the Slurm database? Primarily I want to see if it can be helpful in presenting to the business a case for buying more hardware for the HPC :) Sid Young
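One possible way to frame such a rate, offered as a sketch rather than an established Slurm metric: divide the allocated time by the time the cluster was actually available, i.e. the reported time minus the time nodes were down. The function name `utilization_rate` and its parameters are hypothetical; the inputs are assumed to be totals in the same unit (for example CPU-minutes) taken from accounting data such as an `sreport cluster utilization` report.

```python
def utilization_rate(allocated: float, down: float, reported: float) -> float:
    """Fraction of available time that was allocated to jobs.

    All three arguments are totals in the same unit (e.g. CPU-minutes).
    'Available' is defined here, by assumption, as reported minus down.
    """
    available = reported - down
    if available <= 0:
        raise ValueError("no available time in the reporting period")
    return allocated / available


if __name__ == "__main__":
    # Illustrative numbers only, not real cluster data.
    print(f"{utilization_rate(600.0, 200.0, 1000.0):.0%}")
```

Excluding down time from the denominator is a design choice: it measures how busy the working hardware was, which is arguably the figure that matters when arguing for more capacity.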

Re: [slurm-users] Questions about adding new nodes to Slurm

2021-05-04 Thread Sid Young
You can push a new conf file and issue an "scontrol reconfigure" on the fly as needed... I do it on our cluster as needed: do the nodes first, then the login nodes, then the Slurm controller... you are making a huge issue of a very basic task... Sid On Tue, 4 May 2021, 22:28 Tina Friedrich, wrote:

Re: [slurm-users] Questions about adding new nodes to Slurm

2021-04-27 Thread Sid Young
Hi David, I use SaltStack to push out the slurm.conf file to all nodes and do a "scontrol reconfigure" of the slurmd. This makes management much easier across the cluster. You can also do service restarts from one point etc. Avoid NFS mounts for the config; if the mount locks up, you're screwed.