Re: [slurm-users] Fwd: Using PreemptExemptTime

2022-02-03 Thread Phil Kauffman
> I know you want to suspend preempted jobs, but what happens if you cancel them instead?
Thanks John. Your response definitely helped me. I have done as you suggested and tested CANCEL which worked. For John and everyone else: below are the results of my tests. My apologies for the wall

Re: [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information

2022-02-03 Thread Ryan Novosielski
> On Feb 3, 2022, at 2:55 PM, Ole Holm Nielsen wrote:
>
> On 03-02-2022 16:37, Nathan Smith wrote:
>> Yes, we are running slurmdbd. We could arrange enough downtime to do an incremental upgrade of major versions as Brian Andrus suggested, at least on the slurmctld and slurmdbd

Re: [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information

2022-02-03 Thread Ole Holm Nielsen
On 03-02-2022 16:37, Nathan Smith wrote: Yes, we are running slurmdbd. We could arrange enough downtime to do an incremental upgrade of major versions as Brian Andrus suggested, at least on the slurmctld and slurmdbd systems. The slurmds I would just do a direct upgrade once the scheduler work

Re: [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information

2022-02-03 Thread Nathan Smith
Yes, we are running slurmdbd. We could arrange enough downtime to do an incremental upgrade of major versions as Brian Andrus suggested, at least on the slurmctld and slurmdbd systems. The slurmds I would just do a direct upgrade once the scheduler work was completed. -- Nathan Smith Research

Re: [slurm-users] job_container/tmpfs mounts a private /tmp but the permission is root 700.Normal user can not read or write.

2022-02-03 Thread Brian Andrus
I am guessing that you are running VMs in the cloud, maybe even Azure. By default, they set /tmp to 0700 when the node is deployed. It is up to you to change that as needed as part of the cloud-init or other finalize step. I took to creating /tmp/scratch with permissions of 1777 and then
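
A minimal sketch of the scratch-directory step described above, assuming it runs as part of a node finalize step with enough privileges to write under /tmp. The function name make_scratch and its default path are illustrative only; the original message gives just the directory /tmp/scratch and the mode 1777.

```python
import os
import stat


def make_scratch(path="/tmp/scratch"):
    """Create a world-writable scratch directory with the sticky bit set.

    Hypothetical helper: only the path and the 1777 mode come from the
    message above; everything else is an assumption for illustration.
    """
    os.makedirs(path, exist_ok=True)
    # chmod explicitly, because the mode passed to makedirs is filtered
    # by the process umask and would not reliably yield 1777.
    os.chmod(path, 0o1777)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

Mode 1777 lets every user create files in the directory, while the sticky bit prevents users from deleting each other's files.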

Re: [slurm-users] How to tell SLURM to ignore specific GPUs

2022-02-03 Thread Paul Raines
On Thu, 3 Feb 2022 1:30am, Stephan Roth wrote: On 02.02.22 18:32, Michael Di Domenico wrote: On Mon, Jan 31, 2022 at 3:57 PM Stephan Roth wrote: The problem is to identify the cards physically from the information we have, like what's reported with nvidia-smi or available in

Re: [slurm-users] Fwd: Using PreemptExemptTime

2022-02-03 Thread John DeSantis
Phil, Does anyone have a working example using PreemptExemptTime? My goal is to make a higher priority job wait 24 hours before actually preempting a lower priority job. Another way, any job is entitled to 24 hours run time before being preempted. The preempted job should be suspended,
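
The rule stated in the question ("any job is entitled to 24 hours run time before being preempted") can be sketched as a simple time check. This only illustrates the intended policy; it is not how Slurm implements PreemptExemptTime, and the names EXEMPT and may_preempt are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed exemption window, taken from the 24-hour goal in the question.
EXEMPT = timedelta(hours=24)


def may_preempt(job_start, now, exempt=EXEMPT):
    """Return True once the job has run for at least the exemption window."""
    return now - job_start >= exempt
```

For example, a job started at 2022-02-01 09:00 would not be eligible for preemption at 2022-02-02 08:59 under this rule, but would be at 2022-02-02 09:00.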

Re: [slurm-users] Fairshare within a single Account (Project)

2022-02-03 Thread Tomislav Maric
Thanks a lot, this clarifies it! Dr.-Ing. Tomislav Maric Mathematical Modeling and Analysis TU Darmstadt Tel: +49 6151 16-21469 Alarich-Weiss-Straße 10 64287 Darmstadt Office: L2|06 410 On 2/1/22 20:47, Renfro, Michael wrote: At least from our experience, the default user share within an