[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-01 Thread Jason Simms via slurm-users
On the one hand, you say you want "to *allocate a whole node* for a single multi-threaded process," but on the other you say you want to allow it to "*share nodes* with other running jobs." Those seem like mutually exclusive requirements.
Jason
On Thu, Aug 1, 2024 at 1:32 PM Henrique Almeida via

[slurm-users] Partition Preemption Configuration Question

2024-05-02 Thread Jason Simms via slurm-users
Hello all, The Slurm docs have me a bit confused... I want to enable job preemption on certain partitions but not others. I *presume* I would set PreemptType=preempt/partition_prio globally, but then on the partitions where I don't want jobs to be preemptible, I would set

[slurm-users] Re: Trying to Track Down root Usage

2024-04-29 Thread Jason Simms via slurm-users
user root in place?
>
> sreport accounts resources reserved for a user as well (even if not
> used by jobs) while sacct reports job accounting only.
>
> Best regards
> Jürgen
>
>
> * Jason Simms via slurm-users [240429 10:47]:
> > Hello all,
> >
> > E

[slurm-users] Trying to Track Down root Usage

2024-04-29 Thread Jason Simms via slurm-users
Hello all, Each week, I generate an automated report of the top users by CPU hours. This week, for whatever reason the user root accounted for a massive number of hours: Login Proper Name Used
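A weekly "top users by CPU hours" tally of the kind described here can be sketched as below. This is a minimal illustration, not the poster's actual report: it assumes the input is pipe-delimited `user|cpu_seconds` rows, such as `sacct -a -X -n -P --format=User,CPUTimeRAW` might produce for the chosen time window, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def top_users_by_cpu_hours(lines, limit=10):
    """Sum CPU seconds per user from 'user|cpu_seconds' rows and
    return the top `limit` users as (user, cpu_hours) pairs."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank rows
        user, _, seconds = line.partition("|")
        if not user or not seconds.isdigit():
            continue  # skip rows with no user or a non-numeric value
        totals[user] += int(seconds)
    # Rank by total CPU seconds, largest first, then convert to hours.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(user, secs / 3600) for user, secs in ranked[:limit]]


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = ["root|7200", "alice|3600", "root|3600", "bob|1800"]
    for user, hours in top_users_by_cpu_hours(sample):
        print(f"{user:<10} {hours:8.2f}")
```

A tally built from job accounting alone may not match an sreport-based figure: as noted in the reply in this thread, sreport also counts resources reserved for a user even when no jobs use them, while sacct reports job accounting only.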

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Jason Simms via slurm-users
As a related point, for this reason I mount /var/log separately from /. Ask me how I learned that lesson...
Jason
On Tue, Apr 16, 2024 at 8:43 AM Jeffrey T Frey via slurm-users <slurm-users@lists.schedmd.com> wrote:
> AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n"
> is

[slurm-users] Re: Enforcing relative resource restrictions in submission script

2024-02-28 Thread Jason Simms via slurm-users
Hello Matthew, You may be aware of this already, but most sites would make these kinds of checks/validations using job_submit.lua. I'm not an expert in that - though plenty of others on this list are - but I'm positive you could implement this type of validation logic. I'd like to say that I've

[slurm-users] Re: pty jobs are killed when another job on the same node terminates

2024-02-28 Thread Jason Simms via slurm-users
Hello Thomas, I know I'm a few days late to this, so I'm wondering whether you've made any progress. We experience this, too, but in a different way. First, though, you may be aware, but you should use salloc rather than srun --pty for an interactive session. That's been the preferred method for

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-25 Thread Jason Simms via slurm-users
Hello Daniel, In my experience, if you have a high-speed interconnect such as IB, you would do IPoIB. You would likely still have a "regular" Ethernet connection for management purposes, and yes, that means both an IB switch and an Ethernet switch, but that switch doesn't have to be anything

[slurm-users] Recover Batch Script Error

2024-02-16 Thread Jason Simms via slurm-users
Hello all, I've used the "scontrol write batch_script" command to output the job submission script from completed jobs in the past, but for some reason, no matter which job I specify, it tells me it is invalid. Any way to troubleshoot this? Alternatively, is there another way - even if a manual