Re: [slurm-users] 21.08: Removing batch scripts from the database

2021-10-05 Thread Andy Georges
Hi, On 05/10/2021 08:45, Kevin Buckley wrote: > Trying to get my head around the extremely useful addition, for 21.08 onwards, as regards storing the batch scripts in the accounting database. You are aware of two existing solutions to this that do not involve the slurm accounting DB?
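
A minimal sketch of the 21.08 feature the thread refers to (the job id below is illustrative): storage is opted into via slurm.conf, and the stored script is retrieved with sacct.

```
# slurm.conf: opt in to storing submitted batch scripts (21.08+)
AccountingStoreFlags=job_script

# Retrieve the stored script for a job:
sacct -j 1234 --batch-script
```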

Re: [slurm-users] update node config while jobs are running

2020-03-10 Thread Andy Georges
Hi, On Tue, Mar 10, 2020 at 05:49:07AM +, Rundall, Jacob D wrote: > I need to update the configuration for the nodes in a cluster and I’d like to > let jobs keep running while I do so. Specifically I need to add > RealMemory= to the node definitions (NodeName=). Is it safe to do this > for
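
A sketch of the mechanics being asked about, with illustrative node names and values; whether this is safe with running jobs is what the thread discusses.

```
# slurm.conf: add RealMemory (in MB) to an existing node definition
NodeName=node[01-16] CPUs=72 RealMemory=257000 State=UNKNOWN

# Push the updated config to the daemons without draining:
scontrol reconfigure
```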

Re: [slurm-users] RHEL8 support

2019-10-30 Thread Andy Georges
Hi Brian, On Mon, Oct 28, 2019 at 10:42:59AM -0700, Brian Andrus wrote: > Ok, I had been planning on getting around to it, so this prompted me to do > so. > > Yes, I can get slurm 19.05.3 to build (and package) under CentOS 8. > > There are some caveats, however since many repositories and

Re: [slurm-users] Running mix versions of slurm while upgrading

2019-10-21 Thread Andy Georges
Hi Tony, On Mon, Oct 21, 2019 at 01:52:21AM +, Tony Racho wrote: > Hi: > > We are planning to upgrade our slurm cluster however we plan on NOT doing it > in a one-go. > > We are on 18.08.7 at the moment (db, controller, clients) > > We'd like to do it in a phased approach. > > Stop
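
Slurm supports running mixed versions within two major releases during a rolling upgrade, provided the daemons are upgraded in order: slurmdbd first, then slurmctld, then slurmd on the nodes. A rough sketch (package commands are placeholders for whatever your distribution uses):

```
# 1. accounting daemon first
systemctl stop slurmdbd
# ... install the new slurmdbd packages ...
systemctl start slurmdbd

# 2. then slurmctld on the controller
# 3. then slurmd, rolling across the compute nodes
```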

Re: [slurm-users] SLURM in Virtual Machine

2019-09-13 Thread Andy Georges
Hi Jose, On Thu, Sep 12, 2019 at 04:23:11PM +0200, Jose A wrote: > Dear all, > > In the expansion of our Cluster we are considering installing SLURM within a > virtual machine in order to simplify updates and reconfigurations. > > Does any of you have experience running SLURM in VMs? I would

Re: [slurm-users] pam_slurm_adopt and memory constraints?

2019-07-17 Thread Andy Georges
Hi Mark, Chris, On Mon, Jul 15, 2019 at 01:23:20PM -0400, Mark Hahn wrote: > > Could it be a RHEL7 specific issue? > > no - centos7 systems here, and pam_adopt works. Can you show what your /etc/pam.d/sshd looks like? Kind regards, -- Andy signature.asc Description: PGP signature

Re: [slurm-users] pam_slurm_adopt and memory constraints?

2019-07-15 Thread Andy Georges
Hi Juergen, On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote: > Dear all, > > I have configured pam_slurm_adopt in our Slurm test environment by > following the corresponding documentation: > > https://slurm.schedmd.com/pam_slurm_adopt.html > > I've set `PrologFlags=contain´ in
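
For reference, the two pieces the pam_slurm_adopt documentation asks for (paths as documented; placement within the PAM stack matters):

```
# /etc/pam.d/sshd: near the end of the account stack
account    required     pam_slurm_adopt.so

# slurm.conf
PrologFlags=contain
```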

Re: [slurm-users] Jobs waiting while plenty of cpu and memory available

2019-07-10 Thread Andy Georges
Hi, > So here's something funny. One user submitted a job that requested 60 cpu's > and 40M of memory. Our largest nodes in that partition have 72 cpu's and > 256G of memory. So when a user requests 400G of ram, what would be good > behavior? I would like to see slurm reject the job, "job

Re: [slurm-users] Requirement to run longer jobs

2019-07-03 Thread Andy Georges
Hi, On Wed, Jul 03, 2019 at 03:49:44PM +, David Baker wrote: > Hello, > > > A few of our users have asked about running longer jobs on our cluster. > Currently our main/default compute partition has a time limit of 2.5 days. > Potentially, a handful of users need jobs to run up to 5 hours.
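
One common pattern for this request is an overlapping partition with a longer limit, capped by a QOS so long jobs cannot hold the whole cluster. A sketch with illustrative names and limits:

```
# slurm.conf: overlapping partition with a longer MaxTime
PartitionName=long Nodes=node[01-08] MaxTime=5-00:00:00 AllowQos=long

# Cap the total resources long jobs may hold (illustrative value):
sacctmgr add qos long GrpTRES=cpu=144
```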

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-25 Thread Andy Georges
Hi, > On 22 Aug 2018, at 16:27, Christian Peter wrote: > > hi, > > we observed a strange behavior of pam_slurm_adopt regarding the involved > cgroups: > > when we start a shell as a new Slurm job using "srun", the process has > freezer, cpuset and memory cgroups setup as e.g. >

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-22 Thread Andy Georges
Hi Chris, > On 24 Aug 2018, at 10:57, Christian Peter wrote: > > hi, > > thank you patrick, thank you kilian for identifying a systemd issue here! > > for a quick test, we disabled and masked systemd-logind. the "memory" cgroup > now works as expected. great! > > we're now watching out

[slurm-users] Job walltime

2018-10-17 Thread Andy Georges
Hello, We are migrating away from a Torque/Moab setup. For user convenience, we’re trying to make the differences minimal. I am wondering if there is a way to set the job walltime in the job environment (to set $PBS_WALLTIME). It’s unclear to me how this information can be retrieved on the
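
A hypothetical sketch of one way to do this: a TaskProlog script can export variables into the job environment, and the job's time limit can be read from squeue and converted to seconds (as Torque's $PBS_WALLTIME reports). The function name is an assumption, not something from the thread.

```shell
# Convert a TimeLimit string as printed by squeue ([D-]HH:MM:SS,
# HH:MM:SS or MM:SS) into seconds. "UNLIMITED" falls through to 0.
slurm_limit_to_seconds() {
    echo "$1" | awk -F'[-:]' '
        NF == 4 { print (($1*24 + $2)*60 + $3)*60 + $4; next }
        NF == 3 { print ($1*60 + $2)*60 + $3; next }
        NF == 2 { print $1*60 + $2; next }
        NF == 1 { print $1*60 }
    '
}

# In a TaskProlog script, lines printed as "export NAME=value" are added
# to the job environment (requires a running cluster, so shown as comments):
#   limit=$(squeue -h -j "$SLURM_JOB_ID" -o %l)
#   echo "export PBS_WALLTIME=$(slurm_limit_to_seconds "$limit")"
```

The awk split on both "-" and ":" lets one function handle the day-prefixed and plain forms that squeue prints.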

Re: [slurm-users] Create users

2018-10-05 Thread Andy Georges
Hi, > On 3 Oct 2018, at 16:51, Andy Georges wrote: > > Hi all, > >> On 15 Sep 2018, at 14:47, Chris Samuel wrote: >> >> On Thursday, 13 September 2018 3:10:19 AM AEST Paul Edmon wrote: >> >>> Another way would be to make all your Linux users and

Re: [slurm-users] Create users

2018-10-03 Thread Andy Georges
Hi all, > On 15 Sep 2018, at 14:47, Chris Samuel wrote: > > On Thursday, 13 September 2018 3:10:19 AM AEST Paul Edmon wrote: > >> Another way would be to make all your Linux users and then map that in to >> Slurm using sacctmgr. > > At ${JOB} and ${JOB-1} we've wired user creation in Slurm

[slurm-users] sacct does not show anything when PrivateData=jobs is set in slurmdbd.conf

2018-05-18 Thread Andy Georges
Hi, As per the guidelines on the slurmdbd.conf and sacct manual pages, I have set PrivateData=jobs (amongst others) in slurmdbd.conf. However, at this point no job information is available anymore when running sacct, it just does not provide any job-related output: vsc40075@gligar03
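
With PrivateData=jobs a regular user should still see their own jobs, so "nothing at all" usually points at the querying user not mapping to an association on the cluster. A couple of illustrative checks (user and date are taken from the message, purely as examples):

```
# Does the user have an association on this cluster?
sacctmgr show user -s vsc40075

# Query explicitly for that user's jobs since a start date:
sacct -u vsc40075 -S 2018-05-01
```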

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Andy Georges
> On 30 Apr 2018, at 22:37, Nate Coraor wrote: > > Hi Shawn, > > I'm wondering if you're still seeing this. I've recently enabled task/cgroup > on 17.11.5 running on CentOS 7 and just discovered that jobs are escaping > their cgroups. For me this is resulting in a lot of

[slurm-users] stdout (and stderr) job files

2018-03-15 Thread Andy Georges
Hello, We are transitioning from Moab/Torque to Slurm. I was wondering if there is a way to have Slurm also create the stdout (and stderr) file for the job on the node (by default), rather than on the shared FS. We sometimes have users who write a lot of stuff to stdout from their job

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Andy Georges
Hi all, Cranked up the debug level a bit. Job was not started when using: vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun --pty bash -i salloc: Granted job allocation 42 salloc: Waiting for resource configuration salloc: Nodes node2801 are ready for job For comparison purposes,

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Andy Georges
> that salloc is necessary. > > The most simple command that I typically use is: > > srun -N1 -n1 --pty bash -i > > Mike > >> On 3/9/18 10:20 AM, Andy Georges wrote: >> Hi, >> >> >> I am trying to get interactive jobs to work from the machine

[slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Andy Georges
Hi, I am trying to get interactive jobs to work from the machine we use as a login node, i.e., where the users of the cluster log into and from where they typically submit jobs. I submit the job as follows: vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun bash -i salloc: Granted
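
As the rest of the thread suggests, wrapping srun inside salloc is often unnecessary for an interactive shell; the usual invocations look like this (a sketch of the commands discussed, not a guaranteed fix for the hang):

```
# Simplest form: let srun allocate and attach a pseudo-terminal
srun -N1 -n1 --pty bash -i

# Or keep salloc and run the shell through the allocation
salloc -N1 -n1 srun --pty bash -i
```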