[slurm-users] AWS SLURM Burst Cluster, fill configuring nodes

2018-10-25 Thread J.R. W
Hello everyone, I set up a SLURM cluster based on this post and plugin: https://aws.amazon.com/blogs/compute/deploying-a-burstable-and-event-driven-hpc-cluster-on-aws-using-slurm-part-1/

Re: [slurm-users] Looking for old SLURM versions

2018-10-25 Thread Fulcomer, Samuel
We've got 15.0.8/9. -s On Wed, Oct 24, 2018 at 5:51 PM, Bob Healey wrote: > I'm in the process of upgrading a system that has been running 2.5.4 for > the last 5 years with no issues. I'd like to bring that up to something > current, but I need a bunch of older versions that do not appear to

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-25 Thread Andy Georges
Hi, > On 22 Aug 2018, at 16:27, Christian Peter > wrote: > > hi, > > we observed a strange behavior of pam_slurm_adopt regarding the involved > cgroups: > > when we start a shell as a new Slurm job using "srun", the process has > freezer, cpuset and memory cgroups setup as e.g. > "/slurm/u

Re: [slurm-users] Can't find an address

2018-10-25 Thread Eli V
In addition to these other suggestions, keep in mind that the slurmds will talk to each other if you have more than 50 nodes (see TreeWidth in slurm.conf), so the nodes must be able to resolve via DNS and communicate with all the other nodes as well as the slurmctlds. I tried adding in some nod
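
The name-resolution requirement described above can be checked from each node with a few lines of Python. This is a minimal sketch and not something posted in the thread: the function name `unresolvable_hosts` is hypothetical, and it only tests forward lookup through the system resolver, not whether the slurmd ports are actually reachable.

```python
import socket


def unresolvable_hosts(hosts, resolve=socket.gethostbyname):
    """Return the hosts from `hosts` that the resolver cannot look up.

    `resolve` defaults to the system resolver; it is a parameter so the
    check can be exercised without real DNS (an assumption of this sketch).
    """
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Run on a compute node with the node names from slurm.conf, for example `unresolvable_hosts(["node001", "node002"])` (hypothetical names); an empty list means every name resolved from that node.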

Re: [slurm-users] Can't find an address

2018-10-25 Thread Andy Riebs
Make sure that the "hostname" command returns the same name that Slurm expects on your compute nodes. *From:* Zohar Roe Mlm *Sent:* Thursday, October 25, 2018 3:02AM *To:* 'Slurm User Community List' *Cc:* *Subject:* Re:
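
The hostname check suggested above can also be scripted. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and comparing only the short (pre-domain) form assumes the node names in slurm.conf are written as short hostnames.

```python
import socket


def short_name(name):
    """Strip any domain part, similar to what `hostname -s` prints."""
    return name.split(".", 1)[0]


def hostname_matches(expected_node_name, actual=None):
    """Compare this machine's short hostname with the name Slurm expects.

    `actual` defaults to socket.gethostname(); passing it explicitly is
    only a convenience for testing (an assumption of this sketch).
    """
    if actual is None:
        actual = socket.gethostname()
    return short_name(actual) == short_name(expected_node_name)
```

Calling `hostname_matches("node001")` on a compute node (hypothetical name) returns `False` when the local hostname differs from the configured node name.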

[slurm-users] partition problem with 2 different users

2018-10-25 Thread Joerg Sassmannshausen

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-25 Thread Chris Samuel
On Thursday, 25 October 2018 6:13:52 PM AEDT Ole Holm Nielsen wrote: > Nice command, Chris! I added a couple of usernames from CentOS 7 as > seen below. We're on CentOS7 too (for compute nodes), I guess we're a bit more minimal. > However, defunct processes seem to escape cgroups, for example:

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-25 Thread Ole Holm Nielsen
On 10/25/2018 07:00 AM, Christopher Samuel wrote: On 25/10/18 2:29 pm, Christopher Samuel wrote: Could explain why this isn't something we see consistently, and why we're both seeing it currently. This seems to be a handy way to find any processes that are not properly constrained by Slurm c
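
As an illustration of the kind of check discussed here, the sketch below inspects the text of `/proc/<pid>/cgroup` and reports which controllers are not under a Slurm cgroup. It is not the command referred to in the message, which is cut off above. It assumes the cgroup v1 line format `id:controllers:path`, takes the `/slurm/` prefix and the freezer, cpuset and memory controllers from the earlier message in this thread, and uses a hypothetical function name.

```python
def unconstrained_controllers(cgroup_text,
                              wanted=("freezer", "cpuset", "memory"),
                              prefix="/slurm/"):
    """List the `wanted` controllers whose cgroup path is outside `prefix`.

    `cgroup_text` is the content of /proc/<pid>/cgroup in cgroup v1
    format, one "id:controllers:path" entry per line. Controllers that
    do not appear in the text at all are not reported.
    """
    outside = []
    for line in cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        controllers = parts[1].split(",")
        path = parts[2]
        for name in wanted:
            if name in controllers and not path.startswith(prefix):
                outside.append(name)
    return outside
```

For a given process, read `/proc/<pid>/cgroup` and pass its content to the function; an empty list means the three controllers all point below `/slurm/`.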

Re: [slurm-users] Can't find an address

2018-10-25 Thread Zohar Roe MLM
Hi Lachlan, Thanks for the reply. I am trying to find more ideas for this problem. Maybe it is some system or strange communication problem. As for your suggestions: > Check that it's in /etc/hosts --> It is, and it answers ping by both IP and > host name every time I check > Check the slurmd logs -->