Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-11 Thread Chris Samuel
On 10/3/20 1:40 pm, mike tie wrote: "Here is the output of lstopo". Hmm, well I believe Slurm should be using hwloc (which provides lstopo) to get its information (at least it calls the xcpuinfo_hwloc_topo_get() function for that), so if lstopo works then slurmd should too. Ah, looking a bit
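A quick way to sanity-check Chris's point, assuming lstopo and slurmd are both on the node's PATH and the Slurm node name matches the short hostname (the grep pattern is just one way to count physical cores):

    $ lstopo -p --of console | grep -c 'Core P#'        # physical cores as hwloc sees them
    $ slurmd -C                                         # CPUs/Sockets/Cores slurmd autodetects
    $ scontrol show node $(hostname -s) | grep CPUTot   # what the controller currently believes

If the first two agree but the third differs, the stale number is coming from the controller side rather than from hardware detection.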

Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-11 Thread Kirill 'kkm' Katsnelson
Yup, I think if you get stuck so badly, the first thing is to make sure the node does not get the number 10 from the controller, and the second is to just reimage the VM fresh. It may not be the quickest way, but at least it is predictable in the sense of time spent. Good luck! -kkm On Wed, Mar 11, 2020 at
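For the first of kkm's two suggestions, a minimal sketch of how to confirm what the controller is handing out (the node name and config path below are examples, not from the thread):

    $ scontrol show node compute-1 | grep -E 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore'
    $ grep -i 'NodeName=compute-1' /etc/slurm/slurm.conf   # the static definition slurmctld reads
    $ scontrol reconfigure                                 # after correcting slurm.conf on all nodes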

Re: [slurm-users] Upgrade paths

2020-03-11 Thread Ole Holm Nielsen
On 11-03-2020 20:01, Will Dennis wrote: I have one cluster running v16.05.4 that I would like to upgrade if possible to 19.05.5; it was installed via a .deb package I created back in 2016. I have located a 17.11.7 Ubuntu PPA (https://launchpad.net/~jonathonf/+archive/ubuntu/slurm) and have

Re: [slurm-users] Upgrade paths

2020-03-11 Thread Renfro, Michael
The release notes at https://slurm.schedmd.com/archive/slurm-19.05.5/news.html indicate you can upgrade from 17.11 or 18.08 to 19.05. I didn’t find equivalent release notes for 17.11.7, but upgrades over one major release should work. > On Mar 11, 2020, at 2:01 PM, Will Dennis wrote: > >
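A rough outline of the two-hop upgrade Will would need (a sketch only; package and version handling depends on how the local .deb packages were built, and the cluster name below is a placeholder):

    # 16.05.x -> 17.11.x -> 19.05.x, upgrading slurmdbd first, then slurmctld, then slurmd at each hop
    $ sacctmgr dump mycluster file=assoc_backup.cfg    # back up accounting associations beforehand
    $ systemctl stop slurmd slurmctld slurmdbd         # stop daemons before installing the 17.11 packages
    # install and start slurmdbd 17.11 and let it migrate the database, then slurmctld, then slurmd;
    # verify with sinfo/squeue before repeating the same steps for 19.05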

[slurm-users] Upgrade paths

2020-03-11 Thread Will Dennis
Hi all, I have one cluster running v16.05.4 that I would like to upgrade if possible to 19.05.5; it was installed via a .deb package I created back in 2016. I have located a 17.11.7 Ubuntu PPA (https://launchpad.net/~jonathonf/+archive/ubuntu/slurm) and have myself recently put up one for

[slurm-users] Can some partitions be excluded from a cluster when Slurm calculates user/total usage in Fairshare algorithm?

2020-03-11 Thread Wang, Manhui
Dear All, We have a HPC cluster with Slurm job scheduler (17.02.8). There are several private partitions (which are sponsored by several groups) and a "common" partition. Private partitions are exclusively used by those private users, and all users (including private users) have equal access

Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-11 Thread mike tie
Yep, slurmd -C is obviously getting the data from somewhere, either a local file or from the master node; hence my email to the group. I was hoping that someone would just say: "yeah, modify file ". But oh well. I'll start playing with strace and gdb later this week; looking through the
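A sketch of the strace step mike mentions (the output file name and grep pattern are just examples):

    $ strace -f -e trace=open,openat -o /tmp/slurmd_C.trace slurmd -C
    $ grep -E 'cpuinfo|cgroup|sys/devices|slurm.conf' /tmp/slurmd_C.trace   # files consulted for the core count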

[slurm-users] Servers in pending state

2020-03-11 Thread Zohar Roe MLM
Hello, I have a queue with 6 servers. When 4 of the servers are under heavy load, if I send new jobs to the other 2 servers, which are free and under a different partition and features, the jobs are still in pending mode (it can take them 20 minutes to start running). If I change their priority with
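Before touching priorities, the scheduler's own explanation for the delay is worth a look; a minimal sketch (the job id is an example):

    $ squeue -j 12345 -o '%i %T %P %r'     # %r prints the pending Reason (Priority, Resources, ...)
    $ scontrol show job 12345 | grep -i -E 'Reason|Partition|Features'
    $ sdiag | grep -i backfill             # how often and how deep the backfill scheduler is running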

Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-11 Thread Kirill 'kkm' Katsnelson
On Tue, Mar 10, 2020 at 1:41 PM mike tie wrote:
> Here is the output of lstopo
>
> $ lstopo -p
> Machine (63GB)
>   Package P#0 + L3 (16MB)
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#1
>     L2