Re: [slurm-users] [Slurm 18.08.4] sacct/seff Inaccurate usercpu values

2019-01-17 Thread Henkel, Andreas
Thank you, Mike. Didn’t see that yet. > On 16.01.2019 at 16:57, Michael Robbert wrote: > > Andreas, > > Look again. I just looked and a commit to the source code was posted to > the bug yesterday afternoon. It looks like that patch applies to the > cgroup plugin. It won't show up until the ne

Re: [slurm-users] Slurms nodes over VPN?

2019-01-17 Thread Chris Samuel
On 17/1/19 1:24 pm, rapier wrote: Or is there some known and understood way to do something like this that might be documented somewhere? What you're describing sounds a lot like Slurm's cloudbursting facility where nodes can be defined with a name that Slurm knows but when they boot they in

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Fulcomer, Samuel
Yes, well, the trivial cat-skinning method is to use topology.conf to describe multiple switch topologies confining each architecture to their meta-fabric. We use GPFS as a parallel filesystem, and all nodes are connected, but topology.conf keeps jobs on uniform-architecture collectives. On Thu, J

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Nicholas McCollum
I recommend putting heterogeneous node types each into their own partition to keep jobs from spanning multiple node types. You can also set QoS's for different partitions and make a job in that QoS only able to be scheduled on nodes=1. You could also accomplish this with a partition config in

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Fulcomer, Samuel
We use topology.conf to segregate architectures (Sandy->Skylake), and also to isolate individual nodes with 1Gb/s Ethernet rather than IB (older GPU nodes with deprecated IB cards). In the latter case, topology.conf had a switch entry for each node. It used to be the case that SLURM was unhappy wi
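A minimal sketch of the layout described above, assuming one switch per architecture and one switch entry per isolated node, with no common parent switch so that jobs do not span groups. The switch names, node names, and the helper `topology_lines` are hypothetical; only the `SwitchName=... Nodes=...` line format comes from topology.conf itself.

```python
def topology_lines(arch_nodes, isolated_nodes):
    """Build topology.conf lines: one switch per architecture group,
    plus one switch entry for each individually isolated node."""
    lines = []
    for arch, nodes in arch_nodes.items():
        # One leaf switch per architecture; no parent switch joins them.
        lines.append(f"SwitchName={arch} Nodes={nodes}")
    for node in isolated_nodes:
        # Each isolated node gets its own single-node switch entry.
        lines.append(f"SwitchName=sw-{node} Nodes={node}")
    return lines


# Hypothetical architectures and node ranges, for illustration only.
example = topology_lines(
    {"sandy": "node[001-100]", "skylake": "node[101-200]"},
    ["gpu001", "gpu002"],
)
print("\n".join(example))
```

Generating the file from a small mapping like this is only one option; a hand-written topology.conf with the same lines would behave the same way.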

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Ryan Novosielski
I don’t actually know the answer to this one, but we have it provisioned to all nodes. Note that if you care about node weights (e.g. NodeName=whatever001 Weight=2, etc. in slurm.conf), using the topology function will disable it. I believe I was promised a warning about that in the future in a

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Ryan Novosielski
> On Jan 17, 2019, at 4:49 PM, Prentice Bisbal wrote: > > From https://slurm.schedmd.com/topology.html: > >> Note that compute nodes on switches that lack a common parent switch can be >> used, but no job will span leaf switches without a common parent (unless the >> TopologyParam=TopoOptional

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Prentice Bisbal
And a follow-up question: Does topology.conf need to be on all the nodes, or just the slurm controller? It's not clear from that web page. I would assume only the controller needs it. Prentice On 1/17/19 4:49 PM, Prentice Bisbal wrote: From https://slurm.schedmd.com/topology.html: Note that

[slurm-users] Topology configuration questions:

2019-01-17 Thread Prentice Bisbal
From https://slurm.schedmd.com/topology.html: Note that compute nodes on switches that lack a common parent switch can be used, but no job will span leaf switches without a common parent (unless the TopologyParam=TopoOptional option is used). For example, it is legal to remove the line "Switch

[slurm-users] Slurms nodes over VPN?

2019-01-17 Thread rapier
Hi, I'm still in the process of understanding slurm but I was hoping someone might have an answer for this. We've a pretty specific use case where we want to have slurm nodes connected to a slurm controller via VPN. The IPs are assigned to nodes via OpenVPN. So is it possible to define nodes wi

Re: [slurm-users] 'slurmd -c' not returning correct information

2019-01-17 Thread Prentice Bisbal
Never mind. This was a layer 8 problem. I was editing the wrong slurm.conf. We recently switched to using RPMs, and I accidentally edited the file in the location used before we switched to using RPMs. It turns out those errors were always there in slurmctld.log, and no one ever noticed. Now

[slurm-users] 'slurmd -c' not returning correct information

2019-01-17 Thread Prentice Bisbal
It appears that 'slurmd -C' is not returning the correct information for some of the systems in my very heterogeneous cluster. For example, take the node dawson081: [root@dawson081 ~]# slurmd -C NodeName=dawson081 slurmd: Considering each NUMA node as a socket CPUs=32 Boards=1 SocketsPerBoard=4
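A rough sketch of how the `slurmd -C` output quoted above could be compared against what a site expects, assuming the output is a run of `key=value` tokens mixed with informational text. The helpers `parse_slurmd_c` and `mismatches` and the `configured` values are hypothetical; the sample string is the portion of the output visible in the message.

```python
def parse_slurmd_c(output):
    """Collect key=value tokens from slurmd -C output, skipping other words."""
    fields = {}
    for token in output.split():
        key, sep, value = token.partition("=")
        if sep:  # only tokens that actually contain '='
            fields[key] = value
    return fields


def mismatches(reported, configured):
    """Return {key: (configured, reported)} for keys whose values differ."""
    return {
        key: (value, reported.get(key))
        for key, value in configured.items()
        if reported.get(key) != value
    }


sample = (
    "NodeName=dawson081 slurmd: Considering each NUMA node as a socket "
    "CPUs=32 Boards=1 SocketsPerBoard=4"
)
reported = parse_slurmd_c(sample)

# Hypothetical expected values, standing in for a slurm.conf node entry.
configured = {"CPUs": "32", "SocketsPerBoard": "2"}
print(mismatches(reported, configured))
```

A check like this only reports differences; it does not say which side is wrong, which matters here since the follow-up traced the problem to editing the wrong slurm.conf.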

Re: [slurm-users] SlurmDBD setup with mysql

2019-01-17 Thread Sajesh Singh
Our clustername does not have a "-" but the hostname does. Does the slurmdbd accounting try to create a table name based on the hostname or clustername only? -SS- -Original Message- From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Matthew BETTINGER Sent: Thu

Re: [slurm-users] SlurmDBD setup with mysql

2019-01-17 Thread Matthew BETTINGER
Not sure if this is related but we ran into an issue configuring accounting because our clustername had a '-' in the name. This is an illegal character for table names in mariadb, or used to be. On 1/17/19, 11:07 AM, "slurm-users on behalf of Sajesh Singh" wrote: Trying to setup acco
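A small sketch of a pre-flight check based on the report above, assuming (conservatively, and not confirmed by the thread) that a cluster name is safest when it contains only ASCII letters, digits, and underscores. The helper `risky_cluster_name_chars` and the example names are hypothetical.

```python
import re


def risky_cluster_name_chars(name):
    """Return the sorted set of characters outside [A-Za-z0-9_].

    Assumption: anything outside this set might cause trouble if the
    name ends up inside a database table name, as reported for '-'.
    """
    return sorted(set(re.findall(r"[^A-Za-z0-9_]", name)))


# Hypothetical cluster names, for illustration only.
print(risky_cluster_name_chars("my-cluster"))
print(risky_cluster_name_chars("mycluster"))
```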

[slurm-users] SlurmDBD setup with mysql

2019-01-17 Thread Sajesh Singh
Trying to set up accounting using the MySQL backend and I am getting errors from the slurmctld and slurm tools when trying to interact with the accounting database. Tried starting in debug as well, but could not see anything else that could point to what could be causing this issue. I have follow