[slurm-dev] RE: Struggling with power saving

2017-03-16 Thread Nathan Harper
than satisfactory scheduling decisions - we use TopologyPlugin=topology/tree to colocate jobs to as few switches as possible. Powered off nodes wouldn't be considered, so jobs would be scattered over multiple switches, rather than turning on a few nodes on the same switch. -- *Nathan Harper*

[slurm-dev] Re: Restrict users to see only jobs of their groups

2016-11-01 Thread Nathan Harper
cess to an account also a coordinator of that account. -- *Nathan Harper* // IT Systems Lead *e: * nathan.har...@cfms.org.uk // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: * www.cfms.org.uk <http://www.cfms.org.uk>

[slurm-dev] Re: Restrict users to see only jobs of their groups

2016-11-01 Thread Nathan Harper
Hi, No solution, but a 'me too'. -- *Nathan Harper* // IT Systems Lead

[slurm-dev] Re: Slurm web dashboards

2016-09-28 Thread Nathan Harper
There is a fork of Dashing that is still relatively current. We have a couple of Dashing dashboards which parse squeue/scontrol/sacct output to show SLURM information alongside other cluster info (nodes up/down, power usage, etc.) -- *Nathan Harper* // IT Systems Lead
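As a rough illustration of the squeue-parsing approach described in this message (not the poster's actual dashboard code), here is a minimal Python sketch that tallies jobs by state. It assumes `squeue` is on the PATH and is called with `-h` (no header) and `-o "%T"` (job state), so the output is one state name per line. The function names `count_job_states` and `fetch_job_states` are hypothetical.

```python
import subprocess
from collections import Counter


def count_job_states(squeue_text: str) -> Counter:
    """Tally job states from text holding one state name per line."""
    states = (line.strip() for line in squeue_text.splitlines())
    # Skip blank lines so trailing newlines do not produce an empty key.
    return Counter(state for state in states if state)


def fetch_job_states() -> Counter:
    """Run squeue and tally the states (assumes squeue is installed)."""
    result = subprocess.run(
        ["squeue", "-h", "-o", "%T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_job_states(result.stdout)


if __name__ == "__main__":
    # Sample text stands in for real squeue output so this runs anywhere.
    sample = "RUNNING\nRUNNING\nPENDING\n"
    for state, count in count_job_states(sample).items():
        print(f"{state}: {count}")
```

Keeping the parsing separate from the `subprocess` call means the tally can be checked against canned text without a running cluster; a dashboard job would call `fetch_job_states()` on a timer and publish the counts.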

[slurm-dev] starttime and resource limits

2016-09-07 Thread Nathan Harper
Hi, We've implemented QoS resource limits thanks to some past suggestions on this list. However it does seem to have broken some of our scheduling. Jobs that are held due to QOSGrpNodeLimit have a starttime=unknown, despite all other jobs within the same limit having end times associated with t

[slurm-dev] QoS TRES limits

2016-08-01 Thread Nathan Harper
Hi, We are trying to get to the bottom of some TRES limits we have in place, to work out if it should be expected behaviour. We have two QoS configured, 'low' and 'normal'. Normal is the default QoS and applies limits at the association level. The low QoS has its own TRES limits applied to it

[slurm-dev] view resource limit status

2015-11-19 Thread Nathan Harper
Is there an equivalent to GridEngine's 'qquota' to view resource limit status? For example, an account is limited to 20 nodes, how does a user know the resource use?

[slurm-dev] Re: Nodes are getting DOWN state

2015-08-25 Thread Nathan Harper
different rates. On 25 August 2015 at 13:33, Fahad Ibrahim Alzannan wrote: > Hi, > > Actually some working nodes are delayed by around 5 mins also the down > nodes ! > ------ > *From:* Nathan Harper [nathan.har...@cfms.org.uk] > *Sent:* Tuesday, August

[slurm-dev] Re: Nodes are getting DOWN state

2015-08-25 Thread Nathan Harper
Hi - can you check that your clocks are in sync between your compute nodes and controllers? -- *Nathan Harper* On 25 August 2015 at 11:51, Fahad Ibrahim Alzannan wrote: > Hi, > > > We have a cluster and some nodes are down we tried to set them idle using > "scontrol upda

[slurm-dev] Re: Slurm and docker/containers

2015-06-05 Thread Nathan Harper
Has anyone looked at using LXC rather than Docker specifically? From what I understand, it's possible to run unprivileged LXC containers, so no need to be root. -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Nathan Harper
I've looked at this in the past - however, when I use your example (replacing with real data) I just get a 'Nothing Modified' response -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: user exclusivity on node

2014-11-24 Thread Nathan Harper
k and affecting other jobs. If a user is paying for core hours, they won't be happy if a single core job on one of 'their' nodes slowed their parallel job down. I'd echo Marcin's comment: *a user could only hurt themselves when running new buggy code.* -- *Nathan Harpe

[slurm-dev] Re: user exclusivity on node

2014-11-21 Thread Nathan Harper
't terribly wasteful, but it would be nice to use cons_res, then let users choose if they want to share jobs, but only with themselves. -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: slurm-dev RE : Struggling with configuration of acct_gather_energy/ipmi

2014-11-10 Thread Nathan Harper
I've got a real mixture of nodes, some older than Sandy Bridge, and I find that IPMI does a better job of getting 'whole system' power use. I was hoping to pick up power from my GPU nodes too, but those nodes don't report via IPMI either -- *Nathan Harper*

[slurm-dev] Re: job_submit_lua

2014-10-06 Thread Nathan Harper
ght be missing something obvious, but I can't find any documentation about this -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: job_submit_lua

2014-10-06 Thread Nathan Harper
ser("your qos has been changed") log_info("slurm_job_submit: job from uid %d, setting qos value: %s", submit_uid, job_desc.qos = qos end end -- *Nathan Harper* // IT Systems Architect

[slurm-dev] job_submit_lua

2014-10-06 Thread Nathan Harper
After a suggestion in another thread, I have been toying with a lua job submit plugin, and I've achieved my original goal. I've been sucked in by the possibilities of the plugin, and the prospect of presenting users with a comment on job submission is an attractive one. I've not been able to get

[slurm-dev] Re: Building Slurm 14.03.x with FreeIPMI 1.4.5?

2014-10-02 Thread Nathan Harper
We're using 14.03.7 with FreeIPMI 1.4.5 and didn't have to do anything unusual to get it built. FreeIPMI was built from source into an RPM, then SLURM itself built into an RPM. -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: per-partition account resource limits

2014-10-01 Thread Nathan Harper
s/documentation out there? -- *Nathan Harper* // IT Systems Architect

[slurm-dev] per-partition account resource limits

2014-09-30 Thread Nathan Harper
1 to slots=160 limit users GroupB queues partition1 to slots=96 limit users GroupC queues partition1 to slots=64 -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: Cluster(s) seem OK, but: Zero Bytes were transmitted or received (14.03.6)

2014-08-18 Thread Nathan Harper
Hi, Just a 'me-too' - also running 14.03.6 on compute nodes, with master nodes running RHEL5 with -O0 and getting the same thing in the logs, so it's not just you. -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: Disabling automatic optimization

2014-07-29 Thread Nathan Harper
You can set the CFLAGS variable before running ./configure -- *Nathan Harper* // IT Systems Architect

[slurm-dev] Re: 14.03 on RHEL5.10

2014-07-16 Thread Nathan Harper
Slight update - after changing the things in slurm.conf that it is complaining about, 'service slurm start' now doesn't throw any errors. However, attempting to run anything gives a segfault: # sacctmgr Segmentation fault -- *Nathan Harper* // IT Systems Architect

[slurm-dev] 14.03 on RHEL5.10

2014-07-16 Thread Nathan Harper
rocess configuration file -- *Nathan Harper* // IT Systems Architect