Re: [slurm-users] accrue_cnt underflow

2018-11-12 Thread Chris Samuel
On Tuesday, 6 November 2018 1:02:02 AM AEDT kamil wrote: > Any idea what these mean and how to handle it? No, but we've just upgraded and see the same. I've opened a bug: https://bugs.schedmd.com/show_bug.cgi?id=6016 -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] constraints question

2018-11-12 Thread Chris Samuel
ate resources: Invalid feature specification -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Reserving a GPU

2018-11-11 Thread Chris Samuel
elease.. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] constraints question

2018-11-11 Thread Chris Samuel
s. Best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm missing non primary group memberships

2018-11-09 Thread Chris Samuel
some chance? If so you might want to check if it works with it disabled first.. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] bug 2119 with slurm 18.08.2

2018-11-09 Thread Chris Samuel
s" say for you? Remember just because you can run squeue on the DB server and talk to the control daemon doesn't mean that the slurmctld has told the slurmdbd to use that same working IP address that squeue is getting via slurm.conf. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] virtual memory limit exceeded

2018-11-09 Thread Chris Samuel
393), being killed > > Is this a limit that's dictated by cgroup.conf It's not cgroups; those limits are enforced by the kernel, whereas this is Slurm monitoring jobs, deciding one has used too much memory and killing it. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-07 Thread Chris Samuel
On Wednesday, 7 November 2018 3:46:01 PM AEDT Brian Andrus wrote: > Ah. I was getting ahead of myself. I used 'limits' and I have no limits > configured, only associations. Changed it to just associations and all is > good. Excellent! Well spotted.. -- Chris Samuel : http://www.cs

Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-06 Thread Chris Samuel
be disruptive though, should it? We just flip a symlink and the users see the new binaries, libraries, etc immediately, we can then restart daemons as and when we need to (in the right order of course, slurmdbd, slurmctld and then slurmd's). All the best, Chris -- Chris Samuel : http
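The restart order mentioned in the snippet above (slurmdbd, then slurmctld, then the slurmd's) can be sketched as a small helper. This is only an illustration: `restart_order` is a hypothetical name, not part of Slurm, and how the daemons are actually restarted is left to the site's own tooling.

```python
# Restart order as stated in the post: slurmdbd first, then slurmctld, then slurmd.
RESTART_ORDER = ["slurmdbd", "slurmctld", "slurmd"]


def restart_order(daemons):
    """Return the given daemon names sorted into the restart order above.

    Hypothetical helper for illustration; it only orders names and
    does not restart anything.
    """
    unknown = [d for d in daemons if d not in RESTART_ORDER]
    if unknown:
        raise ValueError(f"unknown daemon(s): {unknown}")
    return sorted(daemons, key=RESTART_ORDER.index)
```

Feeding the result to whatever restart mechanism a site uses keeps the ordering rule in one place.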

Re: [slurm-users] slurmstepd crash 18.03 when using pmi2 interface

2018-11-02 Thread Chris Samuel
> needed by pmi2. This is what we have working (with 17.11.x): /dev/null /dev/urandom /dev/zero /dev/sda* /dev/cpu/*/* /dev/pts/* /dev/ram /dev/random /dev/hfi* -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-25 Thread Chris Samuel
r example: We don't see those here, but defunct (zombie) processes don't really exist; they're just caching the exit status until their parent gets around to wait()ing for them and then they can be reaped. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
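The point about defunct processes can be seen in a minimal sketch, assuming a POSIX system: a child that has exited stays in the process table only to hold its exit status until the parent calls wait(). The function name `reap_child` and the 0.1 second pause are illustrative choices, not anything from the thread.

```python
import os
import time


def reap_child(exit_code=7):
    """Fork a child that exits at once, then collect its status with waitpid()."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately, skipping cleanup handlers
    # Give the child time to exit; until it is waited for it is a zombie
    # and would typically be listed as <defunct> by ps.
    time.sleep(0.1)
    _, status = os.waitpid(pid, 0)  # parent: reap the child, freeing its table entry
    return os.WEXITSTATUS(status)
```

Once `waitpid()` returns, the kernel discards the cached exit status and the defunct entry disappears.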

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-24 Thread Chris Samuel
ced it by then. However, we've seen it now too. We'll try the disable/mask trick for `systemd-logind` too. cheers! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Socket timed out on send/recv operation

2018-10-20 Thread Chris Samuel
NFS latencies are the problem here. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Can frequent hold-release adversely affect slurm?

2018-10-20 Thread Chris Samuel
duling decisions on them. I'm out of ideas sorry! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] requesting entire vs. partial nodes

2018-10-20 Thread Chris Samuel
m at the beginning but holes then open up as jobs finish. So hopefully you'll have a nice mix of job sizes that will fit those holes. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Resource sharing between different clusters

2018-10-19 Thread Chris Samuel
g it? My understanding (having never tried federation) is that each cluster will run its own slurmctld's and slurmds, but they must share the same slurmdbd. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Cgroups and swap with 18.08.1?

2018-10-19 Thread Chris Samuel
ata. cheers! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Job walltime

2018-10-18 Thread Chris Samuel
Hope this helps! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] slurmdbd not showing job accounting

2018-10-17 Thread Chris Samuel
ng correctly. That's... odd. I've never seen that. Worth trying by hand on a clean install running slurmdbd like this: slurmdbd -Dvvv to see if there's anything obvious showing up in the debug logs to indicate some problems. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] x11 forwarding not available?

2018-10-16 Thread Chris Samuel
ere. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-11 Thread Chris Samuel
that systemd is killing slurmdbd for some reason. What happens if you run slurmdbd by hand as root? Like this: slurmdbd -D - That should run it in the foreground and output debug info to the screen. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Chris Samuel
-- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Chris Samuel
by default in 17.11.x (and I'm not even sure it works if you enable it there) and seems to be enabled by default in 18.08.x. To see, check the _enable_pack_steps() function in src/srun/srun.c All the best, Chris (currently away in the UK) -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

2018-10-01 Thread Chris Samuel
VLSCI we would get reporting questions about usage (even after systems had been decommissioned) that we needed to go back to get data out of Slurm for. Luckily we had some beefy Percona MySQL servers in a cluster! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

2018-09-25 Thread Chris Samuel
e, then I can do slurmctld (with partitions marked down, just in case). Once those are done I can restart slurmd's around the cluster. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Renaming a Reservation

2018-09-25 Thread Chris Samuel
On Tuesday, 25 September 2018 1:54:19 PM AEST Kevin Buckley wrote: > Is there a way to rename a Reservation ? I've never come across a way to do that, I've just had to delete and recreate. Sorry Kevin! -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] swap size

2018-09-22 Thread Chris Samuel
this mode it's just sending processes SIGSTOP and then launching the incoming job so you should really have enough swap for the previous job to get swapped out to in order to free up RAM for the incoming job. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Job allocating more CPUs than requested

2018-09-22 Thread Chris Samuel
restriction. I guess it's possible the next level caches might get a work out, but then unless you're restricting OS daemon processes to cores that are not used by Slurm then you're probably still going to get some amount of cache pollution anyway. All the best! Chris -- Chris Sam

Re: [slurm-users] Job allocating more CPUs than requested

2018-09-21 Thread Chris Samuel
e mode, or do you mean that the code inside the job uses all the cores on the node instead of what was requested? The latter is often the case for badly behaved codes and that's why using cgroups to contain applications is so important. All the best, Chris -- Chris Samuel : http://www.csamu

Re: [slurm-users] Dealing with wrong things that users do

2018-09-20 Thread Chris Samuel
ps://slurm.schedmd.com/cgroups.html https://slurm.schedmd.com/pam_slurm_adopt.html Best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Setting up a separate timeout for interactive jobs

2018-09-20 Thread Chris Samuel
ooking for the absence of a batch script. Have a look at this bug: https://bugs.schedmd.com/show_bug.cgi?id=3094 All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] External provisioning for accounts and other things (?)

2018-09-19 Thread Chris Samuel
lob/master/karaage/datastores/slurm.py Hope that helps! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] email preferences

2018-09-15 Thread Chris Samuel
On Thursday, 13 September 2018 4:24:41 AM AEST Ariel Balter wrote: > How do I set email preferences for this group? https://lists.schedmd.com/cgi-bin/mailman/options/slurm-users -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Create users

2018-09-15 Thread Chris Samuel
based & independently created), so when people are added/modified/deleted then it runs sacctmgr to keep everything in step. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm on POWER9

2018-09-15 Thread Chris Samuel
post-build. Correct - autoconf will detect hwloc if the headers & library are present there at compile time. It links against it so it *must* be there when you are compiling in order to use it. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Chris Samuel
r version of Slurm - works happily on 17.11.x. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm on POWER9

2018-09-10 Thread Chris Samuel
es=. The gres.conf we use on our HPC cluster uses Cores= quite happily. Name=gpu Type=p100 File=/dev/nvidia0 Cores=0-17 Name=gpu Type=p100 File=/dev/nvidia1 Cores=18-35 All the best! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
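To make the two gres.conf lines quoted above easier to read, here is a minimal sketch that splits such a line into its fields and expands the `Cores=` range. It assumes only the simple `Key=Value` and `first-last` syntax shown in the snippet, not the full gres.conf grammar; `parse_gres_line` is a hypothetical helper, not a Slurm API.

```python
def parse_gres_line(line):
    """Split a 'Key=Value ...' gres.conf line into a dict and expand Cores=.

    Handles only comma-separated entries and 'first-last' ranges,
    which is an assumption based on the lines quoted in the post.
    """
    entry = dict(field.split("=", 1) for field in line.split())
    if "Cores" in entry:
        cores = []
        for part in entry["Cores"].split(","):
            first, _, last = part.partition("-")
            cores.extend(range(int(first), int(last or first) + 1))
        entry["Cores"] = cores
    return entry
```

With the quoted configuration, the first GPU maps to cores 0 through 17 and the second to cores 18 through 35, so each device is tied to its own 18 cores.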

Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Chris Samuel
s package in RHEL/CentOS and cgroup-tools in Debian. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Configuration issue on Ubuntu

2018-09-05 Thread Chris Samuel
eat catch Gennaro! Ah, just noticed you're the Debian package maintainer for Slurm. :-) All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] how can users start their worker daemons using srun?

2018-08-31 Thread Chris Samuel
ing until it hits its time limit (unless, as you say, you manually kill that step yourself). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-31 Thread Chris Samuel
ve you from users setting CUDA_VISIBLE_DEVICES themselves and accessing GPUs they are not meant to, you really really do need to use cgroups to stop that happening. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chris Samuel
ou set CUDA_VISIBLE_DEVICES to be as processes will only be able to access what they requested. Hope that helps! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Configuration issue on Ubuntu

2018-08-29 Thread Chris Samuel
load plugin > /root/sl/sl2/lib/slurm/crypto_munge.so To me that looks like you managed to compile Slurm against a version of Munge installed under root's home directory. This is unlikely to be what you want. If you build Slurm as a non-root user then it won't find that. All the best, Chris --

Re: [slurm-users] Configuration issue on Ubuntu

2018-08-28 Thread Chris Samuel
bly a bug in how they packaged it! Best of luck, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] how can users start their worker daemons using srun?

2018-08-27 Thread Chris Samuel
here (which I've just started using for a side radio-astronomy project at the observatory I volunteer at): https://www.brendanlong.com/systemd-user-services-are-amazing.html Hope this helps! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-23 Thread Chris Samuel
On Tuesday, 21 August 2018 6:17:59 PM AEST Chris Samuel wrote: > My apologies - I've just tested here (with Slurm 17.11.7) and you are indeed > correct, they only appear when launched with sbatch and salloc and not when > you launch jobs directly with srun! I think the confusion i

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-08-23 Thread Chris Samuel
acks the configuration of the "memory" cgroup. (see output below) I don't see that on our CentOS 7.5 system, which distro are you using? -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-21 Thread Chris Samuel
ck for both. Hope this helps! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Determine usage for a QOS?

2018-08-20 Thread Chris Samuel
on of > the various counters and limits the controller keeps in memory. Awesome, thanks Kilian! $ scontrol show assoc_mgr QOS=astac_oz045 | fgrep UsageRaw= UsageRaw=18641632.00 Looking promising... -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
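For scripted use, the `UsageRaw=` value picked out with `fgrep` above could be extracted as a number. The sketch below works on text that has already been captured from `scontrol show assoc_mgr`; the helper name `usage_raw` is hypothetical, and the only thing assumed about the output is the `UsageRaw=<number>` form shown in the snippet.

```python
import re


def usage_raw(assoc_mgr_output):
    """Return the first UsageRaw= value found in the text as a float, or None."""
    match = re.search(r"\bUsageRaw=([0-9]+(?:\.[0-9]+)?)", assoc_mgr_output)
    return float(match.group(1)) if match else None
```

Returning `None` when the field is absent lets a caller tell "no such QOS in the output" apart from a usage of zero.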

Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-20 Thread Chris Samuel
nuking your shell's environment perhaps? 17.02.11 is the last released version of 17.02.x and all previous versions have been pulled from the SchedMD website due to CVE-2018-10995. cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

[slurm-users] Taking a break from slurm-users

2018-05-12 Thread Chris Samuel
Hey folks, I'm going to be unsubscribing from slurm-users for a while as I'll be travelling to the US & UK for a number of weeks & I don't want to drown in email. I'll be back... -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] User limits for multiple associated accounts

2018-05-12 Thread Chris Samuel
ts, not what is currently in use. So the sum of the requested memory of all jobs running in that association doesn't leave enough permitted resources free to allow this job to begin. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] How to check if there's a reservation

2018-05-12 Thread Chris Samuel
4-35,37-45,52-53,65-66,72-86] It does result in a job being allocated which will never appear in your accounting though, so you'll need to be prepared for that. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Issue with salloc

2018-05-12 Thread Chris Samuel
n/bash exec srun $* --pty -u ${SHELL} -l That's it.. Hope that helps! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] slurm reboot node with spank plugin

2018-05-10 Thread Chris Samuel
iles. Hope that helps! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Memory oversubscription and sheduling

2018-05-10 Thread Chris Samuel
ling them fixed that. Best of luck, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Nodes are down after 2-3 minutes.

2018-05-10 Thread Chris Samuel
On Thursday, 10 May 2018 1:02:36 AM AEST Eric F. Alemany wrote: > All seem good for now Great news! -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Splitting mpi rank output

2018-05-10 Thread Chris Samuel
:: john46 Hope that helps, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] slurmdbd: mysql/accounting errors on 17.11.6 upgrade

2018-05-07 Thread Chris Samuel
upport contract I would be opening a bug with SchedMD now. cheers! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Nodes are down after 2-3 minutes.

2018-05-07 Thread Chris Samuel
thub.com/dun/munge/wiki/Installation-Guide Good luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Nodes are down after 2-3 minutes.

2018-05-07 Thread Chris Samuel
art munged as well? That's what's reading the key, not Slurm. Munge is just an external service that Slurm talks to. cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] sacct: error

2018-05-07 Thread Chris Samuel
ads, but from the point of view of what you *request* CPUs are just boards*sockets*cores. Confusing! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] "Low socket*core*thre" - solution?

2018-05-06 Thread Chris Samuel
t doesn't care about that. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] "Low socket*core*thre" - solution?

2018-05-06 Thread Chris Samuel
node=rocks7 state=resume cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] "Low socket*core*thre" - solution?

2018-05-06 Thread Chris Samuel
Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] sacct: error

2018-05-06 Thread Chris Samuel
On Sunday, 6 May 2018 2:58:26 PM AEST Chris Samuel wrote: > Very very interesting - both slurmd and lscpu report 32 cores, but with > differing interpretations of the number of the layout. Meanwhile the AMD > website says these are 16 core CPUs, which means both Slurm and lscpu ar

Re: [slurm-users] "Low socket*core*thre" - solution?

2018-05-05 Thread Chris Samuel
ler which is why it's important to know about for memory locality). What's the hardware you're running this on? Also can you refresh my memory please, what do each of these say? lscpu slurmd -C lstopo (don't worry if that last one isn't there) All the best, Chris -- Chris Samuel : h

Re: [slurm-users] After Each slurm Run, I Need to Reinstall slurm

2018-05-05 Thread Chris Samuel
his helps.. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] GPU / cgroup challenges

2018-05-05 Thread Chris Samuel
e 17.11.0 (as I know it works for us with 17.11.5) or a kernel bug (or missing device cgroups). Sorry I can't be more helpful! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Hung tasks and high load when cancelling jobs

2018-05-05 Thread Chris Samuel
fixed in 17.11.6 which will be out soon. I can't tell if you're hitting the same bug we hit, but I'd suggest re-testing when it appears. Good luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Repost: Odd sacct behavior?

2018-05-05 Thread Chris Samuel
wnload that version any more from SchedMD because of the CVE. I'd suggest upgrading if you can. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] sacct: error

2018-05-05 Thread Chris Samuel
n you can schedule 16 tasks per node and each task can use 2 threads. What does "slurmd -C" say on that node? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] SLURM on Ubuntu 18.04

2018-05-04 Thread Chris Samuel
l says: --sysconfdir=/etc/slurm-llnl -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] wckey specification error

2018-05-02 Thread Chris Samuel
ATTERN} will do git's grep with no need to have a git repository. Plus it paginates, etc, for you. Also pretty fast. :-) -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] wckey specification error

2018-05-01 Thread Chris Samuel
https://slurm.schedmd.com/wckey.html Good luck, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Include some cores of the head node to a partition

2018-04-29 Thread Chris Samuel
tld and slurmd everywhere and see where that gets you. If it's still drained but the hardware config looks good in Slurm then you can do "scontrol update node=rocks7 state=resume" to tell Slurm to try using it again. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Include some cores of the head node to a partition

2018-04-29 Thread Chris Samuel
Hi Mahmood, Not quite what I meant sorry. What does this say? scontrol show config | fgrep -i rocks7 cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Include some cores of the head node to a partition

2018-04-29 Thread Chris Samuel
On Sunday, 29 April 2018 4:11:39 PM AEST Mahmood Naderan wrote: > So, I don't know why only 1 core included What do you have in your slurm.conf for rocks7? -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Include some cores of the head node to a partition

2018-04-28 Thread Chris Samuel
hardware resources to meet what you've told it. What does "slurmd -C" say on rocks7 ? -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Include some cores of the head node to a partition

2018-04-28 Thread Chris Samuel
On Saturday, 28 April 2018 7:58:08 PM AEST Mahmood Naderan wrote: > I see that the state of the frontend is Drained. Is that the default > state? Probably not. What does "sinfo --list-reasons" say? -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] sacct not shows user

2018-04-27 Thread Chris Samuel
txt) but I think that is just a mechanism to store information about completed jobs. Slurmdbd also stores information about users, accounts and associations and so I suspect you'll need that to be able to get that information. Also note account = bank account, user = username. Hope that he

Re: [slurm-users] sacct not shows user

2018-04-26 Thread Chris Samuel
StorageEnforce to anything that requires associations then jobs will quite happily start without being part of one (from memory). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Runaway jobs issue: : Resource temporarily unavailable, slurm 17.11.3

2018-04-25 Thread Chris Samuel
On Wednesday, 25 April 2018 3:47:17 PM AEST Chris Samuel wrote: > I'll open a bug just in case.. https://bugs.schedmd.com/show_bug.cgi?id=5097 -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Runaway jobs issue: : Resource temporarily unavailable, slurm 17.11.3

2018-04-24 Thread Chris Samuel
her, but I wonder if there have been accidental redefinitions (for instance, the one in slurm_persist_conn.c didn't appear until 2016, whilst the one in slurm_protocol_socket_implementation.c was set to that value (1GB) back in 2013). I'll open a bug just in case.. cheers, Chris -- Chris Samuel : h

Re: [slurm-users] Partition 'alias'?

2018-04-24 Thread Chris Samuel
A job can be submitted to many partitions (modulo local policy) but once it starts it is only in one partition, that might be what you are thinking of here. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Job still running after process completed

2018-04-23 Thread Chris Samuel
s on that same node. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Include some cores of the head node to a partition

2018-04-23 Thread Chris Samuel
isunderstood what you were trying to achieve. I assumed you wanted a homogenous configuration for the partition. Yes, if you are happy for the asymmetry then you can do that. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm overhead

2018-04-23 Thread Chris Samuel
's really something very wrong in your setup I'm afraid. I've not seen an impact like that just from running Slurm. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Include some cores of the head node to a partition

2018-04-21 Thread Chris Samuel
ll cores. All you need to do is add "MaxCPUsPerNode=20" to that to limit the number of cores that the partition can use. We do this for our non-GPU job partition to reserve some cores for the GPU job partition. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] SLURM's reservations

2018-04-17 Thread Chris Samuel
ore) format options to you so I had to guess a little. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Chris Samuel
On Tuesday, 17 April 2018 5:08:09 PM AEST Mahmood Naderan wrote: > So, UsePAM has not been set. So, slurm shouldn't limit anything. Is > that correct? however, I see that slurm limits the virtual memory size What does this say? scontrol show config | fgrep VSizeFactor -- Chris

Re: [slurm-users] What version I should install?

2018-04-17 Thread Chris Samuel
ob_comp_mysql.so) and slurm-slurmdbd (accounting_storage_mysql) packages. The example configuration files have been moved to slurm-example-configs. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] What version I should install?

2018-04-16 Thread Chris Samuel
we can just restart services as we wish to pick up the right one (/apps is a shared read-only filesystem across all the cluster nodes). Our config directory is /apps/slurm/etc so again only one place to modify things for all nodes to see the change. All the best, Chris -- Chris Sam

Re: [slurm-users] SLURM's reservations

2018-04-16 Thread Chris Samuel
rvationId,Start,TotalTime best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] slurm-17.11.5 usage of X11

2018-04-14 Thread Chris Samuel
r us (yet - it seems to work pretty well so far on our systems) but worth keeping in mind for the future. Thanks for sharing! -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Two lines are printed by sacct

2018-04-13 Thread Chris Samuel
eason it doesn't at the moment is because you're telling it not to tell you. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] srun and mpirun

2018-04-13 Thread Chris Samuel
intensive to see differences. It also sounds like you've got a problem with either your Slurm job or Slurm itself from the error you posted. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] srun and mpirun

2018-04-13 Thread Chris Samuel
and communications instead. Something like NAMD or a synthetic benchmark like HPL. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] fast way for a node to determine its own state?

2018-03-24 Thread Chris Samuel
. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] GrpTRES

2018-03-24 Thread Chris Samuel
under the covers for you when you do "useradd". [...] > I think something is wrong. Any idea? What does this say? scontrol show config | fgrep AccountingStorageEnforce All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
