Re: [slurm-users] How can jobs request a minimum available (free) TmpFS disk space?

2019-09-04 Thread Chris Samuel
tory to the amount requested (so it couldn't be exceeded) and then used the private tmp spank plugin to map that into what the job saw as /tmp, /var/tmp and /dev/shm. The epilog then cleaned up after the job. Worked nicely! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Usage splitting

2019-09-01 Thread Chris Samuel
nd set "EnforceUsageThreshold" on the QOS that you don't want to be able to exceed its limit. https://slurm.schedmd.com/sacctmgr.html#lbAW All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] 19.05 and GPUs vs GRES

2019-08-13 Thread Chris Samuel
, we didn't change our config for GPUs and (so far) things seem OK. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Slurm version 19.05.2 is now available

2019-08-13 Thread Chris Samuel
Hi Kevin! On 13/8/19 7:25 pm, Kevin Buckley wrote: Then again, perhaps the bug seen there has been fixed in some other way for 19.05.2? From what I can see with "git log -Saries -p" it appears not yet. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] 19.05 and GPUs vs GRES

2019-08-12 Thread Chris Samuel
file. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Slurm node weights

2019-07-27 Thread Chris Samuel
a job specifically requests so via the --switches option to sbatch to request how many switches a job should span. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Slurm node weights

2019-07-27 Thread Chris Samuel
u can. There's an awful lot of fixes you are missing out on otherwise. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Errors after removing partition

2019-07-26 Thread Chris Samuel
trol reconfigure? BTW that check was introduced in 2003 by Moe :-) https://github.com/SchedMD/slurm/commit/1c7ee080a48aa6338d3fc5480523017d4287dc08 All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] slurm-19.05 link error

2019-07-23 Thread Chris Samuel
On Tuesday, 23 July 2019 7:47:33 PM PDT Weiguang Chen wrote: > I just reinstalled hdf5, but the error still exist. Are you going to use HDF5 actively? If not tell configure not to use it by adding the --with-hdf5=no flag to your configure line. -- Chris Samuel : http://www.csamuel.

Re: [slurm-users] What means this error ?

2019-07-21 Thread Chris Samuel
On Monday, 24 June 2019 10:47:46 PM PDT Valerio Bellizzomi wrote: > slurmctld: error: High latency for 1000 calls to gettimeofday(): 2072 > microseconds Are you running in a VM ? -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] sacct issue: jobs staying in "RUNNING" state

2019-07-16 Thread Chris Samuel
:27.515] error: slurmdbd: agent queue filling (20140), RESTART SLURMDBD NOW Have you tried doing what it told you to? You may want to look at the performance of your MySQL server to see if it's failing to keep up with what slurmdbd is asking it to do. All the best, Chris -- Chris Samuel

Re: [slurm-users] Slurm 19's "Changed the default fair share algorithm to "fair tree".": implications for slurm.conf PriorityFlags setting

2019-07-15 Thread Chris Samuel
the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Chris Samuel
there to see what's using it. Best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Chris Samuel
On 11/7/19 11:04 pm, Pär Lundö wrote: It works fine running on a single node(with ”-N1” instead of ”-N2”), but it is aborted or stopped when running on two nodes. What is the error you get? Does the same srun command but with "hostname" instead of Python work? -- Chris Samue

Re: [slurm-users] Suddenly getting "Invalid node name specified" when attempting srun/sbatch

2019-07-10 Thread Chris Samuel
. So I'd look to check that side of things are OK. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Requirement to run longer jobs

2019-07-03 Thread Chris Samuel
On 3/7/19 8:49 am, David Baker wrote: Does the above make sense or is it too complicated? [looks at our 14 partitions and 112 QOS's] Nope, that seems pretty simple. We do much the same here. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] SLURM_NTASKS values in interactive and batch jobs

2019-07-03 Thread Chris Samuel
/bash to stay on the login node. Same on our test 19.05.0 system. Which version of Slurm are you on? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] dual slurmctld and slurmdbd

2019-07-03 Thread Chris Samuel
GPFS for this) as both slurmctld's need to see the same state directory all the time. We also run slurmdbd in failover mode talking to the same MySQL/MariaDB instance (but with a backup in case that fails). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] spawning a new terminal for each srun

2019-06-30 Thread Chris Samuel
hether that be an ssh session, an xterm or using screen or tmux to multiplex terminals on a single session. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] status of cloud nodes

2019-06-19 Thread Chris Samuel
What does "scontrol show node $NODE" say where $NODE is the name of a node that isn't being listed despite you expecting it to be? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Chris Samuel
ind a way to trace mpirun - I think it's just a shell script so running it with "bash -x mpirun {etc}" would probably do it. That said you're probably better off just using srun anyway. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Submit job using srun fails but sbatch works

2019-06-06 Thread Chris Samuel
rmctld to see if that helps. Good luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] How to preempt job with priority_multifactor parameter ?

2019-06-05 Thread Chris Samuel
to do that, the existing ones only use QOS or partition. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] strigger on CG, completing state

2019-05-31 Thread Chris Samuel
nning job > or cancels it with cntrl. When this happens we can have many many nodes > stuck in CG. Slurm 17.02.6. Thanks! Are you using cgroups to control/constrain jobs? 17.02 is very old; now that 19.05 is out, only it and 18.08 are getting updates. All the best, Chris -- Chris

Re: [slurm-users] Slurm Install on Remote System

2019-05-26 Thread Chris Samuel
to trial things like cgroups you'll want a VM at least. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Parse error on starting slurmctld/slurmd

2019-05-26 Thread Chris Samuel
tween DOS & Linux line ending conventions? See this for more on that last point: https://kb.iu.edu/d/acux All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
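
The line-ending question in the reply above can be checked without any Slurm tooling. What follows is a minimal Python sketch, assuming the configuration file is read as raw bytes; the function name and the sample text are illustrative and do not come from the thread.

```python
def count_crlf_lines(data: bytes) -> int:
    """Return how many lines in data end with a DOS-style CRLF."""
    # Splitting on LF leaves a trailing CR on every line that used CRLF.
    return sum(1 for line in data.split(b"\n") if line.endswith(b"\r"))


# Hypothetical config fragment: the first two lines use DOS line endings.
sample = b"ClusterName=test\r\nSlurmUser=slurm\r\nSlurmdUser=root\n"
print(count_crlf_lines(sample))
```

A non-zero count suggests the file was edited on a DOS/Windows system and may need converting before the daemons will parse it.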

Re: [slurm-users] Slurm Install on Remote System

2019-05-25 Thread Chris Samuel
up a little cluster in a set of VM's makes life a lot easier for you as you'll be able to control the whole environment. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Slurm Install on Remote System

2019-05-25 Thread Chris Samuel
groups at all then it may just work to run the daemons by hand yourself. You would need to make sure you specify your username as both the "SlurmUser" and "SlurmdUser" in slurm.conf as well. Best of luck, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] options for ResumeProgram

2019-05-20 Thread Chris Samuel
that Slurm is calling to do this work to see if anything is hiding there. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Issue with x11

2019-05-15 Thread Chris Samuel
forwarding is reworked in 19.05 so it may be worth testing that out to see whether that improves things in this area. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Intel MPI startup

2019-04-29 Thread Chris Samuel
; with Intel on what appears to be an undocumented regression and all I > got after several back and forths was that PMI-2 is not supported in > Intel MPI 2019. Is that because they've moved to PMIx exclusively now? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Job dispatching policy

2019-04-29 Thread Chris Samuel
rocks run host compute-0-1 /bin/bash -x /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2 Also why aren't you using the Slurm commands to run things? Does this "rocks" command use them under the covers? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Setting NodeAddr dynamically and not in slurm.conf

2019-04-28 Thread Chris Samuel
On 27/4/19 10:07 pm, J.R. W wrote: Using slurm 15.08.7 Is that a typo for 18.08.7 ? -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Job dispatching policy

2019-04-27 Thread Chris Samuel
it has in the first #! line does not exist. What does this command say on that node? file /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2 All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] scontrol for a heterogenous job appears incorrect

2019-04-23 Thread Chris Samuel
in the first one of the pack jobs instead? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] disable-bindings disables counting of gres resources

2019-04-13 Thread Chris Samuel
g generally coincides with the processing unit logical number (PU L#) seen in lstopo output. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Getting current memory size of a job

2019-04-07 Thread Chris Samuel
I application, and also that the MPI stack you are using does not know about Slurm and so doesn't know to start itself correctly when you run with mpirun. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Chris Samuel
their tables properly updated (and as you say, either apply your patch or migrate their MySQL server to a box running a more recent version of MySQL - it doesn't have to be on the same system running slurmdbd). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Chris Samuel
ing, it's not applicable to 19.05 (which is coming up quickly now and so they'll be busy trying to get ready for that). "Release dates in calendar are closer than they appear" All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Getting current memory size of a job

2019-04-01 Thread Chris Samuel
md.com/show_bug.cgi?id=4966 but it looks like it's languished since I left Australia. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-01 Thread Chris Samuel
f MariaDB or MySQL is strongly encouraged to prevent this problem. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Not able to allocate all 24 ntasks-per-node; slurm.conf appears correct

2019-03-28 Thread Chris Samuel
to ask for that. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Chris Samuel
On Wednesday, 27 March 2019 11:33:30 PM PDT Mahmood Naderan wrote: > Still only one node is running the processes What does "srun --version" say? Do you get any errors in your output file from the second pack job? All the best, Chris -- Chris Samuel : http://www.csamuel.org/

Re: [slurm-users] SLURM User Group Meetings: "Back Issues"

2019-03-27 Thread Chris Samuel
On 27/3/19 7:56 pm, Kevin Buckley wrote: Does the SchedMD website contain "back issues" of SLURM User Group Meeting info Yup, somewhat non-intuitively as publications: https://slurm.schedmd.com/publications.html Goes all the way back to something at SC08! -- Chris Samue

Re: [slurm-users] number of nodes varies for no reason?

2019-03-27 Thread Chris Samuel
with different numbers of nodes allocated. Does anyone have any idea why? You would need to share the output of "scontrol show nodes" to get an idea of what resources Slurm thinks each node has. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Not able to allocate all 24 ntasks-per-node; slurm.conf appears correct

2019-03-27 Thread Chris Samuel
On 27/3/19 1:00 pm, Anne M. Hammond wrote: NodeName=fl[01-04] CPUs=24 RealMemory=4 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN This will give you 12 tasks per node, each task with 2 thread units. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
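
The arithmetic behind that reply can be sketched in Python. This is only an illustration, assuming one task per physical core as the reply describes; the helper name parse_node_line is made up for this example and is not part of Slurm.

```python
def parse_node_line(line: str) -> dict:
    """Collect the integer Key=Value fields from a slurm.conf-style node line."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            fields[key] = int(value)
    return fields


# Node definition quoted in the message above.
node = parse_node_line(
    "NodeName=fl[01-04] CPUs=24 RealMemory=4 Sockets=2 "
    "CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN"
)

cores = node["Sockets"] * node["CoresPerSocket"]   # physical cores per node
threads = cores * node["ThreadsPerCore"]           # hardware threads per node
print(cores, threads)
```

The 24 in CPUs counts hardware threads, so the node has 12 cores, which matches the 12 tasks with 2 thread units each mentioned in the reply.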

Re: [slurm-users] Changing node weights in partitions

2019-03-22 Thread Chris Samuel
considered in one partition to when it's being considered in a different partition. I don't think you can do that though I'm afraid, José, I think the weight is only attached to the node and the partition doesn't influence it. All the best, Chris -- Chris Samuel : http://www.csamuel.org

Re: [slurm-users] Can one specify attributes on a GRES resource?

2019-03-21 Thread Chris Samuel
On 21/3/19 7:39 pm, Will Dennis wrote: Why does it think that the "gres/gpu_mem_per_card" count is 0? How can I fix this? Did you remember to distribute gres.conf as well to the nodes? -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-20 Thread Chris Samuel
etero_steps" option in your scheduler parameters, but even then I don't believe it's working properly there. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] problems with slurm and openmpi

2019-03-16 Thread Chris Samuel
https://slurm.schedmd.com/mpi_guide.html#pmix All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] How to enable QOS correctly?

2019-03-05 Thread Chris Samuel
ugh if it wasn't then I'd expect a different error, other than resources. I think to understand better it'd be necessary to see what "scontrol show job" for a job stuck in that state looks like. If it helps we're running Slurm on Cray and make heavy use of QOS's. All the best, Chris -

Re: [slurm-users] Large job starvation on cloud cluster

2019-02-28 Thread Chris Samuel
of cores that an association has in the hierarchy either at or above that level that this would exceed. You'll probably need to go poking around with sacctmgr to see what that limit might be. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] SLURM docs: HTML title should be same as page title

2019-02-27 Thread Chris Samuel
On Monday, 25 February 2019 2:55:44 AM PST Patrice Peterson wrote: > Filed a bug: https://bugs.schedmd.com/show_bug.cgi?id=6573 Looks like Danny fixed it in git. https://github.com/SchedMD/slurm/commit/b1c78d9934ef461df637c57c001eb165a6b1fcc3 -- Chris Samuel : http://www.csamuel.

Re: [slurm-users] sacct end time for failed jobs

2019-02-27 Thread Chris Samuel
FAILED 2019-02-27T22:35:23 2019-02-27T22:36:38 00:01:15 COMPLETED The "COMPLETED" part is the extern step we have as we use pam_slurm_adopt. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Large job starvation on cloud cluster

2019-02-27 Thread Chris Samuel
servation is being created for the larger job, what do these say? sprio -l squeue --start scontrol show job ${LARGE_JOBID} All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Fwd: a heterogeneous job terminate unexpectedly

2019-02-27 Thread Chris Samuel
f you use Open-MPI instead of Intel MPI? I'm not sure whether Intel MPI can cope with heterogeneous jobs or not (it doesn't seem to be documented anywhere what will, or will not, work with it). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] pam_slurm_adopt with pbis-open pam modules

2019-02-21 Thread Chris Samuel
am getting out of context somehow and have access to all > resources. Yes, check the documentation and review your PAM configuration. As I mentioned it sounds like you've got things in the wrong order there. https://slurm.schedmd.com/pam_slurm_adopt.html#PAM_CONFIG All the best, Chris -- Chr

Re: [slurm-users] Only one socket for SLURM

2019-02-21 Thread Chris Samuel
, to give a sane interface and default logical layout. Slurm uses a similar system that results in something that looks very similar, so to Slurm CPU 0 is socket 1, core 1, thread 1 and CPU 2 is socket 1, core 1, thread 2, etc... All the best, Chris -- Chris Samuel : http://www.csamuel.org/ :

Re: [slurm-users] Strange error, submission denied

2019-02-19 Thread Chris Samuel
e spec is: CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 Hope this helps! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] pam_slurm_adopt with pbis-open pam modules

2019-02-18 Thread Chris Samuel
d can interfere with things. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] allocate last MPI-rank to an exclusive node?

2019-02-18 Thread Chris Samuel
might need a recent version of Open-MPI for instance. https://slurm.schedmd.com/heterogeneous_jobs.html All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Reserve CPUs/MEM for GPUs

2019-02-15 Thread Chris Samuel
U jobs were not. The submit filter did all the policing of that. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] How to get the CPU usage of history jobs at each compute node?

2019-02-15 Thread Chris Samuel
ptures that information in the granularity you want currently. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Chris Samuel
On Wednesday, 13 February 2019 4:48:05 AM PST Marcus Wagner wrote: > #SBATCH --ntasks-per-node=48 I wouldn't mind betting that if you set that to 24 it will work, and each thread will be assigned a single core with the 2 thread units on it. All the best, Chris -- Chris Samuel : h

Re: [slurm-users] Recording variables

2019-02-10 Thread Chris Samuel
ke that a requirement at submission time. It's also exposed inside the job as ${SLURM_JOB_ACCOUNT}. https://slurm.schedmd.com/sacctmgr.html https://slurm.schedmd.com/accounting.html All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Mysterious job terminations on Slurm 17.11.10

2019-02-01 Thread Chris Samuel
ook like? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] SLURM_JOB_GPU not set in salloc

2019-01-18 Thread Chris Samuel
the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Slurms nodes over VPN?

2019-01-17 Thread Chris Samuel
they inform slurmctld of their hostname and IP address. https://slurm.schedmd.com/elastic_computing.html Best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Why is this command not working

2019-01-16 Thread Chris Samuel
: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null) You need to configure munge on this node and tell slurmdbd to use it via the AuthInfo directive in your configuration file. https://slurm.schedmd.com/slurmdbd.conf.html Best of luck, Chris -- Chris Samuel : http://www.csamuel.org

Re: [slurm-users] Larger jobs tend to get starved out on our cluster

2019-01-16 Thread Chris Samuel
is not initialized So this looks like it's trying to use PMI1. What do the following say? srun --mpi=list scontrol show config | fgrep -i mpidefault All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Two jobs ends up on one GPU?

2019-01-16 Thread Chris Samuel
chance either? That could allow users to escape their cgroup settings as it can set up its own. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] QoS settings in sacctmgr requires restarting slurmctld to take effect

2019-01-10 Thread Chris Samuel
s firewall related and sometimes this is because slurmctld tells slurmdbd about an IP address that isn't reachable rather than one that is. Hope this helps! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] gres with docker problem

2019-01-07 Thread Chris Samuel
on exclusive nodes. That's correct - because parts of Docker (currently) run as root they can modify cgroups at will and apparently do. This is why things like Shifter, CharlieCloud and Singularity exist to let this happen on HPC systems more safely. All the best, Chris -- Chris Samuel

Re: [slurm-users] Fwd: Using srun ends ssh sessions

2019-01-07 Thread Chris Samuel
to be terminated when the job does end. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] salloc with bash scripts problem

2019-01-06 Thread Chris Samuel
arguments from the script for it to know what resources you are asking for. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] gres with docker problem

2019-01-06 Thread Chris Samuel
-smi isn't working in Docker because of a lack of device files, the problem is that it's seeing all 4 GPUs and thus is no longer being controlled by the device cgroup that Slurm is creating. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Fwd: Using srun ends ssh sessions

2019-01-06 Thread Chris Samuel
the session to the cgroup for the job and then will clean up that session when the job ends. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Chris Samuel
I've used before so I cannot vouch for how it works. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] gres with docker problem

2019-01-01 Thread Chris Samuel
, CharlieCloud and Singularity are used instead. I believe Docker are working on a "rootless" mode that might get around this, no idea where that's at though. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] gres with docker problem

2019-01-01 Thread Chris Samuel
On 1/1/19 5:25 pm, 허웅 wrote: what's the problem? Are you using cgroups to constrain access to GPUs? -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] My quota with slurm does not change

2018-12-31 Thread Chris Samuel
isn't part of Slurm, so you'll need to contact the people who've created it to see how it works and why it's not doing what you expect. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] salloc with bash scripts problem

2018-12-30 Thread Chris Samuel
you a shell on the same node as you ran it on, with a job allocation that you can access by srun. You can read more about interactive shells here: https://slurm.schedmd.com/faq.html#prompt All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Help With Slurm 18.08 Installation on Ubuntu Server 18.04

2018-12-16 Thread Chris Samuel
by the various daemons when they are started up. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Regression with srun and task/affinity

2018-12-16 Thread Chris Samuel
of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Slurm mysql 8.0

2018-12-14 Thread Chris Samuel
On Sat, December 15, 2018 5:59 am, Christopher Benjamin Coffey wrote: > Hi Guys, Hi Chris, > It appears that slurm currently doesn't support mysql 8.0. After upgrading > from 5.7 to 8.0 slurm commands that hit the db result in: > > sacct: error: slurmdbd: "Unknown error 1064" That's correct,

Re: [slurm-users] PrologFlags=Contain significantly changing job activity on compute nodes

2018-12-13 Thread Chris Samuel
related. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] How to implement job arrays with distribution=cyclic

2018-12-13 Thread Chris Samuel
then that should be possible using an overlapping partition restricted to them with LLN enabled. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-24 Thread Chris Samuel
ll finish. So ours is (amongst others): bf_window=23040,bf_resolution=600 All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
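
To put those two values in more familiar units, here is a small Python sketch. It assumes, as the slurm.conf documentation describes, that bf_window is given in minutes and bf_resolution in seconds; the parsing helper is hypothetical and only handles the simple comma-separated form shown above.

```python
def parse_scheduler_parameters(value: str) -> dict:
    """Split a SchedulerParameters string into a dict of option -> value."""
    params = {}
    for item in value.split(","):
        key, sep, val = item.strip().partition("=")
        if key:
            # Flag-style options carry no value, so store None for them.
            params[key] = val if sep else None
    return params


params = parse_scheduler_parameters("bf_window=23040,bf_resolution=600")

window_days = int(params["bf_window"]) / (60 * 24)       # minutes -> days
resolution_minutes = int(params["bf_resolution"]) / 60   # seconds -> minutes
print(window_days, resolution_minutes)
```

Under those assumptions the quoted settings amount to a 16 day backfill window tracked at 10 minute resolution.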

Re: [slurm-users] About x11 support

2018-11-23 Thread Chris Samuel
it so unfortunately there's no reasoning for this given. commit e3140b7f8d96ced9dc85089caa65dd7c6be396fd Author: Tim Wickberg Date: Wed Sep 20 12:09:34 2017 -0600 Add new x11_util.c file to src/common. Utility functions for new x11 forwarding implementation. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] About x11 support

2018-11-23 Thread Chris Samuel
All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] new user; ExitCode reporting

2018-11-23 Thread Chris Samuel
- 1795583wrap FAILED 141:0 Hope that helps! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
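
One common way to read an exit code such as 141 is the shell convention of 128 plus a signal number. The Python sketch below applies that convention; it is an assumption about how to interpret the value, since the visible part of the message does not say so, and the function name is made up for illustration.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the shell's 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {status}"


print(describe_exit_status(141))
```

On Linux, 141 minus 128 is 13, which is SIGPIPE, the usual result of writing to a pipe whose reader has already exited.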

Re: [slurm-users] $TMPDIR does not honor "TmpFS"

2018-11-22 Thread Chris Samuel
would you mind providing access to your prolog and epilog scripts? Attached! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC #!/bin/bash if [ "${SLURM_RESTART_COUNT}" == "" ]; then SLURM_RESTART_COUNT=0 fi JOBSCRATC

Re: [slurm-users] About x11 support

2018-11-22 Thread Chris Samuel
nodes of the cluster. I think it's good to hear from sites where this is the case because we can easily get stuck in our own little bubbles until something comes and trips us up like that. All the best! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 11:42:49 PM AEDT Baker D. J. wrote: > We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm > appears to be using backfill scheduling excessively. What are your SchedulerParameters ? All the best, Chris -- Chris Samuel :

Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
th CentOS 7.5. Haven't gone to 7.6 yet. One thing I just realised I'd not mentioned is that for this to work the user needs to be able to SSH from the compute node back into the login node without being prompted for any reason. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ :

Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
1 support with SSH host based authentication including from compute nodes back into the login node (that's important)! Also you need to have configured your /etc/ssh/ssh_known_hosts files so the ssh client doesn't prompt to confirm host keys. All the best, Chris -- Chris Samuel : http://www.cs

Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
t libssh2-devel" > Warning: untrusted X11 forwarding setup failed: xauth key data not generated That also looks like an error you should look into fixing first. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm missing non primary group memberships

2018-11-20 Thread Chris Samuel
*). It's worked well for us at Swinburne (17.11.x and now 18.08.x) running with sssd and enumeration disabled. Not a vast number of users though! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] About x11 support

2018-11-16 Thread Chris Samuel
that mean everything is ok? > I wonder why the second command fails? Check your slurmd logs on the compute node. What errors are there? > >Another thing is we had to set: > > * X11Parameters=local_xauthority > > Where? sshd config file? No, that's in slurm.conf. All the

Re: [slurm-users] About x11 support

2018-11-15 Thread Chris Samuel
RSA keys. Extra info here: https://slurm.schedmd.com/faq.html#x11 You can (apparently) still use the external plugin if you build Slurm without its internal X11 support. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
