Looks like the job ran. You should look at the output logs.
My guess:
The node the job ran on does not have access to that path.
Log on to that node and check it out.
Brian Andrus
On 3/3/2021 1:21 AM, Adrian Sevcenco wrote:
Hi! I just encountered the situation that I cannot submit jobs from
runaway:
sacctmgr show RunawayJobs
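For reference, sacctmgr lists any runaway/orphaned jobs and offers to fix them by setting an end time; a quick sketch:
sacctmgr show runawayjobs
# answer "y" at the prompt to repair the listed records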
*From:* slurm-users on behalf
of Brian Andrus
*Sent:* Monday, March 1, 2021 11:14 AM
*To:* slurm-users@lists.schedmd.com
*Subject:* [slurm-users] fix missing accounting entries
All,
IIRC, there was a command that would repair the accounting tables when a
job had no endtime.
I can't seem to find the info for that. Does anyone recall such a thing?
Brian Andrus
Your prolog script is run by/as the same user as slurmd, so any
environment variables you set there will not be available to the job
being run.
See: https://slurm.schedmd.com/prolog_epilog.html for info.
Brian Andrus
On 2/12/2021 1:27 PM, mercan wrote:
Hi;
Prolog and TaskProlog
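For the record, a TaskProlog can hand variables to the job by printing export lines to stdout, which a regular Prolog cannot do. A minimal sketch (the variable and path are just examples):
#!/bin/bash
# TaskProlog: lines printed as "export NAME=value" are added to the task's environment
echo "export MY_SCRATCH=/scratch/$SLURM_JOB_ID"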
try:
export SLURM_OVERLAP=1
export SLURM_WHOLE=1
before your salloc and see if that helps. I have seen some mpi issues
that were resolved with that.
You can also try it using just the regular mpirun on the nodes
allocated. That will help with a datapoint as well.
Brian Andrus
On 2/4/2021
Did you compile slurm with mpi support?
Your mpi libraries should be the same as that version and they should be
available in the same locations for all nodes.
Also, ensure they are accessible (PATH, LD_LIBRARY_PATH, etc are set)
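A couple of quick checks, assuming a standard build:
srun --mpi=list      # lists the MPI plugin types slurm was built with
ldd $(which mpirun)  # run on each node; paths and versions should match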
Brian Andrus
On 2/4/2021 1:20 PM, Andrej Prsa wrote:
Gentle
Just because they can do a thing doesn't
mean they should do a thing.
There are many ways to achieve what is desired, most of which do not
require anyone other than the system admin.
If your issue can be solved without affecting others, leave them alone
and fix your issue.
Brian Andrus
If you don't specify it, slurm assumes all memory on the node for the job. So, even if you
are only using 1 cpu, all the memory is allocated, leaving none for any
other job to run on the unallocated cpus.
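A minimal slurm.conf sketch that makes memory a consumable resource (values illustrative):
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
DefMemPerCPU=2048   # MB charged per cpu when a job does not request memory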
Brian Andrus
On 1/28/2021 2:15 PM, Chandler wrote:
Brian Andrus wrote on 1/28/21 13:59:
What
You are getting close :)
You can see why n010 is able to have multiple jobs. It shows more
resources available.
What are the specific requests for resources from a job?
Nodes, Cores, Memory, threads, etc?
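For example, to see what a job actually requested (job id hypothetical):
scontrol show job 12345 | grep -E 'NumNodes|NumCPUs|TRES|MinMemory'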
Brian Andrus
On 1/28/2021 12:52 PM, Chandler wrote:
OK I'm getting this same output
Ahh.
On one of the new nodes do:
slurmd -C
The output of that will tell you what those settings should be. I
suspect they are off, which forces them into drain mode.
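The output looks something like this (values illustrative); compare it to the node's definition in slurm.conf:
NodeName=n010 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128000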
Brian Andrus
On 1/28/2021 12:25 PM, Chandler wrote:
Andy Riebs wrote on 1/28/21 07:53:
If the only changes to your system
Heh. Your nodes are drained.
do:
scontrol update state=resume nodename=n[011-013]
If they go back into a drained state, you need to look into why. That
will be in the slurmctld log. You can also see it with 'sinfo -R'
Brian Andrus
On 1/27/2021 10:18 PM, Chandler wrote:
Made a little bit
I have been able to deploy completely to cloud using only
slurm. It has the ability to integrate into any cloud cli, so nothing
else has been needed. Just for the heck of it, I am thinking of
integrating it into Terraform, although not necessary.
Brian Andrus
On 1/26/2021 11:48 AM, Robert Kudyba
The net effect is that the environment gets set up the same as if the
user had opened a shell console.
Brian Andrus
On 1/26/2021 2:13 AM, Gestió Servidors wrote:
Hi,
My environment is this:
* Users are using “bash” as the default shell
* A sample of one of my environment modules
customers for Tim to keep things running as well
as he has. I'm pretty sure most folks that use slurm for any period of
time have received more value than a small support contract would cost.
Brian Andrus
On 1/25/2021 7:35 AM, Jeffrey T Frey wrote:
...I would say having SLURM rpms in EPEL could be very
You would need to have a direct connect/vpn so the cloud nodes can
connect to your head node.
Brian Andrus
On 1/22/2021 10:37 AM, Sajesh Singh wrote:
We are looking at rolling out cloud bursting to our on-prem Slurm
cluster and I am wondering how to deal with the slurm.conf variable
We would need more information.
At a minimum, what client is it? As this is not a slurm issue, you would
need to dig into what is causing that behavior with your storage system.
Brian Andrus
On 1/20/2021 10:53 AM, John McCulloch wrote:
Our shared storage client daemon is utilizing 100
mean their child can :)
Brian Andrus
On 1/15/2021 6:38 AM, Durai Arasan wrote:
Hi,
As you know for each partition you can specify
AllowAccounts=account1,account2...
I have a parent account say "parent1" with two child accounts "child1"
and "child2"
I expected that
over a direct-connect or VPN.
Brian Andrus
On 12/15/2020 12:02 PM, Sajesh Singh wrote:
We are currently investigating the use of the cloud scheduling
features within an on-site Slurm installation and was wondering if
anyone had any experiences that they wish to share of trying to use
Check your hosts file and ensure 'localhost' does not have an IPV6
address associated with it.
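A quick check (the ::1 entry is the usual culprit):
grep localhost /etc/hosts
# remove 'localhost' from any line like: ::1  localhost localhost.localdomain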
Brian Andrus
On 12/14/2020 4:19 PM, Alpha Experiment wrote:
Hi,
I am trying to run slurm on Fedora 33. Upon boot the slurmd daemon is
running correctly; however the slurmctld daemon always errors
That package looks to be built for a system with an nvidia gpu installed.
Look for (or build) different packages if you are not going to use a
gpu-based node.
Brian Andrus
On 12/4/2020 11:32 AM, Mullen, Drew wrote:
Howdy
I'm getting this error installing slurm 20.02.4:
Error: Package
in a completed state for a period of time,
but they are not showing up at all on our cluster.
How does one have jobs show up that are completed?
Brian Andrus
to more
fetches, wasting effort.
This is a VERY simplistic description, but the point is that
hyperthreading is not a silver bullet that will improve HPC performance
if you are maximizing your resource utilization.
Ok, I will get off my soapbox :)
Brian Andrus
On 11/4/2020 7:30 AM, Jean
packages. Source control for me
is just that spec file.
Brian Andrus
On 10/20/2020 8:46 AM, Michael Jennings wrote:
On Tuesday, 20 October 2020, at 15:49:25 (+0800),
Kevin Buckley wrote:
On 2020/10/20 11:50, Christopher Samuel wrote:
I forgot I do have access to a SLES15 SP1 system, that has
do you have your gres.conf on the nodes also?
Brian Andrus
On 10/8/2020 11:57 AM, Sajesh Singh wrote:
Slurm 18.08
CentOS 7.7.1908
I have 2 M500 GPUs in a compute node which is defined in the
slurm.conf and gres.conf of the cluster, but if I launch a job
requesting GPUs the environment
I have seen places where that can take 24 hours.
Brian Andrus
On 9/29/2020 6:18 AM, Diego Zuccato wrote:
Hello all.
One of the users is unable to submit jobs to our cluster.
The first time he tries, he gets
$ sbatch test.job
sbatch: fatal: Invalid user id: 621049927
then:
$ sbatch test.job
sbatch: er
on the node waiting to be
resumed, but the node resources may get assigned to other jobs while
they wait to resume.
Brian Andrus
On 9/22/2020 2:33 PM, Ransom, Geoffrey M. wrote:
Hello
We had a user post a large number of array jobs with a short actual
run time (20-80 seconds, but mostly
You can do both. I do high debug to the journal and info to the log file.
Brian Andrus
On 9/8/2020 2:41 AM, Gestió Servidors wrote:
Hello,
I don’t know why, but my SLURM server (that is running fine) has its
slurmctld.log file with size 0 bytes... so... where is it writing logs?
It seems that log file has
That is where you have it call a bash script and within the script you
do as needed.
Like Ahmet's suggested script.
So use his as a template and add the headers you desire.
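A minimal sketch of such a wrapper (the path and extra header are just examples); slurm invokes MailProg like the mail command, with '-s subject recipient' and the body on stdin:
#!/bin/bash
# set MailProg=/usr/local/bin/slurmmail.sh in slurm.conf
subject="$2"; to="$3"
{
  echo "To: $to"
  echo "Subject: $subject"
  echo "X-Cluster: mycluster"   # whatever headers you desire
  echo ""
  cat                           # body from slurm, if any
} | sendmail -t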
Brian Andrus
On 8/28/2020 11:36 AM, Chris Samuel wrote:
On 8/27/20 3:42 pm, Brian Andrus wrote:
Actually, you can add
to
schedule them in that fashion outweighs the resources needed by far.
Brian Andrus
On 8/28/2020 3:30 AM, navin srivastava wrote:
Hi Team,
facing one issue: several users are submitting 2 jobs in a single batch
job, which are very short jobs (say 1-2 sec). So while submitting more
jobs slurmctld
Actually, you can add headers of all kinds:
Quick search of "sendmail add headers" discovers:
https://serverfault.com/questions/347602/sending-e-mail-from-sendmail-with-headers
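The short version (addresses illustrative):
printf 'To: user@example.com\nSubject: test\nX-My-Header: foo\n\nbody\n' | sendmail -t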
Brian Andrus
On 8/26/2020 10:02 PM, Andrew Elwell wrote:
Hi folks,
I'm getting fed up receiving out
IIRC, that is because it is trying to do the 'configless' feature of
slurm 20 where it uses DNS entries to find the config.
This will happen if /etc/slurm.conf does not exist on the node.
Check that you have that and that it is the same as the one on the master.
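A quick way to compare (path may be /etc/slurm/slurm.conf depending on the build):
md5sum /etc/slurm/slurm.conf   # run on the node and the master; sums should match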
Brian Andrus
On 8/24/2020 7
they will wait a relatively shorter amount of time. There are
numerous other factors you can use. If you have accounting and
associations configured, you can manipulate it all the way to the
association and qos.
Brian Andrus
On 8/17/2020 11:23 PM, Gerhard Strangar wrote:
Brian Andrus wrote:
Most likely, b
, the devil is in the details on how to
define/get what you want.
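For the shared low/high priority partition case below, one common sketch (node list and values illustrative; note MaxNodes caps nodes per job, while an aggregate cap would need something like a QOS GrpTRES limit):
PartitionName=low  Nodes=node[01-16] PriorityTier=1  Default=YES
PartitionName=high Nodes=node[01-16] PriorityTier=10 MaxNodes=4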
Brian Andrus
On 8/17/2020 10:13 AM, Gerhard Strangar wrote:
Hello,
I'm wondering if it's possible to have slurm 19 run two partitions (low
and high prio) that share all the nodes and limit the high prio
partition in number of nodes
This is very likely by design of the cluster and/or network. Otherwise
users could use the cluster to mine bitcoin and such.
Brian Andrus
On 8/2/2020 7:11 AM, Mahmood Naderan wrote:
I thought that maybe srun doesn't transfer all settings from the head
node to the compute node.
The wget
lua, if I may ask?
Brian Andrus
On 7/27/2020 9:52 AM, Baer, Troy wrote:
There's an outstanding feature request for that:
https://bugs.schedmd.com/show_bug.cgi?id=8383
While waiting on that, we've taken to injecting it into the job's environment
ourselves in the Lua submit filter.
--Troy
calls as too many of them can tip a system over.
Brian Andrus
Is there a reason to run them as a single job?
It may be easier to just have 2 separate jobs of 16 cores each.
If there are dependency requirements, that is addressed by adding any
dependencies to the job submission.
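For example (script names hypothetical):
jid=$(sbatch --parsable job1.sh)
sbatch --dependency=afterok:$jid job2.sh   # runs only if job1 completed successfully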
Brian Andrus
On 7/25/2020 2:50 AM, Даниил Вахрамеев wrote:
Hi everyone
root could be quite useful. Especially for service accounts.
Yes, there can be a workaround using sudo, but it seems better if we
could track things in slurm to know a job was run 'on behalf of' another
user.
Thoughts, suggestions, current approaches?
Thanks,
Brian Andrus
slurm daemons
going down.
Brian Andrus
On 7/21/2020 7:44 AM, Peter Mayes wrote:
Hi,
My first post to the list, so apologies if this is a FAQ,
My configuration has two nodes allocated for Slurm masters, with a
highly-available NFS server mounting a filesystem across the two nodes.
I need advice
Ah,
They are assuming you are running the web interface as root.
If your environment is secure enough, you can do that. Or, grant your
web server user privileges in slurm to be allowed to use the "--uid" option.
Brian Andrus
On 7/20/2020 8:39 AM, Sidhu, Khushwant wrote:
H
You are trying to use sbatch with the "--uid" option which is only
allowed by root.
Either run sbatch as the user doing the request (which should be the
same user that is running rstudio) or use 'sudo -u ' to run sbatch.
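For example (username hypothetical):
sudo -u alice sbatch job.sh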
Brian Andrus
On 7/20/2020 7:50 AM, Sidhu, Khushwant wrote:
, the partition is
used to determine which node(s) and filter/order jobs. You should add
the node to the new partition, but also leave it in the 'test'
partition. If you are looking to remove the 'test' partition, set it to
down and once all the running jobs that are in it finish, then remove it.
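For example:
scontrol update PartitionName=test State=DOWN
# once running jobs finish, remove the partition from slurm.conf and 'scontrol reconfigure'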
Brian
you set that in the slurm.conf to continue the numbering from where
you left off so there are no entries in accounting that get replaced.
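That parameter is FirstJobId; a sketch (value illustrative, anything above the highest old job id):
FirstJobId=500000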
Brian Andrus
On 7/8/2020 3:15 AM, Simon Kainz wrote:
Hello,
we have a long-running slurm cluster, accounting into slurmdbd/mysql
backend on the cluster
Host-based authentication
<https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Host-based_Authentication> because
normal users have no business on those servers!
Brian Andrus
On 6/17/2020 1:26 AM, Ole Holm Nielsen wrote:
On 6/9/20 5:45 PM, Michael Jennings wrote:
On Tuesday, 09 June 2020, at 12:43:34
them outside the cluster.
Brian Andrus
On 6/19/2020 5:04 AM, David Baker wrote:
Hello,
We are currently helping a research group to set up their own Slurm
cluster. They have asked a very interesting question about Slurm and
file systems. That is, they are posing the question -- do you need
Sounds like a race condition where slurmd is starting before the node is
truly ready.
You can try adding dependencies for slurmd so it will not start until
some other needed service is running.
The benefits of systemd :)
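A sketch of such an override (target names illustrative; use whatever the node actually waits on):
systemctl edit slurmd
# then add:
[Unit]
After=network-online.target remote-fs.target
Wants=network-online.target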
Brian Andrus
On 6/9/2020 10:53 AM, Dumont, Joey wrote:
Hi,
I
/configless_slurm.html
Brian Andrus
are
running.
slurmd should be running as root. It needs to be able to do a few things
including run the job as the user that submitted it. Things that only
root should be doing.
Brian Andrus
On 6/2/2020 2:00 PM, Ferran Planas Padros wrote:
Hi Ole,
I run the same version of slurm in all
Heh. That is the on-going "user education"
You could change the amount of ram requested using a job_submit lua
script, but that could bite those that are accurate with their requests.
Or set a max ram for the partition.
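For example (partition name and value illustrative):
PartitionName=batch Nodes=node[01-16] MaxMemPerNode=64000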
Brian Andrus
On 5/27/2020 3:46 PM, Marcelo Z. Silva wrote:
Maybe too obvious, but have you checked your .bashrc, .bash_profile and
such?
Brian Andrus
On 5/12/2020 10:27 AM, Ellestad, Erik wrote:
Which SLURM prolog specifically?
I’m not finding that to work for me in either task-prolog or prolog.
SLURM_TMPDIR and TMPDIR are still both set to /tmp
For CentOS/RHEL, it is in the OpenFusion repo:
http://repo.openfusion.net/centos7-x86_64/
just
yum install
http://repo.openfusion.net/centos7-x86_64/openfusion-release-0.7-1.of.el7.noarch.rpm
then
yum install libjwt-devel
Brian Andrus
On 4/18/2020 2:27 PM, Daniel Letai wrote
the
next uid on any node.
The error below looks like you may have a different uid for the slurm
user on the node. What uid is slurmd running as on the bad node vs a
good node?
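Quick checks to run on a good node and the bad one (uids must match everywhere):
id slurm
id munge
ps -o user= -p $(pgrep -x slurmd)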
Brian Andrus
On 4/17/2020 2:38 PM, Dean Schulze wrote:
Just noticed this. On the problem node the munged.log file
. It could probably be worked around, but
not in a simple way. Easier to upgrade to the newest release :)
Brian Andrus
normal users cannot use "--reboot"
Brian Andrus
On 3/9/2020 10:14 AM, MrBr @ GMail wrote:
Hi Brian
The nodes work with slurm without any issues till I try the "--reboot"
option.
I can successfully allocate the nodes or any other slurm related operation
> You may want to dou
' from the node and verify it is
able to talk to slurmctld from the node and verify slurmd started
successfully.
Brian Andrus
On 3/9/2020 4:38 AM, MrBr @ GMail wrote:
Hi all
I'm trying to use the --reboot option of srun to reboot the nodes
before allocation.
However the nodes have not been
on that are.
Brian Andrus
I would say so.
Certainly, if you have many nodes and/or many jobs being submitted, you
will see an impact, but in my experience comparing Slurm to SGE, Slurm
has much less overhead and causes much less impact.
Brian Andrus
On 2/26/2020 1:05 PM, Joshua Baker-LePain wrote:
On Wed, 26 Feb 2020
easy to do. Just add the lines to your slurm.conf for the
backup controller, start it up and reconfigure for all running nodes to
be aware of it.
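A sketch of the relevant lines (hostnames hypothetical; the first SlurmctldHost listed is the primary):
SlurmctldHost=master1
SlurmctldHost=master2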
Brian Andrus
On 2/26/2020 12:48 PM, Joshua Baker-LePain wrote:
We're planning the migration of our moderately sized cluster (~400
nodes, 40K jobs
Bright is not needed... for much of anything...
On 2/25/2020 12:48 PM, Robert Kudyba wrote:
I suppose I can ask Bright Computing but does anyone know what version
of Bright is needed? I would guess 8.2 or 9.0. Definitely want to dive
into this.
Usually means you updated the slurm.conf but have not done "scontrol
reconfigure" yet.
Brian Andrus
On 2/10/2020 8:55 AM, Robert Kudyba wrote:
We are using Bright Cluster 8.1 and just upgraded to slurm-17.11.12.
We're getting the below errors when I restart the slurmct
You're trying to run bash which, without special configuration, needs a pty
Try
srun -v -p debug --pty bash
Brian Andrus
On 2/6/2020 10:28 PM, Hector Yuen wrote:
Hello,
I am setting up a very simple configuration: one node running slurmd
and another one running slurmctld.
In the slurmctld
Check the slurmd log file on the node.
Ensure slurmd is still running. Sounds possible that OOM Killer or such
may be killing slurmd
Brian Andrus
On 1/20/2020 1:12 PM, Dean Schulze wrote:
If I restart slurmd the asterisk goes away. Then I can run the job
once and the asterisk is back
ster generically, so
their configs are not getting matched to the specific info in your main
config
Brian Andrus
On 1/20/2020 10:37 AM, Robert Kudyba wrote:
I've posted about this previously here
<https://groups.google.com/forum/#!searchin/slurm-users/kudyba%7Csort:date/slurm-users/mMECjerUmFE/V
I think we would need to see your SuspendScript to get a better idea of
what is happening.
That error indicates the nodes are likely not running slurmd and the
control daemon thinks they are still up.
What is the output of 'sinfo -R'?
Brian Andrus
On 1/7/2020 3:42 AM, Steve Brasier wrote
depends on what best suits the specific needs.
Brian Andrus
On 12/16/2019 2:29 PM, Ransom, Geoffrey M. wrote:
Hello
I am looking into switching from Univa (sge) to slurm and am
figuring out how to implement some of our usage policy in slurm.
We have a Univa queue which uses job classes
You prompted me to dig even deeper into my epilog. I was trying to
access a semaphore file in the user's home directory.
It seems that when the epilogue is run, the ~ is not expanded in any way.
So I can't even use ~${SLURM_JOB_USER} to access their semaphore file.
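One workaround sketch: resolve the home directory from the passwd database instead of relying on ~ expansion (filename hypothetical):
USERHOME=$(getent passwd "$SLURM_JOB_USER" | cut -d: -f6)
rm -f "$USERHOME/semaphore.file"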
Potentially problematic for
a cleanup script run on jobs that
have timed out?
Brian Andrus
So it seems nss_slurm does not play well with sudo.
If I connect to a box that uses it and try to use sudo, I get:
sudo: PAM account management error: Authentication service cannot retrieve
authentication info
Has anyone else seen this?
Is there a workaround?
Brian Andrus
crickets. I think in our case we were not able to ensure that the
epilog always ran for different types of job failures, so we just had
the users add some more cleanup code to the end of their jobs _and_
also run separate cleanup jobs.
Regards,
Alex
On Wed, Dec 4, 2019 at 7:29 PM Brian Andrus
s have had the same issue and even added comments in the
bugs, but no responses/resolution for this have been posted.
FWIW, I also see the issue with the latest slurm 20.05 pre1 code.
Brian Andrus
On 12/5/2019 11:46 PM, von St. Vieth, Benedikt wrote:
Hi again,
I answered this question on Oct 2
Tim claims it works...
I have compiled it, but when you try to run slurmd, it throws some
errors and will not start. From a previous thread:
While I can successfully build/run slurmctld, slurmd is failing because ALL
of the SelectType libraries are missing symbols.
Example from
Quick question:
Is the epilogue script run if a job exceeds its time limit and is being
cancelled?
What about just cancelled?
I need to be able to clean up some job-specific files regardless of how
the job ends and I'm not sure epilogue is sufficient.
Brian Andrus
server you use.
The best solution, of course, is to educate the users.
You could create a job_submit plugin that removes mail options for
arrays, but you may negatively impact users that do need that.
Brian Andrus
On 11/25/2019 10:55 PM, ichebo...@univ.haifa.ac.il wrote:
I meant on the admin
FAIL apply to a job array
as a whole rather than generating individual email messages for each
task in the job array.
Brian Andrus
On 11/25/2019 1:48 AM, ichebo...@univ.haifa.ac.il wrote:
Hi,
I would like to ask if there is some options to configure the e-mail
notification of slurm job
/openmpi), which forces only one
version to be able to be loaded. I also set paths so specific versions
of libraries become available depending on what environment you select
(gcc vs intel for example).
Is there something besides versioning that lmod shines at?
Brian Andrus
On 11/24/2019 12:48 AM
, I get back 41 groups I am in.
Bug?
Brian Andrus
Not actually
sharing homes could be the cause.
Brian Andrus
On 11/17/2019 11:24 AM, Yann Bouteiller wrote:
Hello,
I am trying to do this on computecanada, which is managed by slurm:
https://ray.readthedocs.io/en/latest/deploying-on-slurm.html
However, on computecanada, you cannot inst
You are trying to specifically run on node cn110, so you may want to
check that out with sinfo
A quick "sinfo -R" can list any down machines and the reasons.
Brian Andrus
On 11/10/2019 11:23 PM, Sukman wrote:
Hi Brian,
I see. Thank you for your suggestion.
I definitely will try i
Are you specifying memory for each of the jobs?
Can't run a small job if there isn't enough memory available for it.
Brian Andrus
On 11/1/2019 7:42 AM, c b wrote:
I have:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn <mailt
Except sstat can give you the MaxRSS without having cgroups and it will
give you a simple MaxRSS, whereas sacct provides a MaxRSS for every
step... have to play with that data to get the high water mark grrr.
I had tried to use sstat in an epilogue but apparently that is too late...
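For example, one way to pull the high-water mark out of the per-step values (job id hypothetical):
sacct -j 12345 -o JobID,MaxRSS --units=M --noheader | sort -k2 -h | tail -1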
Brian
packages except pmix-devel. Haven't figured that one out yet.
Brian Andrus
On 10/30/2019 11:18 AM, Christopher Benjamin Coffey wrote:
Yes, I'd be interested too.
Best,
Chris
I prefer building packages.
I did have to extract and change the .spec file to accommodate some of
the changes as well as set up the environment to complete the build.
Brian
On 10/29/2019 8:11 AM, Christopher Benjamin Coffey wrote:
Brian, I've actually just started attempting to build slurm 19 on
/libslurmfull.so | grep powercap
0010f7b8 T slurm_free_powercap_info_msg
00060060 T slurm_print_powercap_info_msg
So, sure enough powercap_get_cluster_current_cap is not in there.
Methinks the linking needs examined.
Brian Andrus
On 10/28/2019 2:32 AM, Benjamin Redling
-1.el8.x86_64.rpm
slurm-slurmdbd-19.05.3-1.el8.x86_64.rpm
slurm-torque-19.05.3-1.el8.x86_64.rpm
Brian Andrus
On 10/28/2019 2:32 AM, Benjamin Redling wrote:
On 28/10/2019 08.26, Bjørn-Helge Mevik wrote:
Taras Shapovalov writes:
Do I understand correctly that Slurm19 is not compatible
IIRC, the big difference is if you want to use cgroups on the nodes. You
must use the cgroup plugin.
Brian Andrus
On 10/24/2019 3:54 PM, Christopher Benjamin Coffey wrote:
Hi Juergen,
From what I see so far, there is nothing missing from the jobacct_gather/linux
plugin vs the cgroup
.
Brian Andrus
On 10/18/2019 1:03 PM, bbenede...@goodyear.com wrote:
Greetings!
I am trying to set up a partition that will only allow one job at a time to
run, regardless of who submits it.
So multiple jobs from multiple users can be in the queue. But I only want the
partition to run one
:34 2019-10-01T00:00:44 00:00:10
Brian Andrus
Oytun Peksel
oytun.pek...@semcon.com
+46739205917
*From:* slurm-users *On Behalf Of* Brian Andrus
*Sent:* den 15 oktober 2019 20:58
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-users] Execute scripts on su
handling
until they have it as part of their app.
Brian Andrus
On 10/14/2019 4:40 AM, Oytun Peksel wrote:
It is quite weird if slurm has no mechanism as described. I have been
digging more into it and someone suggested a workaround using mail
notifications. You use a script instead of the mail
that are idle~ but no calls to the script.
If I restart slurmctld, the backlog starts running and things work.
Any ideas what could cause this?
Brian Andrus
Lyn,
That was it, thanks!
sacct -o reserved
Brian
On 9/21/2019 9:26 AM, Lyn Gerner wrote:
Hey Brian,
I think the discussion was in the context of suspend/resume,
and it was the Reserved value that effectively represents that time.
Regards,
Lyn
On Sat, Sep 21, 2019 at 9:15 AM Brian Andrus
There was a command shared at the SLUG that showed how long it took a
node to go from a power_down (idle~) state to up and having a job
running on it, but I cannot remember what it was.
Does anyone recall that?
Brian Andrus
=18446744073709551614,4=1,5=4 |
Brian Andrus
On Mon, Sep 16, 2019 at 2:58 PM Brian Andrus wrote:
> I have
> JobAcctGatherType = jobacct_gather/linux
>
> Brian
>
> On Mon, Sep 16, 2019 at 12:40 PM Antony Cleave
is used to collect accounting information. Supported values are
> *jobacct_gather/linux* (recommended), *jobacct_gather/cgroup* and
> *jobacct_gather/none* (no information collected).
>
> Antony
>
>
> On Mon, 16 Sep 2019, 14:07 Brian Andrus, wrote:
>
>> Yep, the ma
, Christopher Samuel wrote:
On 9/15/19 4:17 PM, Brian Andrus wrote:
Are steps required to capture Max RSS?
No, you should see a MaxRSS reported for the batch step, for instance:
$ sacct -j $JOBID -o jobid,jobname,maxrss
All the best,
Chris
The jobs have definitely completed when I try to gather the info.
Brian
On 9/15/2019 4:01 PM, Steven Dick wrote:
I don't think it shows up until the job completes.
On Sat, Sep 14, 2019 at 2:25 AM Brian Andrus wrote:
Quick question?
When I use sacct to show job stats, it always has a blank
Hmm. We are only using allocations and have slurm.conf configured with:
AccountingStorageEnforce=associations,nosteps
Are steps required to capture Max RSS?
Brian
On 9/15/2019 1:48 PM, Mark Hahn wrote:
When I use sacct to show job stats, it always has a blank entry for
the MaxRSS field. Is
Quick question?
When I use sacct to show job stats, it always has a blank entry for the
MaxRSS field. Is there something that needs enabled to get that in?
I do see it if I use sstat while the job is running.
Brian Andrus
.
However, there are definite use cases that make it worthwhile.
So long as you allocate enough resources for the node (be it the
controller or other) you will be fine.
Brian Andrus
On 9/12/2019 7:23 AM, Jose A wrote:
Dear all,
In the expansion of our Cluster we are considering to install SLURM