Fixed in 15.08.2.
commit 30a5d6778fc86f8799cefc4fbea4f9ae7eac8d92
Author: Hongjia Cao
Date: Wed Oct 7 15:05:24 2015 +0200
Thanks for your contribution.
On 10/07/2015 12:15 PM, Hongjia Cao wrote:
attached.
--
Thanks,
/David/Bigagli
da...@schedmd.com
Can you show us the stack using gdb?
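For the record, one way to capture it (a rough sketch, assuming the daemon of
interest is slurmctld and gdb is installed):
gdb -p $(pidof slurmctld)
(gdb) thread apply all bt
(gdb) detach
(gdb) quit
gstack <pid> prints the same per-thread backtraces in a single command.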
Thanks
/David/Bigagli
da...@schedmd.com
> On 08 Sep 2015, at 16:45, Mar
y to 4. The minimum index
value is 0; the maximum value is one less than the configuration
parameter MaxArraySize.
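As an illustration (the script name is hypothetical and MaxArraySize is
assumed to be at its usual default of 1001):
sbatch --array=0-4 job.sh      # indices 0 through 4, all below MaxArraySize
sbatch --array=0-2000 job.sh   # rejected unless MaxArraySize is raised in slurm.conf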
Thanks
/David/Bigagli
da...@schedmd.com
Those pmi2 files are the server side of the pmi2 protocol implemented in
slurmstepd; those are always installed. It is the client side that gets
installed from the contribs directory.
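For illustration, once the client library from contribs/pmi2 is built and
linked into the application, the two sides meet like this (the program name
is hypothetical):
srun --mpi=pmi2 -n 16 ./my_pmi2_app
The --mpi=pmi2 option makes slurmstepd load the server side; the application
talks to it through the contribs client library.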
Thanks
/David/Bigagli
da...@schedmd.com
> On 03 Sep 2015, at 00:02, Ulf Markwardt wrote:
>
> find an address, check slurm.conf
Is there a way to limit paging outside of Slurm? There are memory limits in
Slurm but no paging limit.
There is a backup controller in Slurm; you can read about it here:
http://slurm.schedmd.com/slurm.conf.html
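A rough sketch of the relevant slurm.conf entries from that era (hostnames
and path are hypothetical):
ControlMachine=ctl1
BackupController=ctl2
StateSaveLocation=/shared/slurm/state   # must be reachable by both controllers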
Thanks
/David/Bigagli
da...@schedmd.com
available when they start.
Thanks
/David/Bigagli
da...@schedmd.com
> On 30 Jul 2015, at 16:27, Yair Yarom wr
I think that Hydra should keep 0, 1, and 2 open and dup them to /dev/null so
that any child's file descriptors will be greater than 2. This is the
standard Unix way.
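A small shell sketch of the idea (illustrative only, not what Hydra actually
does):
exec 0</dev/null 1>/dev/null 2>&1   # keep fds 0-2 open but pointed at /dev/null
./child_program &                   # anything the child opens now gets fd 3 or higher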
Thanks
/David/Bigagli
da...@schedmd.com
Committed to master branch. Thanks for your contribution.
commit 2678533ade11852b155126197430ed66b8c09f26
Author: Hongjia Cao
Date: Mon Jul 20 04:47:00 2015 -0700
Fix comparison of env-var option value in srun/sbatch/salloc
David Bigagli
da...@schedmd.com
module is not a command in /bin. To make it work you have to source the module
startup file in your script, for example:
. /usr/local/Modules/3.2.10/init/bash
then you can use the module file.
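A minimal batch script doing this (the Modules path is the site-specific one
above; the module name is hypothetical):
#!/bin/bash
#SBATCH -n 1
. /usr/local/Modules/3.2.10/init/bash
module load myapp/1.0
srun myapp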
> On Jun 29, 2015, at 4:43 PM, Antonia Mey wrote:
>
> Dear all,
>
> I may have a very basic quest
has anything to do with any of the changes I made. Is this
a known problem with some kind of workaround, or should I file a bug on it?
Eric
--
Thanks,
/David/Bigagli
www.schedmd.com
uch means to me that job suspension (via scontrol suspend)
is allowed only to root and to SlurmUser
Is that intentional?
--
Thanks,
/David/Bigagli
www.schedmd.com
on running on all of your compute nodes, and
provided users can access the docker socket/port, they can submit jobs
that call "docker run", can't they?
Cheers,
--
Thanks,
/David/Bigagli
www.schedmd.com
I go and create a
new qos, all users can instantly utilize this qos. This is very strange,
I wonder if some setting has been munged in the database somewhere
mistakenly? Any ideas? Thanks.
Best,
Chris
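For what it is worth, a sketch of granting a QOS to specific users rather
than relying on whatever the default behaviour is (QOS and user names are
hypothetical):
sacctmgr add qos long
sacctmgr modify user alice set qos+=long
sacctmgr show assoc format=user,account,qos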
--
Thanks,
/David/Bigagli
www.schedmd.com
since you brought it up.
--
Thanks,
/David/Bigagli
www.schedmd.com
em but to search for the
library name looks like a reasonable start to me.
I would hope you can help me with this.
Thanks,
Ulf
--
Thanks,
/David/Bigagli
www.schedmd.com
should fix both of these issues.
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi
--
Thanks,
/David/Bigagli
www.schedmd.com
es, despite the file actually being there. I've
even set extended ACLs on the directory so that the SlurmUser can see
all of the files (sudo -u slurm ls -lR failed with permission denied).
Could anyone tell me why the slurm_script file cannot be read via prolog?
Thank you!
John DeSantis
on some form
of device I/O.
I know some people have reported strange interactions with Slurm
being on an NFSv4 mount (NFSv3 is fine).
Good luck!
Chris
--
Thanks,
/David/Bigagli
www.schedmd.com
The slurm.spec file decides whether to install the init.d scripts or the
systemd files.
On 03/24/2015 07:24 PM, Fred Liu wrote:
-Original Message-
From: David Bigagli [mailto:da...@schedmd.com]
Sent: Wednesday, March 25, 2015 1:19
To: slurm-dev
Subject: [slurm-dev] Re: successful systemd
all layouts are now unloaded.
Mar 24 22:22:46 cnlnx03 systemd[1]: Failed to start Slurm controller daemon.
-- Subject: Unit slurmctld.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
Thanks,
/David/Bigagli
www.schedmd.com
Yes it works with or without --unbuffered. I don't think data are
buffered inside of Slurm.
On 03/09/2015 10:15 AM, Lipari, Don wrote:
-Original Message-
From: David Bigagli [mailto:da...@schedmd.com]
Sent: Thursday, March 05, 2015 10:49 AM
To: slurm-dev
Subject: [slurm-de
deed
broken on Slurm 14.11. We just took our first cluster running 14.11 into
production this week, so probably not many users have run into this yet.
Regards,
Pär Lindfors, NSC
--
Thanks,
/David/Bigagli
www.schedmd.com
(after which it runs fine).
Problem number 2:
'scontrol show jobs' shows jobs in state RUNNING that don't actually
appear to exist. Some of these are days old. What might be going on here?
--
Jon Nelson
Dyn / Senior Software Engineer
p. +1 (603) 263-8029
--
Thanks,
/David/Bigagli
www.schedmd.com
where it couldn't resolve
user id's. So right after the job tried to launch it failed and
requeued. We just let the scheduler do what it will when it lists
Node_fail.
-Paul Edmon-
On 03/03/2015 01:20 PM, David Bigagli wrote:
How do you set your node down? If I run a job and
I'm just trying to figure out why it sent them into a held
state as opposed to just simply requeueing as normal. Thoughts?
-Paul Edmon-
On 03/03/2015 12:11 PM, David Bigagli wrote:
There are no default values for these parameters, you have to
configure your own. In your case do the prolog fa
by a
comma. These jobs are put in the *JOB_SPECIAL_EXIT* exit state.
Restarted jobs will have the environment variable
*SLURM_RESTART_COUNT* set to the number of times the job has been
restarted.
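A short sketch of the related commands (the job ID is hypothetical):
scontrol requeuehold State=SpecialExit 12345   # park the job in JOB_SPECIAL_EXIT
scontrol release 12345                         # let it run again
echo "attempt ${SLURM_RESTART_COUNT:-0}"       # inside the batch script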
-Paul Edmon-
--
Thanks,
/David/Bigagli
www.schedmd.com
for us that have working cgroups memory limits.
Best regards,
Magnus Jonsson
--
Thanks,
/David/Bigagli
www.schedmd.com
Gres=(null) Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=./test.sh
WorkDir=/home/adm17
StdErr=/home/adm17/test-e%j.txt <- here %j is not expanded
StdIn=/dev/null
StdOut=/home/adm17/test-o12032.txt
Regards,
Uwe
ght out by a typo on
http://slurm.schedmd.com/gres.html where the example has GresType=gpu,bandwith
rather than GresTypes=...
Could you please fix the doc!
BTW. Slurm was quite ungracious about having that bad entry in slurm.conf
Regards,
Gareth
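For reference, a sketch of the corrected form (node and device names are
hypothetical):
# slurm.conf
GresTypes=gpu
NodeName=node01 Gres=gpu:2
# gres.conf on node01
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1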
--
Thanks,
/David/Bigagli
www.schedmd.com
what amount of time within a QOS.
sacct can give me information on an account level but I can't seem to
get it to report on a QOS level on a user by user bases.
Thanks
Jackie
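One possible approach with sacct (a sketch; the date range is arbitrary):
sacct -a -X -S 2015-01-01 -E 2015-02-01 --format=User,Account,QOS,CPUTimeRAW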
--
Thanks,
/David/Bigagli
www.schedmd.com
partment for Research Computing, University of Oslo
--
Thanks,
/David/Bigagli
www.schedmd.com
ngly typed interface, and it would be nice if
I could use an interface with stronger types.
Cheers,
Walter Landry
--
Thanks,
/David/Bigagli
www.schedmd.com
I agree. Done in commit c8f34560c87cfbbf.
On 11/06/2014 06:46 PM, Christopher Samuel wrote:
On 07/11/14 11:53, David Bigagli wrote:
Hi,
Hiya David,
it used to be logged at debug level in 2.6 and now it is an error. This
seems to be an issue with cgroups which does not allow that path
No, I can't use srun directly as we get poor scaling; the next
thing on the list (after SC14) is to migrate to Open MPI 1.8.4, which
is due out shortly and should address this.
cheers,
Chris
--
Thanks,
/David/Bigagli
www.schedmd.com
\
etc.33.1.3/topology.conf\
- etc.33.1.4/slurm.conf \
etc.33.1.4/testcases\
etc.33.1.4/topology.conf\
test34.1\
--
Thanks,
/David/Bigagli
www.schedmd.com
On 10/31/2014 12:49 PM, David Bigagli wrote:
The database is created by the slurmdbd daemon. Have you granted
access to the database to the slurm user?
Yes, I did a
grant all on slurm_acct_db.* TO 'slurm'@'localhost';
--
Thanks,
/David/Bigagli
www.schedmd.com
s.
How can I force a rebuild of the database? I have been grepping through
the source tree, but I haven't stumbled on the script that creates the
tables and columns needed.
~Charles~
--
Thanks,
/David/Bigagli
www.schedmd.com
##
Does anyone know the correct name and usage of this parameter?
Thank you.
Regards,
Uwe
--
--
Carles Fenoy
--
Thanks,
/David/Bigagli
www.schedmd.com
I think the question was about the submission node, the node where the
srun/sbatch was executed from.
On 10/10/2014 04:14 PM, Franco Broi wrote:
Are we talking about alloc_node? You can retrieve it using the perl api.
On 11 Oct 2014 06:53, David Bigagli wrote:
Hi, the information is in
process
resides given a known job_id/step_id?
Thanks,
Andrew
--
Thanks,
/David/Bigagli
www.schedmd.com
files
[2014-09-24T13:47:52.151] error: cannot find job_submit plugin for
job_submit/defaults
[2014-09-24T13:47:52.151] error: cannot create job_submit context for
job_submit/defaults
[2014-09-24T13:47:52.151] fatal: failed to initialize job_submit plugin
On Tue, 16 Sep 2014, David Bigagli wrote:
--
Thanks,
/David/Bigagli
www.schedmd.com
e impression from the
docu it was included in slurm. The slurm.conf line reads:
JobSubmitPlugins=default
in compliance with the documentation.
Thanks for your help
Eva
--
Thanks,
/David/Bigagli
www.schedmd.com
anks
Eva
--
Thanks,
/David/Bigagli
www.schedmd.com
Code committed to 14.03.8.
On 08/28/2014 05:24 AM, Hongjia Cao wrote:
The patch changes the global eio_shutdown_time to a field in the eio handle
to allow multiple eio handles in one process. This will be convenient
for a process launching multiple job steps.
--
Thanks,
/David/Bigagli
Unfortunately the article refers to the memory subsystem, which gets
removed without problem. The issue happens on the freezer; however, it is
just an error message without consequences.
On 08/13/2014 04:16 PM, Kilian Cavalotti wrote:
On Wed, Aug 13, 2014 at 10:00 AM, David Bigagli wrote
Interesting indeed. Let me have a look at it and experiment with it a bit.
On 08/13/2014 04:16 PM, Kilian Cavalotti wrote:
On Wed, Aug 13, 2014 at 10:00 AM, David Bigagli wrote:
For some reason at the first attempt rmdir(2) returns EBUSY.
Would writing to memory.force_empty before
same kernel.
Cheers,
--
Thanks,
/David/Bigagli
www.schedmd.com
sumption of
the batch step higher than either of the two job steps?
Thank you,
Robert
On 8/5/2014 1:41 PM, David Bigagli wrote:
Yes that is correct. The first entry is the allocation which has 2
cpus, -n 2 was specified, the second entry is the batch step that ran
for 81 seconds, so the tota
navailable to anyone else, Slurm
is just showing that he effectively used 162 cpu seconds (81 x 2).
Thank you,
Robert
On 8/5/2014 9:53 AM, David Bigagli wrote:
Hi Robert,
the first line is the allocation and the second the batch
step, the batch step runs on one cpu. I am not sure what are
nc.
2880 Zanker Road
Suite 203
San Jose, CA 95134
Tel: +1 408 300 9448
Fax: +1 408 715 0102
www.BrightComputing.com
--
Thanks,
/David/Bigagli
www.schedmd.com
Julien Collas wrote:
Hi,
How would you run an interactive job array? By interactive, I
mean that the command only exits at the end of the array.
Regards,
Julien
--
Thanks,
/David/Bigagli
www.schedmd.com
= 0,
db_fail = 0xf0b2ff, db_resumed = 0xc2}
dir_name =
Remembering some previous problems I suspect that some uninitialised
variable in some structure (which represents some omitted option in
slurmd.conf) may cause such an effect. Could someone please give me some
hints?
Thanks!
--
Thanks,
/David/Bigagli
www.schedmd.com
,
/David/Bigagli
www.schedmd.com
ED MESSAGE-
Hash: SHA1
On 21/05/14 05:56, David Bigagli wrote:
Yes, it is a change in behaviour. There was a fix in the I/O module
that unfortunately introduced this side effect.
Oh dear, that's going to be a fun bit of user re-education if we go to
14.x. Hopefully we can abstract it o
on its own and not within an salloc, is no longer supported and expected
to fail?
Thanks
Martins
On 5/20/14 1:23 PM, David Bigagli wrote:
In 14.03 you should set SallocDefaultCommand, as documented in
http://slurm.schedmd.com/slurm.conf.html,
to srun with the --pty option.
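A sketch of such an entry in slurm.conf (the exact srun options vary by site;
this mirrors the documented example of that era as best I recall):
SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"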
On 05/19/2014 10
Best regards, Artem Y. Polyakov
--
Thanks,
/David/Bigagli
www.schedmd.com
some debugging on and we are receiving task
exit from the tasks on the secondary node right after startup. Let me
know what other debugging output might be useful here.
Thanks,
Mike Robbert
--
Thanks,
/David/Bigagli
www.schedmd.com
WCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
#StorageHost=localhost
#StoragePort=1234
StoragePass=slurm_pass
StorageUser=slurm
StorageLoc=slurm_acct_db
--
Thanks,
/David/Bigagli
www.schedmd.com
Correction: the core file is in the log directory.
On 04/11/2014 12:08 PM, David Bigagli wrote:
Hi,
this Slurm bug has been fixed and it will be available in 14.03.1
which will be released soon. Otherwise it is available in the HEAD.
You should find a core file of slurmstepd in the
tify_io_failure: aborting, io error with
slurmstepd on node 0
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
Launching with salloc/sbatch works.
- Anthony
--
Thanks,
/David/Bigagli
www.schedmd.com
cores on a node only.
LICENSE_ONLY
--
Thanks,
/David/Bigagli
www.schedmd.com
output. There are very few format chars left so I just
picked a free one.
Thanks
Martins
--
Thanks,
/David/Bigagli
www.schedmd.com
nks,
-J
--
Thanks,
/David/Bigagli
www.schedmd.com
	info("xmalloc usecs: %lld", delta_t);
	START_TIMER;
	for (i = 0; i < COUNT; i++) {
		ptr = malloc(MAX_RANGES * sizeof(struct _range));
		free(ptr);
	}
	END_TIMER;
	info("malloc usecs: %lld", delta_t);
	return 0;
}
multithread patch (commit
17449c066af69441b741110ef51fc2f534272871) does not help. Replacing
hostlist_push with hostlist_push_host (commit
1b0b135f9579e253ddd5bf680d2ea70ad12f9bda) fixes the problem of sinfo,
but I think the root cause is in xmalloc.
--
Thanks,
/David/Bigagli
www.schedmd.com
--
Thanks,
/David/Bigagli
www.schedmd.com
nks,
/David/Bigagli
www.schedmd.com
but, in fact, this is just an abbreviation and in the repository I see
"SLURM" is used (for example in README or in COPYING).
Thanks,
Taras
--
Thanks,
/David/Bigagli
www.schedmd.com
mat_access
#PartitionName=lion Nodes=lion-[1-48] Default=NO MaxTime=2880
State=DOWN AllowGroups=lion.che_cluster_access
--
Thanks,
/David/Bigagli
www.schedmd.com
question about SLURM's libpmi. I am currently adapting the DMTCP
project (a checkpointer) to support SLURM. Currently I am working on PMI
support. And looking into _kvs_put function I have the following question:
for (i=0; i
--
Thanks,
/David/Bigagli
www.schedmd.com
s neater (or at least
more thorough), but kill is getting the job done. Is there a reason
that "slurm_kill_job" shouldn't be used?
Thanks
Michael
--
Thanks,
/David/Bigagli
www.schedmd.com
offline (-o), reset (-r), clear (-c), and -N (set
note/reason) command line options
* adds the -l (brief list) and -n (list with notes) command line options
* formats the output in the default verbose list mode more like TORQUE's
pbsnodes does
--Troy
--
Thanks,
/David/Bi
? I want to make sure that all
the jobs in the array have completed before the job I want to run
starts. Would you reference the primary job ID? Or would you reference
the entire span of jobs, namely JobID_[1-100]?
-Paul Edmon-
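As far as I understand the dependency handling, a dependency declared on the
array's base job ID is only satisfied once every task of the array has
finished, so a sketch would be (job IDs are hypothetical):
sbatch --array=1-100 array.sh               # suppose this prints job ID 12345
sbatch --dependency=afterany:12345 final.sh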
--
Thanks,
/David/Bigagli
www.schedmd.com
hanks,
/David/Bigagli
www.schedmd.com
voice: +1 415 320 2776
This is the link to the commit.
https://github.com/SchedMD/slurm/commit/6ef96d5aae739197e5512ea50ea55eef46f1975c
On 10/07/2013 10:26 AM, David Bigagli wrote:
Absolutely. It is fixed now.
On 10/07/2013 09:00 AM, Ralph Castain wrote:
Oops!! You put the fix in the wrong place, I'm a
er on
* back again to send out the right protocol
* message size.
*/
remaining_len -= PMII_COMMANDLEN_SIZE;
That change needed to go *before* the highlighted snprintf. I'm afraid 2.6.3
continues to segfault :-(
On Oct 3, 2013, at 12:12 PM, Ralph Castain wrote:
Cool - thanks David
Fixed. I chose the method proposed by Michael, subtract first, add
later. :-)
On 10/03/2013 10:29 AM, Ralph Castain wrote:
On Oct 3, 2013, at 10:16 AM, David Bigagli wrote:
I am not saying that remaining_len is correct or that mpich is bugless :-) I am
only saying that decrementing
ed to at
least change the snprintf command to reflect the reduced size of the "c" buffer.
On Oct 3, 2013, at 9:44 AM, David Bigagli wrote:
Hi,
I don't know the details of the segfault but the code in question is
correct. If you decrease the length then the file cmdlen:
cmdlen
Hi,
I don't know the details of the segfault but the code in question is
correct. If you decrease the length then the file cmdlen:
cmdlen = PMII_MAX_COMMAND_LEN - remaining_len;
will not be correct and the wrong length will be sent to the pmi2 server.
This code is taken verbatim from mpich2-
Sounds good. :-) Thanks for the patch; it is going to be in Slurm 2.6.3.
On 10/02/2013 12:28 AM, Mark Nelson wrote:
Hi All,
It does look like there is a bug in time_str2secs():
If we give it a time of format: days-0:min, we exit the for loop with
days set, but with min set to our hours value
Hi,
this issue has been fixed in the 2.6.2 release.
On 09/10/2013 09:01 AM, Michael Gutteridge wrote:
We allow jobs to overrun their wall time via "OverTimeLimit". We've
noticed that jobs that complete successfully but go over the wall time
are reported as having "JobState=TIMEOUT" in the
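For context, that setting lives in slurm.conf and takes minutes (the value
here is arbitrary):
OverTimeLimit=10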
This link points to SLURM 2.3 documentation. For newer versions,
and the currently released version 2.6.1, you may want to use this
documentation:
http://slurm.schedmd.com/troubleshoot.html#nodes
On 08/26/2013 10:10 AM, Nikita Burtsev wrote:
https://computing.llnl.gov/linux/slurm/trou
Hello,
it is a requirement to specify --mpi=pmi2, otherwise srun will
not load the pmi2 library
implementing the server-side pmi2 functionality.
There was an error in contribs/pmi2/pmi2_api.c causing the 'no value
for req' message; this was the
->1488 remaining_len -= PMII_COM
ctually checks the licenses out during which interval an external user
> checks out licenses unbeknownst to the scheduler, but I suspect they have
> done nothing.
> If anyone hears of anything different, I, for one, would be happy to know.
>
> Gary D. Brown
>
> On Tue, Ju
Indeed currently there is no integration between Flexlm and SLURM, but some
ideas are being passed around about what to do about it. I am one of the original
designers and developers of Platform License Scheduler.
The item 1) you mentioned is certainly the first step, but consider that even
that may not be ea
Hi,
the gstack command will show you the activities of each thread in the
slurmctld process.
This is an example:
david@prometeo ~>gstack 14432
Thread 8 (Thread 0x7fa9c9190700 (LWP 14433)):
#0 0x0035b90acb8d in nanosleep () from /lib64/libc.so.6
#1 0x0035b90aca00 in sleep () from /li
test
*/David*
The available slurm documentation can be found here:
http://slurm.schedmd.com
*/David*
On Thu, Apr 25, 2013 at 11:41 AM, David Bigagli wrote:
> Hi, slurm.conf has the following parameter as documented in the
> slurm.conf man page:
>
> max_job_bf=#
> The
Hi, slurm.conf has the following parameter as documented in the slurm.conf
man page:
max_job_bf=#
The maximum number of jobs to attempt backfill
scheduling for (i.e. the queue depth). Higher values
result in more overhead and less responsiveness.
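If I recall correctly it was set through SchedulerParameters at the time, so
a sketch would be (the number is arbitrary):
SchedulerParameters=max_job_bf=500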
Try to start it so it does not daemonize, I think the option is -D but better
check the man page, and see if it core dumps.
sent from galaxy nexus
On Mar 27, 2013 9:31 AM, "Pablo Sanz Mercado" wrote:
>
>
> Hi Alejandro,
>
> Sorry, the messages we obtain about the "couldn't suspend job"
Is it possible the job runs on several nodes, say -N 3, then one node is
lost so it ends up running on 2 nodes only? Such a job should have been
submitted with --no-kill.
/David
On Fri, Mar 22, 2013 at 4:06 PM, Michael Colonno wrote:
>
> Actually did mean node below. The job launched on
Hi, the problem of memory over-subscription is discussed in 'man
slurm.conf'. Have a look at
DefMemPerCPU, DefMemPerNode and the suggested configuration when
using CR_CPU_Memory.
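A sketch of the kind of settings that discussion points at (values are
arbitrary):
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
DefMemPerCPU=2048   # MB per allocated CPU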
*/David*
On Tue, Mar 12, 2013 at 3:15 PM, Joo-Kyung Kim wrote:
> Hi,
>
> ** **
>
> I am using SLURM 2.4.0.
This is the way select() works regardless of the version of redhat or any
other distribution.
The fd_set is a bit array of size __FD_SETSIZE, which
is defined as 1024 in
*/David*
On Tue, Mar 12, 2013 at 11:30 AM, Hongjia Cao wrote:
> When launching tasks on about 1000 nodes, I get the f
Have you updated the slurmdbd daemon first, as described here:
http://schedmd.com/slurmdocs/quickstart_admin.html
*/David*
On Wed, Mar 6, 2013 at 5:58 PM, Lennart Karlsson
wrote:
>
> Hi,
>
> Today I upgraded SLURM from v 2.4.3 to v 2.5.3.
>
> It seems like a mistake, because slurmctld crashes. A
If the jobs run one after another Slurm will pick the first host that can
run the job.
You could group your jobs by running them as job steps, and also have a look at
the
--distribution option of sbatch and srun.
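A sketch of grouping the work as steps inside one allocation (script and
program names are hypothetical):
#!/bin/bash
#SBATCH -N 1 -n 4
srun -n 2 ./task_a &
srun -n 2 ./task_b &
wait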