[slurm-dev] slurmctld causes slurmdbd to seg fault

2017-10-17 Thread Loris Bennett
871270, ...}) = 0 write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132 +++ killed by SIGSEGV (core dumped) +++ We're running 17.02.7. Any ideas? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: file and directory permissions

2017-10-10 Thread Loris Bennett
nd the statesave location. > If anyone has an advice or would like to tell me how it was solved on your > site, > I would be very happy. > > > best > Marcus Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Loris Bennett
g. In your case, only the database is worth backing up, and even then, that's only really interesting if you need the old data for statistical purposes, or you need to maintain, say, fairshare information across the upgrade. In bocca al lupo! Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Upgrading Slurm

2017-10-03 Thread Loris Bennett
't be any such jobs. In this case, there shouldn't in theory be a problem - although I must admit that I wouldn't be that surprised if converting the database from 2.3.4 to, say, 17.02.7 didn't go 100% smoothly. However, Debian users who just rely on Debian packages are al

[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'

2017-09-26 Thread Loris Bennett
"Have you tried turning it off and then on again?" is still often a valid suggestion. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Accounting using LDAP ?

2017-09-20 Thread Loris Bennett
Hi Chris, Christopher Samuel writes: > On 20/09/17 15:53, Loris Bennett wrote: > >> Having said that, the only scenario I can see being easily automated is >> one where each user only has one association, namely with their Unix >> group, and everyone has equal shares.

[slurm-dev] Re: Accounting using LDAP ?

2017-09-19 Thread Loris Bennett
g easily automated is one where each user only has one association, namely with their Unix group, and everyone has equal shares. This is our set up, but as soon as you have, say, users with multiple associations and/or membership in some associations confers more shares automation becomes very

[slurm-dev] Does powering down as suspend action still work?

2017-09-19 Thread Loris Bennett
600) && (susp_total > 0)) { info("Power save mode: %d nodes", susp_total); I assume that shown line can no longer appear. -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Suspend stopped working - debug flag?

2017-09-19 Thread Loris Bennett
power used by nodes? If so, what debug flags should I be using? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'

2017-09-19 Thread Loris Bennett
Loris Bennett writes: > Hi, > > I have a node which is powered on and to which I have sent a job. The > output of sinfo is > > PARTITION AVAIL TIMELIMIT NODES STATE NODELIST > test up 7-00:00:00 1 mix~ node001 > > The output of squeue

[slurm-dev] Re: Behaviour of Partition setting MaxTime

2017-09-18 Thread Loris Bennett
its run-time beyond the 'MaxTime' of the partition. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'

2017-09-12 Thread Loris Bennett
to act directly upon the > job. However, if it's possible to down the node, that should requeue (or > cancel) the job. > > Best, > Lyn > > On Tue, Sep 12, 2017 at 3:40 AM, Loris Bennett > wrote: > > Hi, > > I have a node which is powered on and to wh

[slurm-dev] Job stuck in CONFIGURING, node is 'mix~'

2017-09-12 Thread Loris Bennett
in a power-saving state, which in our case is powered-off. This problem may have existed in 16.05.10-2, but currently we are using 17.02.7. All other nodes in the cluster apart from one are functioning normally. Does anyone have any idea what we might be doing wrong? Cheers, Loris -- Dr.

[slurm-dev] Change in meaning of --nodelist

2017-07-26 Thread Loris Bennett
reintroduce the old functionality, say, with an option '--include'? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] sreport job sizesbyaccount over all accounts?

2017-07-25 Thread Loris Bennett
lues for other time periods). I'm going to read the data into R, so I can do the roll-up there, but I wondered whether I can get the information directly from Slurm. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Elapsed time for slurm job

2017-07-24 Thread Loris Bennett
orage is disabled > > How to solve this problem? > > Regards, Sema. > > On Mon, Jul 24, 2017 at 4:25 PM, Loris Bennett > wrote: > > Sema Atasever writes: > > > Elapsed time for slurm job > > > > Dear Friends, > > > > How can i ret

[slurm-dev] Re: Elapsed time for slurm job

2017-07-24 Thread Loris Bennett
Sema Atasever writes: > Elapsed time for slurm job > > Dear Friends, > > How can i retrieve elapsed time if the slurm job has completed? > > Thanks in advance. sacct -o jobid,elapsed See 'man sacct' or 'sacct -e' for the full list of fields. Cheers,
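If the elapsed time is needed in a script rather than on screen, the string printed by sacct has to be turned into a number first. The following is only a sketch: it assumes the Elapsed field has the form [DD-[HH:]]MM:SS as described in 'man sacct', and the function name parse_elapsed is made up for this example.

```python
def parse_elapsed(elapsed: str) -> int:
    """Convert an sacct-style elapsed string to seconds.

    Assumes the format [DD-[HH:]]MM:SS; fractional seconds are not handled.
    """
    days = 0
    if "-" in elapsed:
        # Split off the optional day count in front of the dash
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    # Pad on the left so that MM:SS is treated as 0:MM:SS
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    for sample in ("16:52:48", "1-00:00:00", "05:30"):
        print(sample, "->", parse_elapsed(sample), "seconds")
```

The sample strings are illustrative values, not output from a real cluster.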

[slurm-dev] Re: ANNOUNCE: A collection of Slurm tools

2017-07-21 Thread Loris Bennett
e a common config file, where things such as paths to binaries, USERLIST and username lengths are defined. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: srun can't use variables in a batch script after upgrade

2017-07-10 Thread Loris Bennett
Hi Dennis, Dennis Tants writes: > Hello Loris, > > On 10.07.2017 at 07:39, Loris Bennett wrote: >> Hi Dennis, >> >> Dennis Tants writes: >> >>> Hi list, >>> >>> I am a little bit lost right now and would appreciate your help. >

[slurm-dev] Re: slurm 17.2.06 min memory problem

2017-07-09 Thread Loris Bennett
anks, > Roy What value of SelectType are you using? Note also that CR_LLN schedules jobs to the least loaded nodes and so until all nodes have one job, you will not get more than one job per node. See 'man slurm.conf'. Regards Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: srun can't use variables in a batch script after upgrade

2017-07-09 Thread Loris Bennett
recommend better configurations I would > gladly hear them. > Should you need any more information I will provide them. > Thank you for your time! Shouldn't the variable be $SBATCH_JOB_NAME? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Length of possible SlurmDBD without HA

2017-07-06 Thread Loris Bennett
server running slurmctld, the number of jobs, and the amount of memory required per job. So roughly how much memory will be required per job? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Slurm query

2017-07-04 Thread Loris Bennett
MIT NODES STATE NODELIST > > debug* up infinite 1 idle punehpcdl01 It means that 'debug' is the default partition. See 'man sinfo'. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Multifactor Priority Plugin for Small clusters

2017-07-04 Thread Loris Bennett
oris > Thanks and Regards > Sourabh > > Regards, > Sourabh Shinde > +49 176 4569 5546 > sourabhshinde.cf > > On Mon, Jul 3, 2017 at 8:02 AM, Loris Bennett > wrote: > > Hi Sourabh, > > sourabh shinde writes: > > > Multifactor Priority Plugin

[slurm-dev] sacct: --unit applied to NNodes

2017-07-04 Thread Loris Bennett
16:52:49 1699682.0 0.00G 34Gc 16:52:48 Has this been fixed in more recent versions? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Rewarding good memory requirement estimation on shared nodes?

2017-07-03 Thread Loris Bennett
component to the multifactor priority plugin which would do the same thing. Is there any way to do this, short of writing one's own version? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Multifactor Priority Plugin for Small clusters

2017-07-02 Thread Loris Bennett
out to correspond to how you have configured the shares. If you only have a small amount of resources and a small number of users, this may not work very well. Have you looked at Gang scheduling without preemption? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Dry run upgrade procedure for the slurmdbd database

2017-06-26 Thread Loris Bennett
nversion has been 100% completed? > > Question 2: Can anyone confirm that the output "slurmdbd: debug2: Everything > rolled up" indeed signifies that conversion is complete? > > Thanks, > Ole > > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?

2017-06-22 Thread Loris Bennett
Michael Jennings writes: > On Thursday, 22 June 2017, at 04:19:04 (-0600), > Loris Bennett wrote: > >> rpmbuild --rebuild --with=slurm --without=torque pdsh-2.26-4.el6.src.rpm > > Remove the equals signs. I have no problems building pdsh 2.29 via: > > rpmb

[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?

2017-06-22 Thread Loris Bennett
build' line rpmbuild --rebuild --with=slurm --without=torque pdsh-2.26-4.el6.src.rpm fails with --with=slurm: unknown option The page https://github.com/grondo/pdsh implies it should be rpmbuild --rebuild --with-slurm --without-torque pdsh-2.26-4.el6.src.rpm but this also fails:

[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?

2017-06-22 Thread Loris Bennett
Hi, Kent Engström writes: > "Loris Bennett" writes: >> Hi, >> >> I can generate a list of node lists on which the jobs of a given user >> are running with the following: >> >> $ squeue -u user123 -h -o "%N" >> n

[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?

2017-06-22 Thread Loris Bennett
ted > > should also do it... Slightly better to remember ;) > > On Thu, Jun 22, 2017 at 02:59:11AM -0600, Loris Bennett wrote: >> >> Hi, >> >> I can generate a list of node lists on which the jobs of a given user >> are running with the following: >

[slurm-dev] Controlling the output of 'scontrol show hostlist'?

2017-06-22 Thread Loris Bennett
tion to allow the delimiter in the output of 'scontrol show hostname' to be changed from a newline to, say, a comma? That would permit easier manipulation of node lists without one having to google the appropriate sed magic. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Uni
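In the absence of such an option, the newline-separated output can be post-processed. The sketch below shows one way to do that in Python; the function name join_hostnames and the sample node names are invented for illustration, and the input is assumed to be text with one hostname per line.

```python
def join_hostnames(scontrol_output: str, delimiter: str = ",") -> str:
    """Join newline-separated hostnames with the given delimiter.

    Blank lines and surrounding whitespace are ignored.
    """
    return delimiter.join(scontrol_output.split())


if __name__ == "__main__":
    # Hypothetical text with one hostname per line
    sample = "node001\nnode002\nnode003\n"
    print(join_hostnames(sample))
```

In practice the input would be captured from the command, for example via subprocess, rather than hard-coded.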

[slurm-dev] Re: ExitCode 139

2017-06-21 Thread Loris Bennett
program doing something wrong, such as trying to write beyond the bounds of an array. This is probably unrelated, but the value of --cpus-per-task is quite high. Do the nodes have 20 CPUs each? Cheers, Loris > On 21 June 2017 at 05:45, Loris Bennett wrote: > > Hi Djibril, > >

[slurm-dev] Re: ExitCode 139

2017-06-20 Thread Loris Bennett
that it indicates that your program experienced a segmentation fault. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
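The link between 139 and a segmentation fault can be checked with a few lines of Python. This is a sketch that assumes the common shell convention of reporting 128 plus the signal number when a process is killed by a signal; the function name describe_exit_code is made up for this example.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code, assuming the 128 + signal number convention."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # No signal with this number is known on this platform
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))
```

On Linux, signal 11 is SIGSEGV, so 139 - 128 = 11 corresponds to a segmentation fault.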

[slurm-dev] Re: Long delay starting slurmdbd after upgrade to 17.02

2017-06-20 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > On 06/20/2017 04:32 PM, Loris Bennett wrote: >> We do our upgrades while full production is up and running. We just stop >> the Slurm daemons, dump the database and copy the statesave directory >> just in case. We then do the update, an

[slurm-dev] Re: Long delay starting slurmdbd after upgrade to 17.02

2017-06-20 Thread Loris Bennett
more specific DB optimization tricks could be > done, but I'm not a DB admin so I won't venture to say. > > -Paul Edmon- > > On 06/20/2017 08:42 AM, Tim Fora wrote: > > Hi, > > > > Upgraded from 15.08 to 17.02. It took about one hour for slurmdbd t

[slurm-dev] Re: Long delay starting slurmdbd after upgrade to 17.02

2017-06-20 Thread Loris Bennett
g enough to allow me to get slightly uneasy, but not long enough for me to really worry, so I guess it was probably around 10-15 minutes. Our CPUs are around 6 years old, but the DB is on an SSD. HTH Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Can't get formatted sinfo to work...

2017-06-18 Thread Loris Bennett
sinfo --version slurm 16.05.10-2 [root@node003 ~]# sinfo -o '%t %E' -hn `hostname` mix none [root@node003 ~]# sinfo -hn `hostname` test up 3:00:00 0 n/a main* up 14-00:00:0 1 mix node003 gpu up 14-00:00:0 0 n/a HTH, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Slurm accounting problem with GPFS

2017-06-09 Thread Loris Bennett
> On 09.06.2017 at 12:02, Loris Bennett wrote: >> >> Hi Marcel, >> >> Marcel Sommer writes: >> >>> Slurm accounting problem with GPFS >>> >>> Hi, >>> >>> we are running slurm 2.6.5 and we have a master and a bac

[slurm-dev] Re: Slurm accounting problem with GPFS

2017-06-09 Thread Loris Bennett
Due to a security vulnerability (CVE-2016-10030), all versions of Slurm prior to 15.08.13 or 16.05.8 are no longer available. So you need to do an update anyway. And as the intermediate versions are now no longer available, you basically just need to set up Slurm again from scratch. Sor

[slurm-dev] Re: understanding of Purge in Slurmdb.conf

2017-06-08 Thread Loris Bennett
nth. Disclaimer: I haven't used these settings - I am repeating what it says in the documentation. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: understanding of Purge in Slurmdb.conf

2017-06-07 Thread Loris Bennett
o be more specific about what you don't understand. The documentation you refer to seems to me to be fairly clear. As described, the parameters just allow you to set various time periods after which various types of entries in the database will be purged. I'm not sure how a diagram wou

[slurm-dev] Re: srun - replacement for --x11?

2017-06-06 Thread Loris Bennett
Edward Walter writes: > On 06/06/2017 05:29 AM, Loris Bennett wrote: >> >> Hi, >> >> We used to tell users that they could specify the '--x11' option >> to run a graphical application interactively within a Slurm job. >> With version 16.05.10-2

[slurm-dev] srun - replacement for --x11?

2017-06-06 Thread Loris Bennett
m/faq.html#terminal (or one of the various modifications/forks)? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Wrong Python version used in batch MPI job

2017-06-02 Thread Loris Bennett
e had a similar issue and come up with a solution? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Multinode MATLAB jobs

2017-06-01 Thread Loris Bennett
Hi Benjamin, Benjamin Redling writes: > Hi, > > On 31.05.2017 at 10:39, Loris Bennett wrote: >> Does anyone know whether one can run multinode MATLAB jobs with Slurm >> using only the Distributed Computing Toolbox? Or do I need to be >> running a Distributed Compu

[slurm-dev] Re: Multinode MATLAB jobs

2017-05-31 Thread Loris Bennett
> [0] Nodes in our cluster depending on their age have between 12-24 >> processors available. If >> a user wants a parpool of 24, they must request either a constraint or a >> combination of -N 1 >> and --ntasks-per-node=24, for example. >> >> HTH, John

[slurm-dev] Multinode MATLAB jobs

2017-05-31 Thread Loris Bennett
Hi, Does anyone know whether one can run multinode MATLAB jobs with Slurm using only the Distributed Computing Toolbox? Or do I need to be running a Distributed Computing Server too? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu

[slurm-dev] RE: Slurm job priorities

2017-04-27 Thread Loris Bennett
't be correct and are probably causing the NaNs as calculating the NormUsage is probably failing because the sum of the RawUsages is not a sensible value. The number 9223372036854775808 equals 2**63, i.e. 1 larger than the largest signed 64-bit integer, which looks like some sort of overflow or typ
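The arithmetic behind this observation is easy to verify. The sketch below checks that the number equals 2**63 and shows what the same bit pattern looks like when read as a signed 64-bit value; the helper name as_signed_64 is made up for this example, and whether Slurm stores the value this way is not something the code can tell us.

```python
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def as_signed_64(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 64-bit value."""
    return (value + 2**63) % 2**64 - 2**63


if __name__ == "__main__":
    suspicious = 9223372036854775808
    print("equals 2**63:", suspicious == 2**63)
    print("distance above INT64_MAX:", suspicious - INT64_MAX)
    print("read as signed 64-bit:", as_signed_64(suspicious))
```

Values that fit in the signed range are returned unchanged, so only the out-of-range number wraps around to a large negative value.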

[slurm-dev] RE: Slurm job priorities

2017-04-27 Thread Loris Bennett
0.0005946 1.000 > 0.000 > 260160.0026513 nan0.0005946 1.000 > 0.000 > 259880.0717221 nan0.0010979 1.000 > 0.000 What about sshare -la ? That should show you something abou

[slurm-dev] RE: Slurm job priorities

2017-04-27 Thread Loris Bennett
03 63nan 1 > 1000 0 0 > 25999 djb1 -922337203 2nan 1 > 1000 0 0 > 26000 djb1 -922337203 2nan 1 > 10

[slurm-dev] RE: Slurm job priorities

2017-04-26 Thread Loris Bennett
e a bit of a faff setting up and maintaining shares. All our users have equal shares and only belong to one account, so when we add a user, we just automatically increment all the shares up to the top of the hierarchy and decrement correspondingly when the user is deleted. HTH Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Slurm job priorities

2017-04-26 Thread Loris Bennett
just that - weights. If you had, say, a partition with a very large priority, then multiplying it by 1000 could push the total priority over the size of a 32-bit integer. What kinds of values does 'sprio -l' show? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
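To see how quickly a weight can push a value out of range, here is a small sketch. It assumes the total is held in an unsigned 32-bit integer; the raw values and the helper names are invented purely for illustration and do not come from any real configuration.

```python
UINT32_MAX = 2**32 - 1  # 4294967295


def weighted_priority(raw_value: int, weight: int) -> int:
    """Multiply a raw priority value by its weight, without any clamping."""
    return raw_value * weight


def fits_in_uint32(value: int) -> bool:
    """Return True if the value can be stored in an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX


if __name__ == "__main__":
    # Hypothetical raw partition priorities, each multiplied by a weight of 1000
    for raw in (1000, 5_000_000):
        total = weighted_priority(raw, 1000)
        print(raw, "* 1000 =", total, "fits:", fits_in_uint32(total))
```

With these made-up numbers the second product is 5000000000, which is larger than 4294967295 and therefore no longer fits.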

[slurm-dev] Re: Nodes in state 'down*' despite slurmd running

2017-04-05 Thread Loris Bennett
Ole Holm Nielsen writes: > On 04/05/2017 03:59 PM, Loris Bennett wrote: > >> We are running 16.05.10-2 with power-saving. However, we have noticed a >> problem recently when nodes are woken up in order to start a job. The >> node will go from 'idle~' to, sa

[slurm-dev] Nodes in state 'down*' despite slurmd running

2017-04-05 Thread Loris Bennett
slurmctld on the administration node. Does anyone have any idea what might be going on? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: MaxSubmitPU

2017-03-13 Thread Loris Bennett
shortened for 'MaxSubmitPU' is probably just used for display. HTH Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] error: chdir(/var/log): Permission denied

2017-03-03 Thread Loris Bennett
n, er, error? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Slurm version 17.02.0 is now available

2017-02-27 Thread Loris Bennett
> RELEASE_NOTES file available in the source. > > Thanks to all involved! > > Slurm downloads are available from https://schedmd.com/downloads.php. This link currently (09:50 CET) just returns the following: [an error occurred while processing this directive] Cheers, Loris --

[slurm-dev] Re: Power outage causes wrong reports

2017-02-22 Thread Loris Bennett
be useful: https://groups.google.com/d/msg/slurm-devel/nf7JxV91F40/KUsS1AmyWRYJ The code heuristically decides how to deal with inconsistencies in the database and produces an SQL script to fix them as well as a second script to roll back the changes. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Permissible updates

2017-02-17 Thread Loris Bennett
ess (e.g. 14.11.x or 15.08.x to 16.05.x) without loss of jobs or other state information I have updated Slurm a few times already and I am a native English speaker, but I still stumble over the current wording. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Em

[slurm-dev] Re: Standard suspend/resume scripts?

2017-02-15 Thread Loris Bennett
more like taking nodes offline in times of > low usage? Yes, because that's what I'm interested in ;-) Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Standard suspend/resume scripts?

2017-02-15 Thread Loris Bennett
and 'node_start' more than something like ipmitool -H $host chassis power on ? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: sacctmgr case insensitive

2017-02-09 Thread Loris Bennett
The login name. Only lowercase usernames are supported. If you are importing the usernames from another system, you could filter them in some way. We import from a central university LDAP server to our own LDAP server and can thus tweak the attributes or add attributes, such as 'loginShell'. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Setting a partition QOS, etc

2017-02-01 Thread Loris Bennett
't have any problem with small jobs starving. On the contrary, as we share nodes, small jobs with moderate memory requirements have an advantage, as there are always a few cores available somewhere in the cluster, even when it is quite full. For this reason we favour large jobs slightly. > Your a

[slurm-dev] Re: Daytime Interactive jobs

2017-01-29 Thread Loris Bennett
large MPI jobs. The total number of jobs a user can have in the test/debug QOS is limited. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] RE: A little bit help from my slurm-friends

2017-01-16 Thread Loris Bennett
S: one for normal jobs and one for test jobs. The latter could have a higher priority, but only a short maximum run-time and possibly a low maximum number of jobs per user. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: A little bit help from my slurm-friends

2017-01-16 Thread Loris Bennett
thought that in general you want to use 'fairshare' as well, but that obviously depends on what you are trying to achieve. > In any case thanks for your help > > David Regards Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: where to find completed job execution command

2017-01-06 Thread Loris Bennett
ob=$SLURM_JOB_ID > > $recordsdir/$SLURM_JOBID.record > > The `scontrol show jobid=` record is saved to the file system for future > reference if it is needed. It might be worth using the option '--oneliner' to print out the record in a single line. You could then p
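One possible use of a single-line record is to split it into fields for later searching. The sketch below assumes the record consists of whitespace-separated Key=Value pairs; the function name parse_oneliner and the sample record are made up, and values that themselves contain spaces would not be handled correctly by this simple approach.

```python
def parse_oneliner(record: str) -> dict:
    """Split a one-line record of Key=Value tokens into a dictionary.

    Tokens without an '=' are skipped; only the first '=' separates
    the key from the value.
    """
    fields = {}
    for token in record.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    # Hypothetical record, not taken from a real job
    sample = "JobId=1234 JobName=test UserId=user123(1000) JobState=COMPLETED"
    print(parse_oneliner(sample)["JobState"])
```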

[slurm-dev] Re: Unrestricted use of a node

2016-12-05 Thread Loris Bennett
Loris Bennett writes: > Hi, > > Ulf Markwardt writes: > >> Dear all, >> >> we are using CR_Core_Memory, granularity of our jobs is cores, so: >> shared nodes. And all is well, jobs get killed once they use too much >> memory, cgroups are in place.

[slurm-dev] Re: Unrestricted use of a node

2016-12-05 Thread Loris Bennett
en I do not get the chance to run on 12 > cores/32 GB. > > Is there already a parameter in Slurm to handle this? > > Thanks, > Ulf Wouldn't the sbatch option --exclusive help? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Slurm license management question

2016-12-05 Thread Loris Bennett
in(licence_strings_available)
    scontrol_string = scontrol + \
        ' update reservationname=licenses_' + vendor + \
        ' licenses=' + slurm_licence_available_string
    if args.initialise:
        print(slurm_licence_total_string)
        continue
    if args.dryrun:
        print(scontrol_string)
        continue
    # Actually update the reservation
    os.system(scontrol_string)

# Strings used for testing
#
#string = 'Users of MATLAB_Distrib_Comp_Engine: (Total of 16 licenses issued; Total of 0 licenses in use)'
#string = 'Users of Wavelet_Toolbox: (Error: 2 licenses, unsupported by licensed server)'
---
-- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Impact to jobs when reconfiguring partitions?

2016-10-27 Thread Loris Bennett
conf. > > So is restarting slurmctld the only way to let it pick up changes in > slurm.conf? No. You can also do scontrol reconfigure This does not restart slurmctld. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Slurm license management question

2016-10-27 Thread Loris Bennett
which the licenses are defined. This is run as a cron job once a minute. It's a bit of a kludge and obviously won't work well if there is a lot of contention for licenses. I can post the code if anyone is interested. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] scontrol: update multiple jobs?

2016-09-27 Thread Loris Bennett
scontrol: error: Invalid job ID 1135541,1135542 Is this a documentation error? Does the syntax work for more recent versions? -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Zero not allowed in reservation for number of licenses

2016-09-15 Thread Loris Bennett
Hi, If I try to set the number of licenses in a reservation to zero thus: /usr/bin/scontrol update reservationname=licenses_matlab licenses=matlab_MATLAB:0 I get the following: Error updating the reservation: Invalid license specification In my case it is not such a problem, as I am gen

[slurm-dev] Re: Jobs which started and completed within an interval

2016-07-25 Thread Loris Bennett
"Loris Bennett" writes: > Hi, > > Is it possible to find jobs which both started and completed in a given > interval? > > I am investigating an incident, during which an abnormally high load > occurred on one of our storage servers. To this end I would like to &

[slurm-dev] Jobs which started and completed within an interval

2016-07-13 Thread Loris Bennett
question. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: number of processes in slurm job

2016-07-12 Thread Loris Bennett
me thing like mpirun -np ${SLURM_NTASKS} ./mm.o 6000 Cheers, Loris > regards, > > Husen > > On Tue, Jul 12, 2016 at 1:21 PM, Loris Bennett > wrote: > > Husen R writes: > > > number of processes in slurm job > > > > >
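The same idea, taking the process count from the job environment instead of hard-coding it, can be sketched in Python for cases where the launch line is built by a wrapper script. The function name build_mpirun_command is made up for this example, and it assumes the script runs inside a job where SLURM_NTASKS is set.

```python
import os


def build_mpirun_command(program_args, env=None):
    """Build an mpirun argument list using the task count from SLURM_NTASKS."""
    env = os.environ if env is None else env
    ntasks = env.get("SLURM_NTASKS")
    if ntasks is None:
        raise RuntimeError("SLURM_NTASKS is not set; not inside a Slurm job?")
    return ["mpirun", "-np", ntasks, *program_args]


if __name__ == "__main__":
    # Pass a fake environment so the example also runs outside a job
    print(build_mpirun_command(["./mm.o", "6000"], env={"SLURM_NTASKS": "4"}))
```

The returned list could then be handed to subprocess.run; the value 4 here is only a stand-in for whatever the job requested.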

[slurm-dev] Re: number of processes in slurm job

2016-07-11 Thread Loris Bennett
eed to give more details about what you did. How did you set the number of processes? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Output of 'sinfo -Nel' not aggregated

2016-07-01 Thread Loris Bennett
Hi Chris, Christopher Samuel writes: > On 30/06/16 17:37, Loris Bennett wrote: > >> With version slurm 15.08.8, the node-oriented output of 'sinfo' is no >> longer aggregated. Instead I get a line for each node, even if the data >> for multiple node

[slurm-dev] Output of 'sinfo -Nel' not aggregated

2016-06-30 Thread Loris Bennett
1 main* mixed 122:6:1 180000 1 ram24gb none Is this a bug and, if so, has it already been fixed? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: License manager and interactively used licenses

2016-06-28 Thread Loris Bennett
"Loris Bennett" writes: > "Loris Bennett" > writes: > >> Hi Roshan, >> >> Yes, you're right - this will work for us. So the update tweaks the >> number of licences available and presumably extends the reservation by >> anot

[slurm-dev] Re: License manager and interactively used licenses

2016-06-28 Thread Loris Bennett
"Loris Bennett" writes: > Hi Roshan, > > Yes, you're right - this will work for us. So the update tweaks the > number of licences available and presumably extends the reservation by > another 30 sec, so that you have essentially an infinite reservation >

[slurm-dev] Typo on reservations webpage

2016-06-28 Thread Loris Bennett
e "The system resource may not require an actual license for use, but Slurm licenses can be used to prevent jobs *needing* the resource from being started when that resource is unavailable." Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Em

[slurm-dev] Re: License manager and interactively used licenses

2016-06-17 Thread Loris Bennett
er on the used licenses. > > I believe this should work for you? > > Best regards, > Roshan > > On 17 June 2016 at 11:12, Loris Bennett > wrote: > > Hi Roshan, > > Thanks for the link - I hadn't spotted that. However, using >

[slurm-dev] Re: License manager and interactively used licenses

2016-06-17 Thread Loris Bennett
n on the current availability of licenses > and will only start a job when the requested no. of licenses > available. > > Cheers, > Roshan > > On 17 June 2016 at 09:44, Loris Bennett > wrote: > > Hi, > > I am looking into configuring Slurm to use ou

[slurm-dev] License manager and interactively used licenses

2016-06-17 Thread Loris Bennett
. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Timeout before resource becomes available

2016-06-12 Thread Loris Bennett
larger value, but wouldn't it make more sense if the run-time for the job only started to accumulate, once the slurmd on the node became available? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: Incorrect handling of non-ASCII characters

2016-06-06 Thread Loris Bennett
//unicode.org/faq/normalization.html Thanks for carrying this forward. Regarding the normalisation, in our case, this is not an issue. We get our account information from the university's central identity management system, in which account names can only contain alphanumeric ASCII character

[slurm-dev] Re: How to get rid of "zombie" jobs?

2016-06-06 Thread Loris Bennett
with status "RUNNING" which no longer exists), and not for deleting jobs completely, but it might help: https://groups.google.com/forum/#!msg/slurm-devel/nf7JxV91F40/KUsS1AmyWRYJ Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: BadConstraints - node list not recalculated

2016-06-06 Thread Loris Bennett
tically. Cheers, Loris > Cheers, > Miguel >> On 24 May 2016, at 08:46, Loris Bennett wrote: >> >> >> Hi, >> >> The 'Reason' field for a pending job has changed from 'Priority' to >> 'BadConstraints'. This seems to be becaus

[slurm-dev] Re: Incorrect handling of non-ASCII characters

2016-06-02 Thread Loris Bennett
able. Just imagine Italian had become the dominant language in the USA instead of English - Slurm might think your name is "Gari Brovvn". In fact, I would prefer incorrect justification with umlauts to correct justification without umlauts. [snip (57 lines)] Cheers, Loris -- Dr

[slurm-dev] Re: Incorrect handling of non-ASCII characters

2016-06-02 Thread Loris Bennett
Janne Blomqvist writes: > On 2016-06-01 15:25, Loris Bennett wrote: >> >> Hi, >> >> With Slurm 15.08.8, sreport does not handle non-ASCII characters in the >> 'Proper Name' column properly: >> >> Top 3 Users 2016-05-31T00:00:00 - 2

[slurm-dev] Re: Understanding --exclusive

2016-06-02 Thread Loris Bennett
ed. Duplicate node names in the list will be ignored. The order of the node names in the list is not important; the node names will be sorted by Slurm. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Incorrect handling of non-ASCII characters

2016-06-01 Thread Loris Bennett
to the final line of data have an additional space at the end of the line. The terminal space is not much of a problem, but it would be nice if the justification problem could be fixed. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: How to setup slurm database accounting feature

2016-05-24 Thread Loris Bennett
in this state before this period. Cheers, Loris > Thank you in advance. > > Regards, > Husen [snip (33 lines)] -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: More tasks than allocated CPUs

2016-05-24 Thread Loris Bennett
eivable that you are being bitten by a probably long-fixed bug. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] BadConstraints - node list not recalculated

2016-05-23 Thread Loris Bennett
ecifies the number of tasks required, not specific nodes. Shouldn't the scheduler just be able to replace the draining node with another node in the projected node list? This is happening with version 15.08.8. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

[slurm-dev] Re: How to get command of a running/pending job

2016-05-17 Thread Loris Bennett
Benjamin Redling writes: > On 05/17/2016 10:02, Loris Bennett wrote: >> >> Benjamin Redling >> writes: >> >>> On 2016-05-13 05:58, Husen R wrote: >>>> Does slurm provide feature to get command that being executed/will be >>>> execu
