[slurm-dev] Re: Accounting and limits

2017-02-13 Thread Paddy Doyle
to minutes */ > for (i=0; i diff -Naru slurm-16.05.9/src/sshare/sshare.h > slurm-16.05.9.change/src/sshare/sshare.h > --- slurm-16.05.9/src/sshare/sshare.h 2017-01-31 11:55:41.0 -0800 > +++ slurm-16.05.9.change/src/sshare/sshare.h 2017-02-08 15:

[slurm-dev] Re: Node switching to DRAIN for unknown reason, trouble shooting ideas?

2017-02-01 Thread Paddy Doyle
[2017-01-31T09:45:22.329] debug2: node_did_resp r1-02 > > > [2017-01-31T09:45:22.329] debug2: node_did_resp r1-04 > > > [2017-01-31T09:45:22.329] debug2: node_did_resp r1-01 > > > [2017-01-31T09:45:22.341] debug2: Processing RPC: > > > MESSAGE_NODE_REGISTRATI

[slurm-dev] Re: Backup controller not responding to requests

2017-01-30 Thread Paddy Doyle
running 'sinfo' for instance, it merely hangs. > The interfaces for both slurmctld controllers are in the 'trusted' firewall > group and there is no filtering between them. > Is there something I am missing to make the backup controller 'kick in' and > star

[slurm-dev] Re: Node switching to DRAIN for unknown reason, trouble shooting ideas?

2017-01-30 Thread Paddy Doyle
This is on our testing/development grid systems so we can easily make > changes to debug/fix the problem. > -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: Power outage causes wrong reports

2017-01-23 Thread Paddy Doyle
n the DB. Is it safe to run "arbitrary" commands in the > DB, bypassing slurmdbd? > > Thanks in advance. > > > -- lv. > > > -- lv. > -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: strange srun problem

2017-01-23 Thread Paddy Doyle
about mpi: > >srun -n 2 echo "Hello" > Hello > Hello > > How can I resolve the problem of srun, and let it behaves like sbatch or > salloc, where the program executed only one time? > > The version of slurm is 16.05.3, and Thanks, Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: Prolog behavior with and without srun

2017-01-06 Thread Paddy Doyle
tput > is too verbose and gets intermingled with the actual job output (the head > node seems to execute before the user's job runs but prolog seems to > execute in parallel with the user's job on the other nodes). The bigger > problem is all the nodes ssh-ing to all the other node

[slurm-dev] Re: Fwd: how to perform a DB upgrade?

2017-01-05 Thread Paddy Doyle
On Thu, Jan 05, 2017 at 02:06:58AM -0800, Riccardo Murri wrote: > > (Paddy Doyle, Thu, Jan 05, 2017 at 01:19:57AM -0800:) > > > > On Wed, Jan 04, 2017 at 10:26:06PM -0800, Riccardo Murri wrote: > > > > > > Thanks for all the suggestions. Everything worked

[slurm-dev] Re: Fwd: how to perform a DB upgrade?

2017-01-05 Thread Paddy Doyle
ve already gone into production with the new setup, why bother? (Ok the users might notice and get worried; and yes if they are sorting slurm-nnn.out files based on name and not date, then it's a pain) Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College

[slurm-dev] Re: Fwd: how to perform a DB upgrade?

2017-01-04 Thread Paddy Doyle
r Science IT > University of Zurich > Winterthurerstrasse 190, CH-8057 Zürich (Switzerland) > Tel: +41 44 635 4208 > Fax: +41 44 635 6888 -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: Invalid account or account/partition combination specified

2017-01-03 Thread Paddy Doyle
terra 122858328+1 >yangliu hprc Operator terra hprc > > > 1 > > > ]$ sacctmgr list account >AccountDescr Org > -- ---- > > 1228

[slurm-dev] Re: job array with partition shared=exclusive

2016-12-12 Thread Paddy Doyle
hieve that? > > If yes, is there an other way to ensure that for each normal job a full > node is allocated? > > Thanks -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: slurmdbd start error

2016-12-02 Thread Paddy Doyle
error: slurmdbd: DBD_SEND_MULT_JOB_START failure: > Connection refused > > This was a running system and we just pushed out an update from 15.8.10 to > 15.8.12 As a wild guess, was the old daemon still running (and still listening on port 6819)? Paddy -- Paddy Doyle Trinity Centre for High P

[slurm-dev] Re: Install Slurm in a mic

2016-12-02 Thread Paddy Doyle
m is to run native applications, as we are not > interested in the offload mode. This old thread might be of help: https://groups.google.com/forum/#!searchin/slurm-devel/xeon$20phi|sort:relevance/slurm-devel/0bnMLfV1qA8/de1a89yEl4sJ Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Ll

[slurm-dev] Re: squeue returns "invalid user" for a user that has jobs running

2016-11-28 Thread Paddy Doyle
ute-7-1 > > > Clearly user clwalton is a valid user and has jobs running, but if I try to > specify him, squeue isn't happy. It is fine with other users... > What would cause this? > > Brian Andrus > ITACS/Research Computing > Naval Postgraduate School > Monterey,

[slurm-dev] Re: max submit tasks

2016-11-22 Thread Paddy Doyle
(# jobs x # tasks) into account; perhaps a job submit plugin? Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: external slurmdbd for multiple clusters

2016-09-30 Thread Paddy Doyle
. PurgeStepAfter=1month PurgeJobAfter=12month Do you need to run detailed historical job reports, or push the data to another system (you mentioned XDMoD in a different thread)? Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: Send mail from SLURM throught a mail server on a local network

2016-09-16 Thread Paddy Doyle
nts it can perform relaying for). For postfix, that would be something like 'relay_domains'. Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: Missing user in sreport

2016-09-13 Thread Paddy Doyle
www.bcamath.org/eanasagasti <http://www.bcamath.org/eanasagasti> > */ > /* > */(/*///matematika mugaz bestalde *) > > */ > On 09/09/16 10:17, Paddy Doyle wrote: > >Hi Eneko, > > > >On Fri, Sep 09, 2016 at 12:12:32AM -0700, Eneko Anasagasti wrote: >

[slurm-dev] Re: single node workstation

2016-09-09 Thread Paddy Doyle
> >>>>> >> >> >> and > >>>>> >> >> >> script parameters to run a job array on a single node/server > >>>>> >> >> >> workstation, with more than one concurrent task of the job > >>>

[slurm-dev] Re: Missing user in sreport

2016-09-09 Thread Paddy Doyle
t know what 'slurmreport' is, but that looks like a Perl include path error. Maybe you need to install the slurm-perlapi package? Thanks, Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: external slurmdbd for multiple clusters

2016-09-02 Thread Paddy Doyle
-- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: defining jobs slots

2016-08-17 Thread Paddy Doyle
On Wed, Aug 17, 2016 at 03:20:20AM -0700, Adrian Sevcenco wrote: > > On 08/17/2016 08:41 AM, Paddy Doyle wrote: > > > > Hi Adrian, > Hi! > > yeah, i thought that if i set CPUs=8 this will set that that machine > have 8 job slots > > > You should define

[slurm-dev] Re: defining jobs slots

2016-08-17 Thread Paddy Doyle
Hi Adrian, On Tue, Aug 16, 2016 at 08:11:41AM -0700, Adrian Sevcenco wrote: > > Hi! I have trouble understanding the definition of job slots for each node ... > and i am running slurm only on my desktop to get used to it until i move it > to the clusters .. it is not clear to me how one can def

[slurm-dev] Re: Not recognizing other cluster nodes

2016-07-22 Thread Paddy Doyle
On Fri, Jul 22, 2016 at 09:42:38AM -0700, P. Larry Nelson wrote: > > Paddy, that was the problem! Many thanks! Good stuff. :) > However, slurmd -C still reports (null) for ClusterName. > > Not a big issue, but I'm curious why, since slurm.conf has > ClusterName=dorfman > > Something else I'

[slurm-dev] Re: Not recognizing other cluster nodes

2016-07-22 Thread Paddy Doyle
7-244-9855) | IT Administrator > 457 Loomis Lab | High Energy Physics Group > 1110 W. Green St., Urbana, IL | Physics Dept., Univ. of Ill. > MailTo: lnel...@illinois.edu | http://hep.physics.illinois.edu/home/lnelson/ > -- > "Information without accountability is just noise." - P.L. Nelson > -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: slurmdbd association lifetime/expiry

2016-07-07 Thread Paddy Doyle
negative > after they reach zero but the assocation usage decays and the actual usage as > seen by sreport is > allowed to continue to increase? > > Best regards > > Stuart > > On 23/04/15 10:24, Paddy Doyle wrote: > > This works reasonably well for us, but w

[slurm-dev] Re: Writing to job output files from prolog and epilog scripts

2016-07-07 Thread Paddy Doyle
e the job executed on the cluster (on which compute nodes) and > how many (much) resources the job used. > > I've set up some prototype slurm prolog and epilog scripts and included > some write (echo) statements, however I don't see any of the information in > job output fi

[slurm-dev] Re: SchedulerParameters and topology problem

2016-05-20 Thread Paddy Doyle
Doh! Sorry, please ignore: we have a reservation in place for a user starting today, and so obviously those nodes are left idle as backfill can't start the longer jobs. Paddy On Fri, May 20, 2016 at 09:12:29AM +0100, Paddy Doyle wrote: > Forgot to mention, it's slurm version

[slurm-dev] Re: SchedulerParameters and topology problem

2016-05-20 Thread Paddy Doyle
Forgot to mention, it's slurm version slurm-15.08.7 On Fri, May 20, 2016 at 09:03:06AM +0100, Paddy Doyle wrote: > Hi all, > > We're seeing a really strange scheduling issue on one of our clusters, whereby > jobs are not being scheduled, even though there are many id

[slurm-dev] SchedulerParameters and topology problem

2016-05-20 Thread Paddy Doyle
0:00 6 kelvin-n[014-015,025-026,028-029] 86677 compute GdDC_25_ lucida RUNNING 13:16:34 2-00:00:00 6 kelvin-n[007-012] -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://

[slurm-dev] Re: TRES grp_used_tres_run_secs underflow 15.08.1

2016-01-12 Thread Paddy Doyle
hought I'd mention it anyway as no one > likes errors in their logs! :) > > We were using GrpCPURunMins to limit resource use to accounts, that seems to > be handled by GrpTRESRunMin now. > > Let me know if you need more info! > > Chris > > -- > Christophe

[slurm-dev] Re: Job priority oddity: --nodes and --ntasks in priority/multifactor

2015-12-07 Thread Paddy Doyle
uld make a difference with shared nodes. I think for now we'll just have to watch the users and ask them to play nicely if it comes up again. :) Thanks, Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Job priority oddity: --nodes and --ntasks in priority/multifactor

2015-11-30 Thread Paddy Doyle
me group in this instance, so we can ask them to play nicely with each other). But it would be good if there was a way to harden the priority system against it. I've looked in slurm.conf and can't see any parameter which might be relevant. Or is the current behaviour a desired feature f

[slurm-dev] Re: slurmdbd association lifetime/expiry

2015-04-23 Thread Paddy Doyle
-- > >>Maciej Olchowik > >>HPC Systems Administrator > >>KAUST Supercomputing Laboratory (KSL) > >>Al Khawarizmi Bldg. (1) Room 0134 > >>Thuwal, Kingdom of Saudi Arabia > >>tel +966 2 808 0684 > >> > >>____

[slurm-dev] Re: slurm on NFS for a cluster?

2015-03-26 Thread Paddy Doyle
rnoon, > >> > >>I apologize for the newb question but I'm setting up slurm > >>for the first time in a very long time. I've got a small cluster > >>of a master node and 4 compute nodes. I'd like to install > >>slurm on an NFS file system th

[slurm-dev] Re: Start script for GAMESS?

2014-09-26 Thread Paddy Doyle
launcher, which takes care of those details I think. mpiexec.hydra -bootstrap slurm ... Hopefully it will be of use to you. Thanks, Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://

[slurm-dev] Re: Cluster(s) seem OK, but: Zero Bytes were transmitted or received (14.03.6)

2014-08-18 Thread Paddy Doyle
Hi Gerben, On Sun, Aug 17, 2014 at 01:26:12PM -0700, Gerben Roest wrote: > > I run a slurmctld and slurmdbd on a Scientific Linux (SL) 5 server and > have three SL6 nodes, all running Slurm 14.03.6, with one node behind > another slurmctld on another cluster. The whole slurm setup seems to run

[slurm-dev] Re: make check fails with 14.03.6

2014-08-15 Thread Paddy Doyle
On Fri, Aug 15, 2014 at 01:49:58AM -0700, Bjørn-Helge Mevik wrote: > > We're testing slurm 14.03.6 on a Rocks 6.1/CentOS 6.3 system. configure > and make succeeds, but make check fails: > > /bin/sh: ../../../auxdir/test-driver: No such file or directory > make[6]: *** [api-test.log] Error 127

[slurm-dev] Re: Slurm daemon on Xeon Phi cards?

2014-06-27 Thread Paddy Doyle
> Would you mind posting here the steps to cross-compile slurm for Xeon Phi, or > any URL? The LinkedIn discussion doesn't describe the procedure > > > Thanks in advance > > > On 05/03/2014, at 11:57, Paddy Doyle wrote: > > > > > Hi all, > > >

[slurm-dev] Re: Nodes in a perpetual "drain" state.

2014-06-27 Thread Paddy Doyle
obvious question, but have you set the nodes to be 'resume' or 'idle' using scontrol since then? In our setup at least, once a node is marked 'down', we have to manually clear it to either 'resume' or 'idle'. Paddy -- Paddy Doyle Trinity Centre

[slurm-dev] Re: accounting error with mysql not slurmdbd : Munge encode failed: Failed to access "PASSWORD"

2014-04-21 Thread Paddy Doyle
ORD": No such > file or directory (retrying ...) > sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such > file or directory (retrying ...) > sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such > file or directory >

[slurm-dev] Re: Slurm daemon on Xeon Phi cards?

2014-03-05 Thread Paddy Doyle
or > >> VLSCI - Victorian Life Sciences Computation Initiative > >> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 > >> http://www.vlsci.org.au/ http://twitter.com/vlsci > -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: accounting_storage/mysql

2014-01-11 Thread Paddy Doyle
-- -- -- > > 57 hostname production root 1 COMPLETED > 0:0 > 58 hostname production root 1 COMPLETED > 0:0 > 59 hostname production 1 COMPLETED > 0:0 >

[slurm-dev] Re: accounting_storage/mysql

2014-01-07 Thread Paddy Doyle
Comment=YES > > ClusterName=cluster > > JobCompType=jobcomp/none > > JobAcctGatherFrequency=30 > > JobAcctGatherType=jobacct_gather/none > > SlurmctldDebug=3 > > SlurmdDebug=3 > > NodeName=ssfslurmc0[1] Procs=2 RealMemory=2006 State=UNKNOWN > > PartitionName=debug Nodes=ssfslurmc0[1] Default=NO MaxTime=INFINITE State=UP > > PartitionName=production Nodes=ssfslurmc0[1] Default=YES MaxTime=INFINITE > State=UP > > Thanks for your assistance with this. > > Regards, > -J -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Re: full username info in squeue output

2013-11-21 Thread Paddy Doyle
and any attachments. > > WARNING: Computer viruses can be transmitted via email. The recipient should > check this email and any attachments for the presence of viruses. The company > accepts no liability for any damage caused by any virus transmitted by this > email. > > ww

[slurm-dev] Re: GrpCPUMins and GrpWall causing running jobs to be killed

2012-11-07 Thread Paddy Doyle
the same issue. The patch should work with 2.4 if you didn't want to > wait for 2.5. Thanks Danny, that looks just like what I was shooting for. Will try it out. Thanks! Paddy > > Danny > > On 11/06/2012 10:35 AM, Paddy Doyle wrote: > > Hi again, > > > >

[slurm-dev] GrpCPUMins and GrpWall causing running jobs to be killed

2012-11-06 Thread Paddy Doyle
ge checks, similar to my previously proposed patch. Any thoughts / comments? Thanks, Paddy -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/

[slurm-dev] Patch to check if AccountingStorageEnforce is set in job_time_limit()

2012-11-06 Thread Paddy Doyle
't actually checking to see if limits were being enforced before killing the job. See attached a patch which checks to see if limits or qos are enforced before killing the job. I've tested it with 2.4.3 and it does what I expect - haven't tried 2.4.4, but the job_time_limit() logic

[slurm-dev] Re: Possible Age Priority bug in SLURM version 2.4.1

2012-09-05 Thread Paddy Doyle
ture=thin,mem24GB,ibsw8 > Weight=1 > 89 NodeName=q[253-284] RealMemory=24000 Feature=thin,mem24GB,ibsw9 > Weight=1 > 90 NodeName=q[285-316] RealMemory=24000 Feature=thin,mem24GB,ibsw10 > Weight=1 > 91 NodeName=q[317-348] RealMemory=24000 Feature=thin,mem24GB,ibsw11 > Weight=1 > 92 > 93 PartitionName=all Nodes=q[1-348] Shared=EXCLUSIVE > DefaultTime=00:00:01 MaxTime=14400 State=DOWN > 94 PartitionName=core Nodes=q[45-348] Default=YES Shared=NO > MaxTime=14400 MaxNodes=1 State=UP > 95 PartitionName=node Nodes=q[1-32,45-348] Shared=EXCLUSIVE > DefaultTime=00:00:01 MaxTime=14400 State=UP > 96 PartitionName=devel Nodes=q[33-44] Shared=EXCLUSIVE > DefaultTime=00:00:01 MaxTime=60 MaxNodes=4 State=UP > -- Paddy Doyle Trinity Centre for High Performance Computing, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. Phone: +353-1-896-3725 http://www.tchpc.tcd.ie/