Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
$ srun --version
slurm 18.08.4

I have noticed that after 60 seconds, the job is aborted according to the output log file.

srun: First task exited 60s ago
srun: step:759.0 pack_group:0 tasks 0-1: exited
srun: step:760.0 pack_group:1 tasks 0-1: running
srun: step:760.0 pack_group:1 tasks 2-3: exited…
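For reference (from the srun man page; an assumption on my part that it applies here, not something confirmed in this thread): the "First task exited 60s ago" message relates to srun's --wait behaviour, i.e. how long srun waits after the first task terminates before terminating the remaining ones. The timeout comes from WaitTime in slurm.conf and can be overridden per step, for example:

$ srun --wait=0 --pack-group=0 --ntasks=2 pw.x -i mos2.rlx.in : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in

Here --wait=0 means an unlimited wait (srun still prints a warning after 60 seconds). If the pack_group:0 tasks are exiting right away, though, the more useful question is why they exit, rather than how long srun waits.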

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Chris Samuel
On Wednesday, 27 March 2019 11:33:30 PM PDT Mahmood Naderan wrote:
> Still only one node is running the processes
What does "srun --version" say? Do you get any errors in your output file from the second pack job? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
>srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
Still only one node is running the processes.

$ squeue
 JOBID PARTITION NAME   USER ST TIME NODES NODELIST(REASON)
 755+1    QUARTZ myQE ghatee  R 0:47     1 rocks7
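One detail worth double-checking here (an assumption based on the heterogeneous job step syntax in the Slurm documentation, not on output shown in this message): each colon-separated component of the srun line takes its own executable, so with pw.x given only after the second --pack-group, the first component has nothing to launch. The per-group form would look like:

$ srun --pack-group=0 --ntasks=2 pw.x -i mos2.rlx.in : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in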

Re: [slurm-users] SLURM User Group Meetings: "Back Issues"

2019-03-27 Thread Chris Samuel
On 27/3/19 7:56 pm, Kevin Buckley wrote:
> Does the SchedMD website contain "back issues" of SLURM User Group Meeting info
Yup, somewhat non-intuitively as publications: https://slurm.schedmd.com/publications.html Goes all the way back to something at SC08! -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

[slurm-users] SLURM User Group Meetings: "Back Issues"

2019-03-27 Thread Kevin Buckley
I happened to be reading the NERSC website's news article https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2017/nersc-co-hosts-2017-slurm-user-group-meeting/ while searching for a particular talk. The NERSC news article contains a link to the SchedMD website behind the "xl…

Re: [slurm-users] number of nodes varies for no reason?

2019-03-27 Thread Chris Samuel
On 27/3/19 2:43 pm, Noam Bernstein wrote:
> Hi fellow slurm users - I’ve been using slurm happily for a few months, but now I feel like it’s gone crazy, and I’m wondering if anyone can explain what’s going on. I have a trivial batch script which I submit multiple times, and ends up with different numbers of nodes allocated…

Re: [slurm-users] Not able to allocate all 24 ntasks-per-node; slurm.conf appears correct

2019-03-27 Thread Chris Samuel
On 27/3/19 1:00 pm, Anne M. Hammond wrote:
> NodeName=fl[01-04] CPUs=24 RealMemory=4 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
This will give you 12 tasks per node, each task with 2 thread units. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
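If the goal really is 24 tasks per node, one option (a sketch from my reading of the slurm.conf man page, untested on this cluster) is to schedule at the hyperthread level rather than the core level, describing the node by its thread count only:

# slurm.conf sketch: treat each hardware thread as a schedulable CPU
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=fl[01-04] CPUs=24 State=UNKNOWN

The alternative that keeps core-level scheduling is to request 12 tasks with --cpus-per-task=2, so each task owns both hardware threads of a core.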

Re: [slurm-users] Not able to allocate all 24 ntasks-per-node; slurm.conf appears correct

2019-03-27 Thread Anne M. Hammond
Thanks. A second user cannot allocate any tasks on the node which is running 12 processes, so it does look like Slurm is tying processes to physical cores. Also interesting: top shows all 24 CPUs at ~95%:

%Cpu0 : 93.4 us, 0.7 sy, 0.0 ni, 6.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st…

[slurm-users] number of nodes varies for no reason?

2019-03-27 Thread Noam Bernstein
Hi fellow slurm users - I’ve been using slurm happily for a few months, but now I feel like it’s gone crazy, and I’m wondering if anyone can explain what’s going on. I have a trivial batch script which I submit multiple times, and ends up with different numbers of nodes allocated. Does anyone have…

Re: [slurm-users] Not able to allocate all 24 ntasks-per-node; slurm.conf appears correct

2019-03-27 Thread Renfro, Michael
Can a second user allocate anything on node fl01 after the first user requests their 12 tasks per node? If not, then it looks like tasks are being tied to physical cores, and not a hyperthreaded version of a core. -- Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services

[slurm-users] Not able to allocate all 24 ntasks-per-node; slurm.conf appears correct

2019-03-27 Thread Anne M. Hammond
We are just getting started with slurm here. We have slurm 18.08.6-2.

/etc/slurm/slurm.conf:
NodeName=fl[01-04] CPUs=24 RealMemory=4 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN

Cannot allocate ntasks-per-node:
[hammond@hydrogen VSim-9.0]$ srun -N 1 --ntasks-per-node=24 --p…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Frava
Hi, if you try this SBATCH script, does it work?

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
#SBATCH packjob
#
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partiti…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
OK. The two different partitions I saw were due to not specifying a partition name for the first set (before packjob). Here is a better script:

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5…
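For readers following along, a complete two-component script in this shape might look like the sketch below. It is assembled from the fragments quoted in this thread plus the heterogeneous jobs documentation, so the second component's settings and the final srun line are assumptions rather than the poster's actual script:

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
#SBATCH packjob
#
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5

# Launch pw.x across both components as one MPI job; whether this works
# end-to-end depends on the MPI stack (see the heterogeneous jobs docs).
srun --pack-group=0,1 pw.x -i mos2.rlx.in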

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Christopher Samuel
On 3/27/19 11:29 AM, Mahmood Naderan wrote:
> Thank you very much. You are right. I got it.
Cool, good to hear. I'd love to hear whether you get heterogeneous MPI jobs working too! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
Thank you very much. You are right. I got it. Regards, Mahmood

On Wed, Mar 27, 2019 at 10:33 PM Thomas M. Payerle wrote:
> As partition CLUSTER is not in your /etc/slurm/parts file, it likely was added via the scontrol command.
> Presumably you or a colleague created a CLUSTER partition, whether intentionally or not…

Re: [slurm-users] Remove memory limit from GrpTRES

2019-03-27 Thread Mahmood Naderan
Thank you very much. Regards, Mahmood

On Wed, Mar 27, 2019 at 10:27 PM Thomas M. Payerle wrote:
> From the sacctmgr man page:
> "To clear a previously set value use the modify command with a new value of -1 for each TRES id."
>
> So something like
> # sacctmgr modify user ghatee set GrpTRES=mem=-1…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Thomas M. Payerle
As partition CLUSTER is not in your /etc/slurm/parts file, it likely was added via the scontrol command. Presumably you or a colleague created a CLUSTER partition, whether intentionally or not. Use "scontrol show partition CLUSTER" to view it.

On Wed, Mar 27, 2019 at 1:44 PM Mahmood Naderan wrote:…

Re: [slurm-users] Remove memory limit from GrpTRES

2019-03-27 Thread Thomas M. Payerle
From the sacctmgr man page: "To clear a previously set value use the modify command with a new value of -1 for each TRES id."

So something like:
# sacctmgr modify user ghatee set GrpTRES=mem=-1

Similar for other TRES settings.

On Wed, Mar 27, 2019 at 1:44 PM Mahmood Naderan wrote:
> Hi,
> I want to remove a user's memory limit…
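Spelled out for anyone searching the archives later (the -1 convention is straight from the sacctmgr man page; the user, account and partition names are just the ones visible in this thread, and the "where" form is only needed if the limit sits on the partition-specific association):

# sacctmgr modify user ghatee set GrpTRES=mem=-1
# sacctmgr modify user where name=ghatee partition=quartz set GrpTRES=mem=-1
# sacctmgr list association format=account,user,partition,grptres,maxwall | grep ghatee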

[slurm-users] Remove memory limit from GrpTRES

2019-03-27 Thread Mahmood Naderan
Hi, I want to remove a user's memory limit. Currently, I see:

# sacctmgr list association format=account,user,partition,grptres,maxwall | grep ghatee
 local  ghatee          cpu=16
    z5  ghatee  quartz  cpu=16,mem=1+  30-00:00:00

I have modified it with different numbers of…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
So, it seems that it is not an easy thing at the moment!

> Partitions are defined by the systems administrators, you'd need to
> speak with them about their reasoning for those.

It's me :) I haven't defined a partition named CLUSTER. Regards, Mahmood

On Wed, Mar 27, 2019 at 8:42 PM Christopher Samuel wrote:…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Christopher Samuel
On 3/27/19 8:39 AM, Mahmood Naderan wrote:
> mpirun pw.x -i mos2.rlx.in
You will need to read the documentation for this: https://slurm.schedmd.com/heterogeneous_jobs.html Especially note both of these: IMPORTANT: The ability to execute a single application across more than…

Re: [slurm-users] spart: A user-oriented partition info command for slurm

2019-03-27 Thread Ole Holm Nielsen
Hi Ahmet,

On 3/27/19 10:51 AM, mercan wrote:
> Apart from the sjstat script, Slurm does not contain a command to show user-oriented partition info, so I wrote one. I hope you will find it useful. https://github.com/mercanca/spart

Thanks for a very useful new Slurm command! /Ole

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Prentice Bisbal
On 3/27/19 11:25 AM, Christopher Samuel wrote:
> On 3/27/19 8:07 AM, Prentice Bisbal wrote:
>> sbatch -n 24 -w Node1,Node2
>> That will allocate 24 cores (tasks, technically) to your job, and only use Node1 and Node2. You did not mention any memory requirements of your job, so I assumed memory is not an issue…

Re: [slurm-users] strange resource allocation issue - thoughts?

2019-03-27 Thread Sharma, M D
Hi Prentice, Thanks, that was one of the many different ways we had tested, but we still had the same issue. Best Regards, MD Sharma (sent via a mobile device... please blame the AI-driven autocorrect algorithm for any errors)

From: slurm-users on behalf of Prentice…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
> If your SLURM version is at least 18.08 then you should be able to do it with a heterogeneous job. See https://slurm.schedmd.com/heterogeneous_jobs.html

From the example on that page, I have written this:

#!/bin/bash
#SBATCH --job-name=myQE…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Christopher Samuel
On 3/27/19 8:07 AM, Prentice Bisbal wrote:
> sbatch -n 24 -w Node1,Node2
> That will allocate 24 cores (tasks, technically) to your job, and only use Node1 and Node2. You did not mention any memory requirements of your job, so I assumed memory is not an issue and didn't specify any in my command…

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Prentice Bisbal
On 3/25/19 8:09 AM, Mahmood Naderan wrote:
> Hi
> Is it possible to submit a multinode mpi job with the following config: Node1: 16 cpu, 90GB Node2: 8 cpu, 20GB ?
> Regards, Mahmood

Yes:

sbatch -n 24 -w Node1,Node2

That will allocate 24 cores (tasks, technically) to your job, and only use Node1 and Node2…
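In job script form, with a memory request added back in (a sketch only: the per-CPU figure below is purely illustrative, since the thread never settles on one, and the launch line mirrors the mpirun command used elsewhere in this thread):

#!/bin/bash
#SBATCH -n 24
#SBATCH -w Node1,Node2
#SBATCH --mem-per-cpu=2g   # illustrative value only

mpirun pw.x -i mos2.rlx.in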

Re: [slurm-users] strange resource allocation issue - thoughts?

2019-03-27 Thread Prentice Bisbal
On 3/23/19 2:16 PM, Sharma, M D wrote:
> Hi folks, By default slurm allocates the whole node for a job (even if it specifically requested a single core). This is usually taken care of by adding SelectType=select/cons_res along with an appropriate parameter such as SelectTypeParameters=CR_Core_…
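For context, the configuration being described would look roughly like the excerpt below (a sketch: the truncated parameter is assumed to be CR_Core_Memory, and note that changing SelectType normally needs a restart of the Slurm daemons, not just scontrol reconfigure):

# slurm.conf excerpt: schedule cores (and memory) rather than whole nodes
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory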

Re: [slurm-users] Slurm users meeting 2019?

2019-03-27 Thread David Baker
Thank you for the date and location of this year's Slurm User Group Meeting. Best regards, David

From: slurm-users on behalf of Jacob Jenson
Sent: 25 March 2019 21:26:45
To: Slurm User Community List
Subject: Re: [slurm-users] Slurm users meeting 2019?
T…

[slurm-users] spart: A user-oriented partition info command for slurm

2019-03-27 Thread mercan
Hi; Apart from the sjstat script, Slurm does not contain a command to show user-oriented partition info, so I wrote one. I hope you will find it useful. https://github.com/mercanca/spart Regards, Ahmet M.

[slurm-users] Is there an equivalent of OpenMPI --timestamp-output with srun?

2019-03-27 Thread Eric Chamberland
Hi, While using "srun", I want to use OpenMPI's --timestamp-output option, but I can't find an equivalent "srun" option. I tried using the "OMPI_MCA_orte_timestamp_output=1" environment variable to pass the option to the underlying OpenMPI of srun, but it does not activate the timestamps. Is there…
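Not an answer from this thread, but one generic workaround in the meantime: pipe the step's output through something that prefixes a wall-clock timestamp on every line (gawk's strftime is used here; because of buffering the times reflect when lines reach the pipe, not when each rank printed them):

$ srun -n 4 ./a.out 2>&1 | gawk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush() }'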