[slurm-dev] Re: MaxRSS vs MaxVMSize

2017-10-11 Thread E V
MaxRSS. On Wed, Oct 11, 2017 at 1:26 PM, Vaidhyanathan Mahaganapathy wrote: > > Hi there, > > I am trying to estimate the amount of memory to request for my job > using sacct on a test job. Should I use MaxRSS or MaxVMSize as a > guide? > > Thank you, > Vaidhy
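
As a rough illustration of turning a MaxRSS reading into a memory request, here is a minimal sketch. It assumes the value was copied from something like `sacct -j <jobid> --format=JobID,MaxRSS` and carries a K, M, G or T suffix; the helper names `parse_mem_kib` and `suggest_mem_mb` and the 20% default headroom are illustrative choices, not anything defined by Slurm.

```python
import math

# Multipliers to KiB for the unit suffixes seen on sacct memory fields.
_UNITS = {"K": 1, "M": 1024, "G": 1024**2, "T": 1024**3}


def parse_mem_kib(value: str) -> float:
    """Convert a string such as '2048K' or '1.5G' to KiB (hypothetical helper)."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix not in _UNITS:
        # Assumption: we only accept suffixed values rather than guess the unit.
        raise ValueError(f"unrecognised memory value: {value!r}")
    return float(value[:-1]) * _UNITS[suffix]


def suggest_mem_mb(max_rss: str, headroom: float = 0.2) -> int:
    """Return MaxRSS plus a safety margin, rounded up to whole MB."""
    kib = parse_mem_kib(max_rss)
    return math.ceil(kib * (1 + headroom) / 1024)


if __name__ == "__main__":
    # Example: a test job that peaked at 1024000K of resident memory.
    print(suggest_mem_mb("1024000K", headroom=0.25))
```

The result could then be passed to `--mem` in megabytes. How much headroom is sensible depends on how representative the test job is.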

[slurm-dev] MaxRSS vs MaxVMSize

2017-10-11 Thread Vaidhyanathan Mahaganapathy
Hi there, I am trying to estimate the amount of memory to request for my job using sacct on a test job. Should I use MaxRSS or MaxVMSize as a guide? Thank you, Vaidhy

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-11 Thread Véronique LEGRAND
Hello Pierre-Marie, I stopped the slurmd daemon on tars-XXX then restarted it in the foreground with: sudo /my/path/to /slurmd - -D -d /opt/slurm/sbin/slurmstepd and got: slurmd: Gres Name=disk Type=(null) Count=204000 and also: slurmd: debug3: CPUs=12 Boards=1 Sockets=2 Cores=6

[slurm-dev] Infrequent output flush of STDOUT and STDERR in slurm jobs

2017-10-11 Thread Felix Willenborg
Dear everyone, I experience a very infrequent (like every 30 minutes or so) output flush of STDOUT and STDERR in my Slurm system (slurm 14.11.8 on RHEL7, all nodes share a network share on which data is written). I'd like to monitor the output 'live' via tailf or something similar to have an

[slurm-dev] Questions about resource requests

2017-10-11 Thread zhangtao102019
Hello, I am installing slurm-17.02.6 on my test cluster, and I will run my software on this cluster. Each compute node is configured with 4 GPUs. I have run into three problems: 1. How can I specify different GPU requests on different nodes in slurm

[slurm-dev] Re: file and directory permissions

2017-10-11 Thread Marcus Wagner
How about the JobCheckpointDir? Shouldn't this be also a shared dir? Anyone using checkpointing with slurm? Best Marcus On 10/11/2017 08:17 AM, Loris Bennett wrote: Hi Marcus, Marcus Wagner writes: Hello, everyone. I'm also fairly new to slurm, still in a

[slurm-dev] Re: job allocation lag

2017-10-11 Thread John Hearns
Vladimir, in cases where you have a 'hairs on the back of your neck' feeling it is often the case that these indicate something real. However, you do have to be scientific about this. If you think that uptime is an influence, you have to record job startup times each hour, and plot these. Be
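
One way to do the hourly recording suggested above is sketched here. It assumes submit and start times have been collected as ISO-8601 strings, for example from the `Submit` and `Start` fields of `sacct`; the function name `mean_lag_by_hour` is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_lag_by_hour(jobs):
    """Group jobs by the hour they were submitted and average the start lag.

    jobs: iterable of (submit, start) ISO-8601 strings.
    Returns {hour as datetime: mean lag in seconds}, ordered by hour.
    """
    buckets = defaultdict(list)
    for submit, start in jobs:
        t0 = datetime.fromisoformat(submit)
        t1 = datetime.fromisoformat(start)
        hour = t0.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append((t1 - t0).total_seconds())
    return {hour: mean(lags) for hour, lags in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        ("2017-10-11T09:05:00", "2017-10-11T09:06:00"),
        ("2017-10-11T09:30:00", "2017-10-11T09:30:30"),
        ("2017-10-11T10:00:00", "2017-10-11T10:00:10"),
    ]
    for hour, lag in mean_lag_by_hour(sample).items():
        print(hour.isoformat(), lag)
```

Plotting the resulting values against node uptime would show whether the lag really grows over time.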

[slurm-dev] job allocation lag

2017-10-11 Thread Vladimir Daric
Hello, We are running a 10 node cluster in our lab and we are experiencing a job allocation lag. srun commands wait for resource allocation up to 1 minute even if there are several idle nodes. It's the same with sbatch scripts. Even if there are idle nodes, jobs are waiting for about one

[slurm-dev] Preemption and signals, v2

2017-10-11 Thread tegner
New thread since I have narrowed down the problem. Consider the script: ** #!/bin/bash #SBATCH -p cheap #SBATCH -n 32 #SBATCH -t 12:00:00 sig_term() { echo "function sig_term called. Exiting" echo 'sig_term' > slask_term echo $(date) >> slask_term } # associate the

[slurm-dev] Re: file and directory permissions

2017-10-11 Thread Marcus Wagner
Thx Loris! On 10/11/2017 08:17 AM, Loris Bennett wrote: Hi Marcus, Marcus Wagner writes: Hello, everyone. I'm also fairly new to slurm, still in a conceptual rather than a test or productive phase. Currently I am still trying to find out where to create which

[slurm-dev] Re: file and directory permissions

2017-10-11 Thread Loris Bennett
Hi Marcus, Marcus Wagner writes: > Hello, everyone. > > I'm also fairly new to slurm, still in a conceptual rather than a test or > productive phase. Currently I am still trying to find out where to create > which > files and directories, on the host or in a network