Re: [slurm-users] Getting current memory size of a job

2019-04-01 Thread Jeffrey Frey
If you're on Linux and using Slurm cgroups, your job processes should be contained in a memory cgroup. The /proc/<pid>/cgroup file indicates to which cgroups a process is assigned, so: $ srun [...] /bin/bash -c "grep memory: /proc/\$\$/cgroup | sed 's%^[0-9]*:memory:%/sys/fs/cgroup/memory%'" /sys/
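The same lookup can be sketched in Python. This is a minimal illustration, not part of the original message: it assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup/memory (the path used in the sed expression above) and reads the kernel's memory.usage_in_bytes file from the resulting directory. The function names and the sample cgroup line in the usage note are hypothetical.

```python
from pathlib import Path

# Assumption: cgroup v1 with the memory controller mounted here,
# matching the path used in the sed expression above.
CGROUP_V1_MEMORY_ROOT = "/sys/fs/cgroup/memory"


def memory_cgroup_dir(cgroup_text, root=CGROUP_V1_MEMORY_ROOT):
    """Map the contents of /proc/<pid>/cgroup to the memory cgroup directory.

    Each line has the form "<id>:<controllers>:<path>". Returns None when
    no line lists the memory controller (for example on a cgroup v2 host).
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and "memory" in parts[1].split(","):
            return root + parts[2]
    return None


def current_memory_bytes(pid="self"):
    """Return the current memory usage of the cgroup containing `pid`.

    Reads memory.usage_in_bytes (a cgroup v1 file). Returns None if the
    process is not in a v1 memory cgroup or the file is missing.
    """
    text = Path(f"/proc/{pid}/cgroup").read_text()
    cgroup_dir = memory_cgroup_dir(text)
    if cgroup_dir is None:
        return None
    usage_file = Path(cgroup_dir) / "memory.usage_in_bytes"
    if not usage_file.exists():
        return None
    return int(usage_file.read_text())
```

For a made-up line such as "9:memory:/slurm/uid_1000/job_42/step_0", memory_cgroup_dir returns "/sys/fs/cgroup/memory/slurm/uid_1000/job_42/step_0". Calling current_memory_bytes() from inside a job step (for example via srun) would then report the usage of that step's cgroup, provided the host uses cgroup v1.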

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-01 Thread Lech Nieroda
We’ve run into exactly the same problem, i.e. an extremely long upgrade process to the 17.11.x major release. Luckily, we’ve found a solution. The first approach was to tune various innodb options, like increasing the buffer pool size (8G), the log file size (64M) or the lock wait timeout (900)

[slurm-users] Backfill isn’t working for a node with two GPUs that have different GRES types.

2019-04-01 Thread Randall Radmer
I can’t get backfill to work for a machine with two GPUs (one is a P4 and the other a T4). Submitting jobs works as expected: if the GPU I request is free, then my job runs, otherwise it goes into a pending state. But if I have pending jobs for one GPU ahead of pending jobs for the other GPU, I s

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-04-01 Thread Prentice Bisbal
On 3/28/19 1:25 PM, Reuti wrote: Hi, On 22.03.2019 at 16:20, Prentice Bisbal wrote: On 3/21/19 6:56 PM, Reuti wrote: On 21.03.2019 at 23:43, Prentice Bisbal wrote: Slurm-users, My users here have developed a GUI application which serves as a GUI interface to various physics codes they

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-01 Thread Chris Samuel
On Monday, 1 April 2019 7:55:09 AM PDT Lech Nieroda wrote: > Further analysis of the query has shown that the mysql optimizer has chosen > the wrong execution plan. This may depend on the mysql version, ours was > 5.1.69. I suspect this is the issue documented in the release notes for 17.11: ht

Re: [slurm-users] Getting current memory size of a job

2019-04-01 Thread Chris Samuel
On Friday, 29 March 2019 8:01:53 AM PDT Mahmood Naderan wrote: > Is there any way to view current memory allocation of a running job? With > 'sstat' I can get only MAX values, including MaxVMSize, MaxRSS. When I was at Swinburne we asked for this as an enhancement here: https://bugs.schedmd.com/

Re: [slurm-users] Backfill isn’t working for a node with two GPUs that have different GRES types.

2019-04-01 Thread Marcus Wagner
Dear Randall, could you please also provide: scontrol -d show node computelab-134; scontrol -d show job 100091; scontrol -d show job 100094. Best, Marcus. On 4/1/19 4:31 PM, Randall Radmer wrote: I can’t get backfill to work for a machine with two GPUs (one is a P4 and the other a T4). Submit