Re: [slurm-users] Slurm configuration, Weight Parameter

2019-11-22 Thread Goetz, Patrick G
Can't you just set the usage priority to be higher for the 2GB machines? This way, if the requested memory is less than 2GB, those machines will be used first, and larger jobs skip to the higher-memory machines. On 11/21/19 9:44 AM, Jim Prewett wrote: > > Hi Sistemas, > > I could be mista
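
For reference, a minimal slurm.conf sketch of that idea (node names and memory figures are hypothetical). Note that Slurm allocates the lowest-Weight nodes that satisfy a request first, so "higher priority for the 2GB machines" translates into giving them the smaller Weight value:

    # prefer the small nodes whenever the job fits there
    NodeName=small[01-04] RealMemory=2048  Weight=1
    NodeName=big[01-02]   RealMemory=65536 Weight=10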

Re: [slurm-users] Execute scripts on suspend and cancel

2019-10-17 Thread Goetz, Patrick G
Are applications even aware when they've been hit by a SIGTSTP? This idea of a license being released under these circumstances just seems very unlikely. On 10/15/19 1:57 PM, Brian Andrus wrote: > It seems that there are some details that would need addressed. > > A suspend signal is nothing mo
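
For what it's worth, a process can install a handler for SIGTSTP (unlike SIGSTOP, which can never be caught), so an application could in principle notice a suspend and release a license, even if most don't. A throwaway bash sketch, not anything Slurm-specific, that shows the difference:

    #!/bin/bash
    # Prints a message when the shell receives SIGTSTP; a plain
    # kill -STOP would still suspend it with no handler ever running.
    trap 'echo "caught SIGTSTP at $(date)"' TSTP
    while true; do sleep 1; done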

Re: [slurm-users] How to share GPU resources? (MPS or another way?)

2019-10-08 Thread Goetz, Patrick G
On 10/8/19 1:47 AM, Kota Tsuyuzaki wrote: > GPU is running as well as gres gpu:1. And more, the NVIDIA docs looks to > describe what I hit > (https://docs.nvidia.com/deploy/mps/index.html#topic_4_3). That seems like > the mps-server will be created to each user and the > server will be running
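
For anyone reading this later: Slurm 19.05 and newer also ship a gres/mps plugin that carves a GPU into percentage shares on the Slurm side, separate from driving nvidia-cuda-mps-control by hand. A hedged config sketch (node name, device path, and counts are illustrative):

    # slurm.conf
    GresTypes=gpu,mps
    NodeName=gpunode01 Gres=gpu:1,mps:100

    # gres.conf on gpunode01
    Name=gpu File=/dev/nvidia0
    Name=mps Count=100

    # job script: request roughly half of the GPU via MPS
    #SBATCH --gres=mps:50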

Re: [slurm-users] Heterogeneous HPC

2019-09-19 Thread Goetz, Patrick G
On 9/19/19 8:22 AM, Thomas M. Payerle wrote: > one of our clusters > is still running RHEL6, and while containers based on Ubuntu 16, > Debian 8, or RHEL7 all appear to work properly, > containers based on Ubuntu 18 or Debian 9 will die with "Kernel too > old" errors. I think the idea generally is

Re: [slurm-users] ticking time bomb? launching too many jobs in parallel

2019-08-29 Thread Goetz, Patrick G
On 8/29/19 9:38 AM, Jarno van der Kolk wrote: > Here's an example on how to do so from the Compute Canada docs: > https://docs.computecanada.ca/wiki/GNU_Parallel#Running_on_Multiple_Nodes > [name@server ~]$ parallel --jobs 32 --sshloginfile ./node_list_${SLURM_JOB_ID} --env MY_VARIABLE --workdir
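
The node_list_${SLURM_JOB_ID} file in that example is usually generated inside the job itself; a one-line sketch of the common way to do it (assuming the file name used above):

    # inside the sbatch script, before calling parallel
    scontrol show hostnames "$SLURM_JOB_NODELIST" > ./node_list_${SLURM_JOB_ID}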

Re: [slurm-users] ticking time bomb? launching too many jobs in parallel

2019-08-29 Thread Goetz, Patrick G
On 8/27/19 11:47 AM, Brian Andrus wrote: > 1) If you can, either use xargs or parallel to do the forking so you can > limit the number of simultaneous submissions > Sorry if this is a naive question, but I'm not following how you would use parallel with Slurm (unless you're talking about using
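
The pattern people usually mean is a single allocation in which parallel does the throttling and srun does the placement, rather than thousands of separate sbatch submissions. A sketch, with the script and input names made up:

    #!/bin/bash
    #SBATCH --ntasks=32
    # keep 32 tasks running at a time inside this one allocation;
    # srun places each task on a free slot as one finishes
    parallel --jobs "$SLURM_NTASKS" srun -n1 -N1 --exclusive ./work.sh {} ::: inputs/*.dat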

Re: [slurm-users] slurm-19.05 link error

2019-07-24 Thread Goetz, Patrick G
Why are you searching your miniconda environment for library files? All the HDF5 stuff from the Arch package gets installed in standard library locations. On 7/23/19 9:47 PM, Weiguang Chen wrote: > Thanks for you fast response. > I have installed hdf5 by pacman -S hdf5 > (base) [zznu@archlinux
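
On Arch you can confirm where the packaged libraries actually landed, and whether the runtime linker sees them, without involving the conda environment at all; a quick sketch:

    # files installed by the Arch hdf5 package
    pacman -Ql hdf5 | grep 'libhdf5\.so'
    # what the runtime linker currently knows about
    ldconfig -p | grep libhdf5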

Re: [slurm-users] Problem with sbatch

2019-07-08 Thread Goetz, Patrick G
Sudo is more flexible than that; for example, you can just give the slurmd user sudo access to the chown command and nothing else. On 7/8/19 11:37 AM, Daniel Torregrosa wrote: > You are right. The critical part I was missing is that chown does not > work without sudo. > > I assume this can be fix
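
A sketch of what that sudoers entry could look like (user name and drop-in file name are assumptions, and edit it with visudo; bear in mind that unrestricted chown as root is still very powerful, so restricting the allowed arguments is worth considering):

    # /etc/sudoers.d/slurm-chown
    slurm ALL=(root) NOPASSWD: /usr/bin/chown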

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-21 Thread Goetz, Patrick G
There are two kinds of system admins: can-do and can't-do. You're a can-do; his are can't-do. On 3/21/19 10:26 AM, Prentice Bisbal wrote: > > On 3/20/19 1:58 PM, Christopher Samuel wrote: >> On 3/20/19 4:20 AM, Frava wrote: >> >>> Hi Chris, thank you for the reply. >>> The team that manages that

Re: [slurm-users] Kinda Off-Topic: data management for Slurm clusters

2019-02-26 Thread Goetz, Patrick G
But rsync -a will only help you if people are using identical or at least overlapping data sets? And you don't need rsync to prune out old files. On 2/26/19 1:53 AM, Janne Blomqvist wrote: > On 22/02/2019 18.50, Will Dennis wrote: >> Hi folks, >> >> Not directly Slurm-related, but... We have a
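
To make the distinction concrete, a sketch with hypothetical paths: rsync -a is for staging or mirroring data, while pruning old files is a plain find job that never needs rsync:

    # stage a shared dataset onto node-local scratch
    rsync -a /project/shared/dataset/ /local/scratch/$USER/dataset/
    # drop files untouched for 30+ days; no rsync involved
    find /local/scratch -type f -mtime +30 -delete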

Re: [slurm-users] About x11 support

2018-11-26 Thread Goetz, Patrick G
I'm a little confused about how this would work. For example, where does slurmctld run? And if on each submit host, why aren't the control daemons stepping all over each other? On 11/22/18 6:38 AM, Stu Midgley wrote: > indeed. > > All our workstations are submit hosts and in the queue, so peo
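
For the archive, a sketch of how that setup usually fits together (host names invented): slurmctld runs only on the controller named in slurm.conf; the workstations just carry the same slurm.conf, the munge key, and the client commands, plus slurmd if they also execute jobs, so there is only ever one control daemon.

    # shared slurm.conf (fragment)
    SlurmctldHost=ctl01          # the only machine running slurmctld
    NodeName=ws[01-50] ...       # workstations run slurmd and accept jobs
    # sbatch/srun/squeue work from any host that has this file plus munge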