Re: [slurm-users] How to view GPU indices of the completed jobs?

2020-06-09 Thread Kota Tsuyuzaki
> -j -l` too. However, it seems to include any GPU index information > even in AllocGres and AllocTres columns. It DOES NOT seem to include any GPU index, I meant. Sorry. Best. 露崎 浩太 (Kota Tsuyuzaki) kota.tsuyuzaki...@hco.ntt.co.jp NTTソフトウェアイノベーションセ

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-09 Thread Michael Jennings
On Tuesday, 09 June 2020, at 21:27:27 (+0200), Ole Holm Nielsen wrote: > Thanks very much, this is really cool! I need to look into the > HostbasedAuthentication for intra-cluster MPI tasks spawned by SSH (not > using srun). > > Presumably external access still needs to use SSH authorized keys?

Re: [slurm-users] How to view GPU indices of the completed jobs?

2020-06-09 Thread Kota Tsuyuzaki
> Using sacct you can find that information; try the options below and see if > that works. > > sacct -j --format=jobid,ReqTRES%50,ReqGres Thanks, I tried that command but it appears to show the requested number of GPUs instead of the GPU index. I tried `sacct -j -l` too. However, it seems

Re: [slurm-users] [External] Re: ssh-keys on compute nodes?

2020-06-09 Thread Prentice Bisbal
Host-based security is not considered as safe as user-based security, so should only be used in special cases. On 6/9/20 11:45 AM, Michael Jennings wrote: On Tuesday, 09 June 2020, at 12:43:34 (+0200), Ole Holm Nielsen wrote: in which case you need to set up SSH authorized_keys files for such

Re: [slurm-users] [External] Re: ssh-keys on compute nodes?

2020-06-09 Thread Ole Holm Nielsen
Hi Prentice, Could you kindly elaborate on this statement? Is host-based security safe inside a compute cluster compared to user-based SSH keys? Thanks, Ole On 09-06-2020 21:26, Prentice Bisbal wrote: Host-based security is not considered as safe as user-based security, so should only be u

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-09 Thread Ole Holm Nielsen
Hi Michael, Thanks very much, this is really cool! I need to look into the HostbasedAuthentication for intra-cluster MPI tasks spawned by SSH (not using srun). Presumably external access still needs to use SSH authorized keys? Best regards, Ole On 09-06-2020 17:45, Michael Jennings wrote:

Re: [slurm-users] GUI application crash on first allocation, but runs fine on second allocation

2020-06-09 Thread Brian Andrus
Sounds like a race condition where slurmd is starting before the node is truly ready. You can try adding dependencies for slurmd so it will not start until some other needed service is running. The benefits of systemd :) Brian Andrus On 6/9/2020 10:53 AM, Dumont, Joey wrote: Hi, I am

[slurm-users] GUI application crash on first allocation, but runs fine on second allocation

2020-06-09 Thread Dumont, Joey
Hi, I am encountering a weird issue, and I'm not sure where it is coming from. I have set up a Slurm-based cluster using AWS ParallelCluster. I have tweaked the Slurm configuration to enable X forwarding by setting PrologFlags=X11. The ParallelCluster portion is relevant, as basically every ti

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-09 Thread Michael Jennings
On Tuesday, 09 June 2020, at 12:43:34 (+0200), Ole Holm Nielsen wrote: > in which case you need to set up SSH authorized_keys files for such > users. I'll admit that I didn't know about this until I came to LANL, but there's actually a much better alternative than having to create user key pairs

[slurm-users] configless DNS entries

2020-06-09 Thread Brian Andrus
All, Has anyone successfully implemented the DNS SRV records for configs? I am curious about where to put the SRV record (what domain/name) as we have more than one cluster in the same domain. Maybe that would not be supported. Cannot tell from the documentation at https://slurm.schedmd.com/

Re: [slurm-users] cluster reconfigure

2020-06-09 Thread Ole Holm Nielsen
On 6/9/20 12:12 PM, Steve Brasier wrote: Hi all, looking for some advice on the process to follow when doing one of the reconfigurations which requires a slurm daemon restart (as listed in docs for "scontrol reconfigure"). When reconfiguring slurm.conf, make sure to propagate that file to a

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-09 Thread Ole Holm Nielsen
Hi Durai, I can only try to explain how I understand this: The "slurm" user runs only the slurmctld and slurmdbd central server daemons. On the compute nodes, the slurmd daemon runs as the root user so that it can start user tasks on behalf of normal users. The "slurm" user should *not* hav

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-09 Thread Durai Arasan
Hi, Can you please help me understand how the passwordless ssh works on SLURM? I was under the assumption that jobs/tasks are ultimately submitted by the "slurm" linux user and not by the linux user who wants to run jobs. Is this not correct? So is it not sufficient for only the "slurm" linux use

Re: [slurm-users] Intermittent problem at 32 CPUs

2020-06-09 Thread Diego Zuccato
On 08/06/20 12:16, Diego Zuccato wrote: > I have another partition on these new nodes. 4 identical machines, new > installation, ConnectX-5 card, dual Intel Xeon 5120 (14 core dual > thread). No problem running a job requiring 112 threads (on 4 nodes), > but can't run a single-node job with 5