Re: [slurm-users] Forward arrows to stdin through srun?

2019-06-28 Thread Micael Carvalho
Awesome, Brian, worked like a charm. Thank you a lot! :)
On Thu, 27 June 2019 at 20:06, Brian Andrus wrote:
> I think you need a pty instead of just running bash...
>
> try:
>
> srun --pty bash
>
> Or get specific on what resources you need, eg:
>
> srun --nodes=1 --exclusive --pty bash

Re: [slurm-users] Host not being a valid controller

2019-06-28 Thread Pär Lundö
Thank you, of course that is the problem!
Best regards, Palle
On 2019-06-28 15:48, Brian Andrus wrote:
> That is because your configuration only lists node0 as the host. You can only have one slurmctld running at a time, so you can either define node1 as a backuphost or not bother trying to

Re: [slurm-users] Host not being a valid controller

2019-06-28 Thread Marcus Wagner
So, could you show us your slurm.conf?
Best
Marcus
On 6/28/19 2:31 PM, Pär Lundö wrote:
> Hi all slurm-experts! Recently I managed to configure and install a version 19.05 of Slurm in Ubuntu 18.04 and Ubuntu 18.10. I got it to run on my single node computer (a notebook) Feeling a bit

Re: [slurm-users] Host not being a valid controller

2019-06-28 Thread Brian Andrus
That is because your configuration only lists node0 as the host. You can only have one slurmctld running at a time, so you can either define node1 as a backup host or not bother trying to start slurmctld on it.
Brian Andrus
On 6/28/2019 6:31 AM, Pär Lundö wrote:
> Hi all slurm-experts!
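
The "define node1 as a backup host" option could look roughly like the fragment below. This is only a sketch, not the poster's actual configuration: the hostnames node0 and node1 come from the thread, while the StateSaveLocation path is an assumed placeholder. In Slurm 18.08 and later, the first SlurmctldHost line names the primary controller and any further SlurmctldHost lines name backups.

```
# slurm.conf (fragment), kept identical on every node
# First SlurmctldHost entry = primary controller, later entries = backups.
SlurmctldHost=node0
SlurmctldHost=node1
# Assumed path: the state directory has to live on storage that both
# controllers can read and write, otherwise the backup cannot take over.
StateSaveLocation=/shared/slurm/state
```

With only the single SlurmctldHost=node0 line, starting slurmctld on node1 fails because node1 is not listed as a controller.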

[slurm-users] Host not being a valid controller

2019-06-28 Thread Pär Lundö
Hi all slurm-experts! Recently I managed to configure and install a version 19.05 of Slurm in Ubuntu 18.04 and Ubuntu 18.10. I got it to run on my single node computer (a notebook) Feeling a bit comfortable with this setup I tried to extrapolate this to an additional computer, say node1,

Re: [slurm-users] Help with binding GPUs to sockets (NVlink, P2P)

2019-06-28 Thread Luis Altenkort
Hi, thanks for the answer. We actually had this already set up correctly, I simply forgot to add #SBATCH --sockets-per-node=1 to my script. Now --gpus-per-socket works!
On 28.06.19 at 09:27, Daniel Vecerka wrote:
> Hi, I'm not sure how it works in 19.05, but with 18.x it's possible to
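
A minimal batch script along the lines described above might look like the following sketch. The job name and the GPU count of 2 are assumptions chosen for illustration, not values from the thread. The relevant point is that sbatch expects --sockets-per-node to be given whenever --gpus-per-socket is used.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-socket-test   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --sockets-per-node=1         # the directive that was missing in the thread
#SBATCH --gpus-per-socket=2          # assumed count; adjust to the hardware

# Report which GPUs Slurm exposed to the job.
# Falls back to "none" when the variable is unset, e.g. outside a Slurm job.
visible_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "visible GPUs: ${visible_gpus}"

# Print the GPU/CPU topology matrix if nvidia-smi exists on this node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -m || true
fi
```

Submitted with sbatch, the echo line shows which devices the allocation received, which is a quick way to check that the GPUs sit on the requested socket.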

Re: [slurm-users] getting closer

2019-06-28 Thread Ole Holm Nielsen
On 6/28/19 9:57 AM, Valerio Bellizzomi wrote: On Fri, 2019-06-28 at 09:39 +0200, Ole Holm Nielsen wrote: On 6/28/19 9:18 AM, Valerio Bellizzomi wrote: On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote: On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote: The nodes are now

Re: [slurm-users] getting closer

2019-06-28 Thread Valerio Bellizzomi
On Fri, 2019-06-28 at 09:39 +0200, Ole Holm Nielsen wrote:
> On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
> > On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
> >> On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> >>> The nodes are now communicating however when I run the

Re: [slurm-users] getting closer

2019-06-28 Thread Valerio Bellizzomi
On Fri, 2019-06-28 at 09:39 +0200, Ole Holm Nielsen wrote:
> On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
> > On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
> >> On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> >>> The nodes are now communicating however when I run the

Re: [slurm-users] getting closer

2019-06-28 Thread Ole Holm Nielsen
On 6/28/19 9:18 AM, Valerio Bellizzomi wrote: On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote: On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote: The nodes are now communicating however when I run the command srun -w compute02 /bin/ls it remains stuck and there is no

Re: [slurm-users] Help with binding GPUs to sockets (NVlink, P2P)

2019-06-28 Thread Daniel Vecerka
Hi, I'm not sure how it works in 19.05, but with 18.x it's possible to specify CPU affinity in the file /etc/slurm/gres.conf:
Name=gpu Type=v100 File=/dev/nvidia0 CPUs=0-17,36-53
Name=gpu Type=v100 File=/dev/nvidia1 CPUs=0-17,36-53
Name=gpu Type=v100 File=/dev/nvidia2 CPUs=18-35,54-71

Re: [slurm-users] getting closer

2019-06-28 Thread Valerio Bellizzomi
On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
> On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> > The nodes are now communicating however when I run the command
> >
> > srun -w compute02 /bin/ls
> >
> > it remains stuck and there is no output on the submit machine.

Re: [slurm-users] getting closer

2019-06-28 Thread Valerio Bellizzomi
On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> The nodes are now communicating however when I run the command
>
> srun -w compute02 /bin/ls
>
> it remains stuck and there is no output on the submit machine.
>
> on the compute02 there is a Communication error and Timeout.