Re: [slurm-users] Forward arrows to stdin through srun?

2019-06-27 Thread Brian Andrus
I think you need a pty instead of just running bash... try:

    srun --pty bash

Or get specific on what resources you need, e.g.:

    srun --nodes=1 --exclusive --pty bash

Brian Andrus

On 6/27/2019 2:11 PM, Micael Carvalho wrote:
> Hello there, I am having trouble with arrow keys in srun. Example of

[slurm-users] Forward arrows to stdin through srun?

2019-06-27 Thread Micael Carvalho
Hello there, I am having trouble with arrow keys in srun. Example of command:

    $ srun -u bash

After running this command, I get an interactive session. If I type anything there (like "test", "ls", or anything else), and then press the left arrow (<-) to go back and edit something in the middle,
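The likely mechanism, for context: srun -u only makes I/O unbuffered; it does not allocate a terminal, so bash runs without readline and arrow keys arrive as raw escape sequences. A quick way to confirm from inside the session (a minimal check, not from the thread):

    # under `srun -u bash` this typically prints "not a tty";
    # under `srun --pty bash` it prints the allocated pseudo-terminal
    tty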

[slurm-users] getting closer

2019-06-27 Thread Valerio Bellizzomi
The nodes are now communicating; however, when I run the command

    srun -w compute02 /bin/ls

it remains stuck and there is no output on the submit machine. On compute02 there is a communication error and a timeout. The network ports 6817 and 6818 are open.
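Worth noting: besides 6817 (slurmctld) and 6818 (slurmd), srun itself listens on ephemeral ports on the submit machine for job I/O, so the compute node must be able to connect back to it; a blocked return path produces exactly this stuck-with-no-output symptom. A rough check (the hostname "controller" and the config path are placeholders; SrunPortRange in slurm.conf can pin the ephemeral range):

    # confirm which ports slurm is configured to use
    grep -Ei 'slurmctldport|slurmdport|srunportrange' /etc/slurm/slurm.conf
    # from the submit host: is slurmd reachable on the node?
    nc -zv compute02 6818
    # from compute02: is the controller reachable on its port?
    nc -zv controller 6817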

[slurm-users] Accounting details not seen

2019-06-27 Thread Calvin Dodge
Most of the fields seen when I run "sacct" on a finished job are blank, like the TRES fields, VM, etc. I've set up slurmdbd, and all appears to be working OK. I've searched a fair bit, but haven't found any clues to getting those fields populated. Can someone provide me with such a clue?
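Blank usage fields (MaxRSS, MaxVMSize, the TRES columns) usually mean no job-accounting gather plugin is collecting data, even when slurmdbd itself works. A sketch of the relevant settings (values illustrative, job id hypothetical):

    # slurm.conf: enable a gather plugin, then restart the slurmds
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=30

    # query the fields explicitly for a finished job
    sacct -j 1234 --format=JobID,Elapsed,MaxRSS,MaxVMSize,AllocTRES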

Re: [slurm-users] gpu count

2019-06-27 Thread Valerio Bellizzomi
On Thu, 2019-06-27 at 15:50 +0200, Marcus Boden wrote:
> Hi,
>
> this is usually due to a misconfiguration in your gres.conf (at least it
> was for me). Can you show your gres.conf?

I have revised the configuration; it needed the File=... parameter.

> Best,
> Marcus
>
> On 19-06-27 15:33,
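The shape of that fix, as a sketch (device paths assume NVIDIA GPUs; the node name is a placeholder, and the thread's gres name "gpus" is kept even though the stock name is usually "gpu"):

    # gres.conf on the node: one File= line per device
    Name=gpus File=/dev/nvidia0
    Name=gpus File=/dev/nvidia1

    # matching slurm.conf entries
    GresTypes=gpus
    NodeName=mynode Gres=gpus:2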

Re: [slurm-users] gpu count

2019-06-27 Thread Marcus Boden
Hi,

this is usually due to a misconfiguration in your gres.conf (at least it was for me). Can you show your gres.conf?

Best,
Marcus

On 19-06-27 15:33, Valerio Bellizzomi wrote:
> hello, my node has 2 gpus so I have specified gres=gpus:2 but the
> scontrol show node displays this:
>

Re: [slurm-users] gpu count

2019-06-27 Thread Eli V
Gres has to be specified in both slurm.conf and gres.conf, and gres.conf must be present on the node with the gres. I keep a single cluster-wide gres.conf and copy it to all nodes, just like slurm.conf. Also, after adding a new gres, I think both the slurmctld and the slurmd need to be restarted.
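Roughly, assuming systemd units (node name hypothetical):

    systemctl restart slurmctld     # on the controller
    systemctl restart slurmd        # on each compute node
    # if a node was drained over the gres mismatch, clear it afterwards
    scontrol update NodeName=mynode State=RESUME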

Re: [slurm-users] gpu count

2019-06-27 Thread Valerio Bellizzomi
On Thu, 2019-06-27 at 15:33 +0200, Valerio Bellizzomi wrote:
> hello, my node has 2 gpus so I have specified gres=gpus:2 but the
> scontrol show node displays this:
>
> State=IDLE+DRAIN
> Reason=gres/gpus count too low (1 < 2)

Also, the node is repeating a debug message:

    debug2: got this type

[slurm-users] gpu count

2019-06-27 Thread Valerio Bellizzomi
Hello, my node has 2 GPUs, so I have specified gres=gpus:2, but scontrol show node displays this:

    State=IDLE+DRAIN
    Reason=gres/gpus count too low (1 < 2)

[slurm-users] Trouble disabling core specialization

2019-06-27 Thread Guertin, David S.
Hello all, I'm trying to turn off core specialization in my cluster by setting CoreSpecCount=0, but checking with scontrol does not show my changes. If I set CoreSpecCount=1 or CoreSpecCount=2, or anything except 0, the changes are applied correctly. But when I set it to 0, no change is applied --
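One way to see what the controller actually holds, and the restart that node-level parameters typically require (node name hypothetical; this is a diagnostic sketch, not a confirmed fix for the 0 case):

    # check the live value the controller reports
    scontrol show node node01 | grep -i CoreSpec
    # node-level changes generally need the node's slurmd restarted,
    # not just `scontrol reconfigure`
    systemctl restart slurmd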

[slurm-users] Help with binding GPUs to sockets (NVlink, P2P)

2019-06-27 Thread Luis Altenkort
Hello everyone, I have several nodes with 2 sockets each and 4 GPUs per socket (i.e. 8 GPUs per node). I now want to tell SLURM that GPUs with device ID 0,1,2,3 are connected to socket 0 and GPUs 4,5,6,7 are connected to socket 1. I want to do this in order to be able to use the new command
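In gres.conf that mapping is expressed with core ranges per device line (a sketch; the core numbering assumes cores 0-17 on socket 0 and 18-35 on socket 1 -- check yours with lscpu; older Slurm releases spell the keyword CPUs= instead of Cores=):

    # gres.conf: bind each group of GPUs to the cores of its socket
    Name=gpu File=/dev/nvidia[0-3] Cores=0-17
    Name=gpu File=/dev/nvidia[4-7] Cores=18-35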
