Re: [slurm-users] Suddenly getting "Invalid node name specified" when attempting srun/sbatch

2019-07-10 Thread Chris Samuel
On 10/7/19 6:10 pm, Benjamin Wong wrote:
> slurmctld: error: slurm_auth_get_host: Lookup failed: Unknown host
> slurmctld: error: REQUEST_RESOURCE_ALLOCATE lacks alloc_node from uid=19015
This is the slurmctld rejecting the RPC to submit the job due to not liking the node you are submitting
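Since the error is a hostname lookup failure, a minimal sketch of how to check name resolution for the submit host, on both that host and the one running slurmctld (not from the original message; the commands are generic and the hostname is whatever your node reports):

    # run on the submitting node and on the slurmctld node
    hostname                        # the name the node reports for itself
    getent hosts "$(hostname)"      # should resolve via /etc/hosts or DNS; a failure here matches "Unknown host"
    scontrol show config | grep -i SlurmctldHost   # confirm the controller's name resolves as well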

[slurm-users] Suddenly getting "Invalid node name specified" when attempting srun/sbatch

2019-07-10 Thread Benjamin Wong
My server was having issues yesterday, so I rebooted it last night, but slurm has not been working properly ever since the reboot. I rebooted other machines at the same time too and they work completely fine, but this one in particular cannot submit any srun/sbatch commands due to an "invalid node

Re: [slurm-users] Specify number of cores only?

2019-07-10 Thread Mark Hahn
> Is there a way to instruct SBATCH to submit a job with a certain number of cores without specifying anything else? I don't care which nodes or sockets they run on. They would only use one thread per core.
not just --ntasks? regards, mark hahn.
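A minimal sketch of a batch script along those lines, assuming no exclusive-node settings; the task count and executable name are placeholders, not from the thread:

    #!/bin/bash
    #SBATCH --ntasks=16           # just ask for 16 tasks (cores); no node or socket constraints
    #SBATCH --ntasks-per-core=1   # one task per physical core, i.e. one thread per core
    srun ./my_program             # placeholder for the real application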

Re: [slurm-users] Specify number of cores only?

2019-07-10 Thread Renfro, Michael
ntasks=N as an argument to sbatch or srun? Should work as long as you don’t have exclusive node settings. From our setup:
[renfro@login ~]$ hpcshell --ntasks=16   # hpcshell is a shell function for 'srun --partition=interactive $@ --pty bash -i'
[renfro@gpunode001(job 202002) ~]$ srun hostname |

[slurm-users] Specify number of cores only?

2019-07-10 Thread HELLMERS Joe
Is there a way to instruct SBATCH to submit a job with a certain number of cores without specifying anything else? I don’t care which nodes or sockets they run on. They would only use one thread per core. Joe Hellmers Pipeline Pilot/ScienceCloud SW Dev Office: +1 858-799-5412 Mobile: +

[slurm-users] Slurm versions 19.05.1 and 18.08.8 are now available (CVE-2019-12838)

2019-07-10 Thread Tim Wickberg
Slurm versions 19.05.1 and 18.08.8 are now available, and include a series of recent bug fixes, as well as a fix for a security vulnerability (CVE-2019-12838) related to the 'sacctmgr archive load' functionality. While fixes are only available for the currently supported 19.05 and 18.08

[slurm-users] Config: behavior of default CPU number per GPU (DefCpuPerGPU)

2019-07-10 Thread GD
Hi, I have a question regarding the default number of CPUs allocated per GPU (`DefCpuPerGPU` in `slurm.conf`). I should first mention that the doc refers to `DefCpusPerGPU` (with an 's' after Cpu) but slurmctld only understands `DefCpuPerGPU` (cf. https://bugs.schedmd.com/show_bug.cgi?id=7203). So here is
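For context, a hedged slurm.conf sketch using the spelling that slurmctld accepts per the bug report above; the numbers and partition/node names are illustrative only, and the exact placement should be checked against the slurm.conf man page for your release:

    # cluster-wide default: CPUs allocated per allocated GPU
    DefCpuPerGPU=4
    # it can also be set on a partition line, e.g.
    PartitionName=gpu Nodes=node[01-04] DefCpuPerGPU=4 State=UP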

Re: [slurm-users] [pmix] [Cross post - Slurm, PMIx, UCX] Using srun with SLURM_PMIX_DIRECT_CONN_UCX=true fails with input/output error

2019-07-10 Thread Daniel Letai
Thank you Artem, I made a mistake while typing the mail: in all cases it was 'OMPI_MCA_pml=ucx' and not as written. When I went over the mail before sending, I must have erroneously 'fixed' it for some reason. Best regards,
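For reference, a hedged sketch of the kind of launch being discussed, using the two variables named in the thread; the task count and binary name are placeholders:

    export OMPI_MCA_pml=ucx                   # force Open MPI's UCX PML
    export SLURM_PMIX_DIRECT_CONN_UCX=true    # PMIx direct-connect over UCX, as in the subject line
    srun --mpi=pmix -n 64 ./mpi_app           # placeholder executable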

Re: [slurm-users] Jobs waiting while plenty of cpu and memory available

2019-07-10 Thread Edward Ned Harvey (slurm)
> From: slurm-users On Behalf Of
> Andy Georges
> Sent: Wednesday, July 10, 2019 3:57 AM
>
> EnforcePartLimits=YES
Hmmm. It's already yes... I assume it's not case-sensitive...
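For reference, a hedged slurm.conf sketch of the values this parameter takes, as documented for these releases (verify against your version's man page):

    EnforcePartLimits=ALL    # reject at submit time unless every requested partition can satisfy the job
    # EnforcePartLimits=ANY  # "YES" is treated as a synonym for ANY: accept if at least one partition can
    # EnforcePartLimits=NO   # accept the job and leave it pending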

[slurm-users] Problem: requesting specific GPU GRES type is ignored in job submission

2019-07-10 Thread GD
Hi, I have an issue with GPU requests in job submission. I have a single computing node (128 cores, 3 GPUs) which also runs the Slurm server. When I try to submit a job requesting a specific GPU type corresponding to a GTX 1080 (GPU id 2 on my machine), the job is not assigned to the requested GPU
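For comparison, a hedged sketch of how a typed GPU GRES is normally declared and requested; the node name, device path, and type string are illustrative, not taken from the poster's configuration:

    # gres.conf on the node
    Name=gpu Type=gtx1080 File=/dev/nvidia2
    # matching node definition in slurm.conf (plus the usual CPU/memory fields)
    NodeName=node001 Gres=gpu:gtx1080:1
    # job submission asking for that specific type
    sbatch --gres=gpu:gtx1080:1 job.sh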

Re: [slurm-users] Jobs waiting while plenty of cpu and memory available

2019-07-10 Thread Andy Georges
Hi,
> So here's something funny. One user submitted a job that requested 60 cpu's
> and 40M of memory. Our largest nodes in that partition have 72 cpu's and
> 256G of memory. So when a user requests 400G of ram, what would be good
> behavior? I would like to see slurm reject the job, "job