Also, this may have been fixed already.

I'm not seeing this happen on our Slurm 17.x test cluster, but it does appear on our 
cluster running 15.x.
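
For what it's worth, on the older release one workaround that avoids both hand-written 
wrapper scripts and a long-lived process on the submit host is to let sbatch generate 
the wrapper via --wrap. This is only a minimal sketch; the command name and output 
pattern below are placeholders, not anything from the original post:

    # sbatch returns as soon as the job is queued, so no process keeps
    # running on the head node; --wrap generates the batch script for you.
    sbatch --job-name=single_step \
           --output=single_step.%j.out \
           --wrap="my_command --input data.txt"   # my_command is a placeholder

sbatch --wrap wraps the given command line in a trivial "sh" script, so there is no 
srun process left on the head node holding sockets open.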

> On Jul 5, 2017, at 10:37 AM, Craig Yoshioka <yoshi...@ohsu.edu> wrote:
> 
> Hi, 
> 
> I posted this a while back but didn’t get any responses.  I prefer using 
> `srun` to invoke commands on our cluster because it is far more convenient 
> than writing sbatch wrappers for single-process jobs (no multiple steps).  
> The problem is that if I submit too many srun jobs, the head node starts 
> running out of socket resources (or something else?) and I start getting 
> timeouts, and some of the srun processes start spinning at 100% CPU.  
> 
> I’ve tried redirecting all I/O to avoid the use of sockets, etc., but I still 
> see this problem.  Can anyone suggest an alternative approach or fix?  Something 
> that doesn’t require writing shell wrappers, but also doesn’t keep a 
> process running on the head node?
> 
> Thanks,
> -Craig
