[slurm-users] Hung tasks and high load when cancelling jobs

2018-05-02 Thread Brendan Moloney
Hi, Sometimes when jobs are cancelled I see a spike in system load and hung task errors. It appears to be related to NFS and cgroups. The slurmstepd process gets hung cleaning up cgroups: INFO: task slurmstepd:11222 blocked for more than 120 seconds. Not tainted 4.4.0-119-generic #143-Ubun

[slurm-users] Built in X11 forwarding in 17.11 won't work on local displays

2018-05-09 Thread Brendan Moloney
Hi, We recently upgraded to 17.11, and I was trying to set up the new integrated X11 forwarding instead of using the spank plugin. Initially I was testing with an SSH session into our login node and things seemed fine. Then I switched to using X2Go to connect to the login node and it broke. The

Re: [slurm-users] About x11 support

2018-11-26 Thread Brendan Moloney
I posted about the local display issue a while back ("Built in X11 forwarding in 17.11 won't work on local displays"). I agree that having some locally managed workstations that can also act as submit nodes is not so uncommon. However, we also ran into this on our official "login nodes" because we us