Hi,

We use a bash script to watch users' processes and kill them if they exceed our CPU and memory limits. It also ensures that total CPU or memory usage cannot exceed the limits, whether the load comes from many well-behaved users or from a single bad one:

https://github.com/mercanca/kill_for_loginnode.sh
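As a rough illustration of the per-process part of this idea (this is not the linked script, just a minimal sketch), a watcher can filter `ps` output against thresholds and signal the offenders. The `CPU_LIMIT` and `MEM_LIMIT` variables, their default values, and the `over_limit` function name are assumptions made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical thresholds; NOT taken from the linked script.
CPU_LIMIT=${CPU_LIMIT:-400}   # percent of one CPU, per process
MEM_LIMIT=${MEM_LIMIT:-25}    # percent of physical RAM, per process

# Reads "pid user pcpu pmem" lines on stdin and prints the PIDs of
# non-root processes that exceed either threshold.
over_limit() {
    awk -v cpu="$CPU_LIMIT" -v mem="$MEM_LIMIT" \
        '$2 != "root" && ($3 + 0 > cpu || $4 + 0 > mem) { print $1 }'
}

# Possible use from cron or a loop (commented out so that sourcing
# this file has no side effects; xargs -r is a GNU extension):
# ps -eo pid=,user=,pcpu=,pmem= | over_limit | xargs -r kill -TERM
```

Note that the `pcpu` column of `ps` is CPU time divided by the process lifetime, so short bursts are averaged out, and this sketch does not cover the aggregate limit across all users described above.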

Ahmet M.


On 20.05.2021 15:40, Timo Rothenpieler wrote:
On 24.04.2021 04:37, Cristóbal Navarro wrote:
Hi Community,
I have a set of users who are still not very familiar with Slurm, and yesterday they bypassed srun/sbatch and ran their CPU program directly on the head/login node, thinking it would still run on the compute node. I am aware that I will need to teach them some basic usage, but in the meantime, how have you solved this type of user-behavior problem? Is there a preferred way to restrict the master/login node's resources, or actions, for regular users?

many thanks in advance
--
Cristóbal A. Navarro

I just put a drop-in config file for systemd into
/etc/systemd/system/user-.slice.d/user-limits.conf

[Slice]
CPUQuota=800%
MemoryHigh=48G
MemoryMax=56G
MemorySwapMax=0

Accompanied by another drop-in that resets all those limits for root.

This ensures that no single user can use up all CPUs (each user is limited to 8 hyperthreads) or all RAM, or cause the system to swap.
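For reference, the root reset mentioned above could look something like the following. This is only a sketch: the file name is an assumption, and it relies on systemd giving the more specific `user-0.slice.d` directory precedence over `user-.slice.d` (UID 0 being root).

```
# /etc/systemd/system/user-0.slice.d/user-limits.conf  (hypothetical file name)
[Slice]
# An empty assignment clears the CPU quota inherited from user-.slice.d
CPUQuota=
# "infinity" removes the memory and swap limits
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
```

After adding or changing drop-ins, `systemctl daemon-reload` makes systemd re-read them.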

Other than that, we leave it to the users' due diligence not to trash the login nodes, which so far has worked fine. They occasionally compile stuff on the login nodes in preparation for runs, so I don't want to limit them too much.

