Hi Bjørn-Helge,

On 4/16/24 12:08, Bjørn-Helge Mevik via slurm-users wrote:
> Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com> writes:
>
>> Therefore I believe that the root cause of the present issue is user
>> applications opening a lot of files on our 96-core nodes, and we need
>> to increase fs.file-max.
>
> You could also set a limit per user, for instance in
> /etc/security/limits.d/.  Then users would be blocked from opening
> unreasonably many files.  One could use this to find which applications
> are responsible, and try to get them fixed.

That sounds interesting, but which limit might affect the kernel's fs.file-max? For example, a user already has a low limit on open files:

ulimit -n
1024

whereas the permitted number of user processes is a lot higher:

ulimit -u
3092846
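
For reference, here is a minimal sketch for putting the open-files numbers side by side, assuming a Linux node with /proc mounted. As far as I understand, "ulimit -n" is a per-process limit, while fs.file-max is a system-wide ceiling, so many processes that each stay under 1024 can still add up. The variable names are only illustrative:

```shell
#!/bin/bash
# Per-process soft and hard limits on open files for this shell.
soft_nofile=$(ulimit -Sn)
hard_nofile=$(ulimit -Hn)

# System-wide ceiling on allocated file handles (the fs.file-max sysctl).
file_max=$(cat /proc/sys/fs/file-max)

# file-nr holds three fields: allocated handles, allocated-but-unused
# handles, and the system-wide maximum.
read -r allocated unused maximum < /proc/sys/fs/file-nr

echo "per-process nofile: soft=$soft_nofile hard=$hard_nofile"
echo "system-wide handles: allocated=$allocated max=$file_max"
```

Running this on a node while the problem is occurring should show how close the allocated count is to fs.file-max.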

I'm not sure how the number 3092846 got set, since it's not defined in /etc/security/limits.conf. The "ulimit -u" value varies quite a bit among our compute nodes, so which dynamic service might be setting the limits?
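
One thing that could be checked, as a rough sketch and under the assumption that nothing in PAM or systemd overrides nproc on the node: as I understand it, the kernel derives a default process limit from its thread maximum, which in turn scales with the amount of RAM. If that is what is happening here, "ulimit -u" should come out close to half of kernel.threads-max on each node:

```shell
#!/bin/bash
# Soft limit on the number of user processes for this shell
# (may print "unlimited" for some accounts).
nproc_soft=$(ulimit -Su)

# Kernel-wide ceiling on threads, sized from RAM at boot unless overridden.
threads_max=$(cat /proc/sys/kernel/threads-max)

echo "ulimit -u (soft):   $nproc_soft"
echo "kernel.threads-max: $threads_max"
echo "threads-max / 2:    $((threads_max / 2))"
```

If the first and last numbers agree on nodes with different memory sizes, the variation would come from the kernel default rather than from a service.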

Perhaps there is a recommendation for defining nproc in /etc/security/limits.conf on compute nodes?

Thanks,
Ole

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
