Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Ole Holm Nielsen
On 25-05-2021 18:07, Loris Bennett wrote: PS Am I wrong to be surprised that this is something one needs to roll oneself? It seems to me that most clusters would want to implement something similar. Is that incorrect? If not, are people doing something else? Or did some vendor setting things

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Ole Holm Nielsen
On 25-05-2021 19:03, Patrick Goetz wrote: On 5/25/21 11:07 AM, Loris Bennett wrote: PS Am I wrong to be surprised that this is something one needs to roll oneself?  It seems to me that most clusters would want to implement something similar.  Is that incorrect?  If not, are people doing somethin

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Michael Jennings
On Tuesday, 25 May 2021, at 14:09:54 (+0200), Loris Bennett wrote: > I think my main problem is that I expect logging in to a node with a job > to work with pam_slurm_adopt but without any SSH keys. My assumption > was that MUNGE takes care of the authentication, since users' jobs start > on node

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Max Voit
On Tue, 25 May 2021 14:09:54 +0200 "Loris Bennett" wrote: > to work with pam_slurm_adopt but without any SSH keys. My assumption > was that MUNGE takes care of the authentication, since users' jobs > start on nodes without the need for keys. > > Can someone confirm that this expectation is wrong a

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Patrick Goetz
On 5/25/21 11:07 AM, Loris Bennett wrote: PS Am I wrong to be surprised that this is something one needs to roll oneself? It seems to me that most clusters would want to implement something similar. Is that incorrect? If not, are people doing something else? Or did some vendor setting things

[slurm-users] Cgroup file write content error in 20.11.7

2021-05-25 Thread smmzkd
Hi all, I have upgraded my cluster to version 20.11.7. However, I have found that my cgroup seems to be invalid. In my log files I can see: [2021-05-25T20:21:44.185] [18.0] debug: xcgroup.c:1366: _file_write_content: safe_write (11 of 11) failed: Operation not permitted [2021-05-25T20:21:44.185] [18.0

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Tina Friedrich
...I really didn't want to wade in on this, but why not set up host based ssh? It's not exactly as if passphraseless keys give better security? Tina On 25/05/2021 17:23, Brian Andrus wrote: Your mistake is that munge has nothing to do with sshd, which is the daemon you are connecting to. It ca

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Brian Andrus
Your mistake is that munge has nothing to do with sshd, which is the daemon you are connecting to. It can use PAM (hence the ability to use pam_slurm_adopt), but munge has no pam integration that I am aware of. As far as your /etc/skel bits, that is something that is done when a user's home is

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Loris Bennett
Hi Lloyd, Lloyd Brown writes: > We had something similar happen when we migrated away from a Rocks-based > cluster. We used a script like the one attached, in /etc/profile.d, which was > modeled heavily on something similar in Rocks. > > You might need to adapt it a bit for your situation, but

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Lloyd Brown
We had something similar happen when we migrated away from a Rocks-based cluster. We used a script like the one attached, in /etc/profile.d, which was modeled heavily on something similar in Rocks. You might need to adapt it a bit for your situation, but otherwise it's pretty straightforward
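
The attached script is not reproduced in this listing. As a rough illustration of the general idea only (create a passphraseless key pair at login if the user has none, then authorize it for the same account), here is a minimal Python sketch. It is not the Rocks script or the one attached to the message: the function names, the ed25519 key type, and the file names are assumptions, and it presumes home directories are shared across the cluster nodes.

```python
import os
import subprocess
from pathlib import Path


def key_paths(home):
    """Return (private_key, public_key, authorized_keys) under ~/.ssh.

    The file names are illustrative assumptions, not taken from the thread.
    """
    ssh_dir = Path(home) / ".ssh"
    return (ssh_dir / "id_ed25519",
            ssh_dir / "id_ed25519.pub",
            ssh_dir / "authorized_keys")


def keygen_command(private_key):
    """Build an ssh-keygen call; -N "" requests an empty passphrase."""
    return ["ssh-keygen", "-q", "-t", "ed25519", "-N", "", "-f", str(private_key)]


def authorize(public_key, authorized_keys):
    """Append the public key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    pub = Path(public_key).read_text().strip()
    auth = Path(authorized_keys)
    text = auth.read_text() if auth.exists() else ""
    if pub in text.splitlines():
        return False
    with auth.open("a") as fh:
        # Keep entries on separate lines even if the file lacks a final newline.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(pub + "\n")
    os.chmod(auth, 0o600)
    return True


def ensure_cluster_key(home):
    """Create the key pair if it is missing, then authorize it."""
    private_key, public_key, authorized_keys = key_paths(home)
    private_key.parent.mkdir(mode=0o700, exist_ok=True)
    if not private_key.exists():
        subprocess.run(keygen_command(private_key), check=True)
    return authorize(public_key, authorized_keys)
```

A login hook could call `ensure_cluster_key(Path.home())`. Running it a second time changes nothing, because the key already exists and is already listed in `authorized_keys`.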

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Loris Bennett
Hi Ole, Thanks for the links. I have discovered that the users whose /home directories were migrated from our previous cluster all seem to have a pair of keys which were created along with files like '~/.bash_profile'. Users who have been set up on the new cluster don't have these files. Is the

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Ole Holm Nielsen
Hi Loris, I think you need, as pointed out by others, one of the following: * SSH keys, see https://wiki.fysik.dtu.dk/niflheim/SLURM#ssh-keys-for-password-less-access-to-cluster-nodes * SSH host-based authentication, see https://wiki.fysik.dtu.dk/niflheim/SLURM#host-based-authentication /Ole On 5/25/

Re: [slurm-users] Drain node from TaskProlog / TaskEpilog

2021-05-25 Thread Mark Dixon
Thanks to everyone for their help, much appreciated. Seems to confirm that things would be much easier if I could just figure out a way to detect the issue from the prolog/epilog, rather than the taskprolog/taskepilog! All the best, Mark On Mon, 24 May 2021, Brian Andrus wrote: [EXTERNAL

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-25 Thread Loris Bennett
Hi everyone, Thanks for all the replies. I think my main problem is that I expect logging in to a node with a job to work with pam_slurm_adopt but without any SSH keys. My assumption was that MUNGE takes care of the authentication, since users' jobs start on nodes without the need for keys. Can so