Hi Brian,

Thanks. SELinux is in neither strict nor targeted mode; I'm running Slurm on Debian Bullseye with both SELinux and AppArmor disabled.
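
(A quick way to double-check that both are really off, assuming the selinux-utils and apparmor-utils packages are installed; getenforce should report "Disabled" and aa-status should say the module is not loaded:)

root@magi46:~# getenforce
root@magi46:~# aa-status
root@magi46:~# systemctl is-enabled apparmor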

Thank you for your suggestion,

Le 08/04/2022 à 21:43, Brian Andrus a écrit :
Check SELinux.

Run "getenforce" on the node; if it returns "Enforcing", try running "setenforce 0".

Slurm doesn't play well if SELinux is enabled.
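
A minimal sketch in shell terms (the /etc/selinux/config edit only matters if an SELinux policy is actually installed):

# Print the current mode: Enforcing, Permissive or Disabled
getenforce
# Switch the running system to permissive mode
setenforce 0
# To persist across reboots, set SELINUX=permissive (or disabled)
# in /etc/selinux/config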

Brian Andrus


On 4/8/2022 10:53 AM, Nicolas Greneche wrote:
Hi,

I have an issue with pam_slurm_adopt since I moved from 21.08.5 to 21.08.6: it no longer works.

When I log in directly to the node with the root account:

Apr  8 19:06:49 magi46 pam_slurm_adopt[20400]: Ignoring root user
Apr  8 19:06:49 magi46 sshd[20400]: Accepted publickey for root from 172.16.0.3 port 50884 ssh2: ...
Apr  8 19:06:49 magi46 sshd[20400]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)

Everything is OK.

I submit a very simple job, an infinite loop, to keep the first compute node busy:

nicolas.greneche@magi3:~/test-bullseye/infinite$ cat infinite.slurm
#!/bin/bash
#SBATCH --job-name=infinite
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --nodes=1
srun infinite.sh
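
(infinite.sh itself is not shown here; it is nothing more than a busy loop, along the lines of this sketch:)

#!/bin/bash
# Stand-in for infinite.sh: loop forever so the job step stays alive
while true; do
    sleep 60
done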

nicolas.greneche@magi3:~/test-bullseye/infinite$ sbatch infinite.slurm
Submitted batch job 203

nicolas.greneche@magi3:~/test-bullseye/infinite$ squeue
             JOBID PARTITION     NAME     USER ST       TIME NODES NODELIST(REASON)
               203   COMPUTE infinite nicolas.  R       0:03 1 magi46

I have a job running on the node. When I try to log in to the node with the same regular account:

nicolas.greneche@magi3:~/test-bullseye/infinite$ ssh magi46
Access denied by pam_slurm_adopt: you have no active jobs on this node
Connection closed by 172.16.0.46 port 22

In auth.log, we can see that the job (JOBID 203) is found, but the PAM module decides that I have no running job on the node:

Apr  8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access denied for user `nicolas.greneche' from `172.16.0.3'
Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug2: _establish_config_source: using config_file=/run/slurm/conf/slurm.conf (cached)
Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: slurm_conf_init: using config_file=/run/slurm/conf/slurm.conf
Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading slurm.conf file: /run/slurm/conf/slurm.conf
Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading cgroup.conf file /run/slurm/conf/cgroup.conf
Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.batch
Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.0
Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: send_user_msg: Access denied by pam_slurm_adopt: you have no active jobs on this node
Apr  8 19:11:32 magi46 sshd[20542]: fatal: Access denied for user nicolas.greneche by PAM account configuration [preauth]
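
(For what it's worth, the checks I would run next on magi46 while job 203 is alive: scontrol listpids should show the step PIDs, and /proc/<pid>/cgroup should place them in a Slurm job cgroup, which is what pam_slurm_adopt needs in order to adopt the ssh session. <pid> below is just a placeholder for one of the PIDs reported:)

root@magi46:~# scontrol listpids 203
root@magi46:~# cat /proc/<pid>/cgroup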

I may have missed something; if you have any tips, I'd be delighted.

As appendices, here are the sshd PAM configuration on the compute nodes and the slurm.conf:

root@magi46:~# cat /etc/pam.d/sshd
@include common-auth
account    required     pam_nologin.so
account  required     pam_access.so
account  required     pam_slurm_adopt.so log_level=debug5

@include common-account
session [success=ok ignore=ignore module_unknown=ignore default=bad]     pam_selinux.so close
session    required     pam_loginuid.so
session    optional     pam_keyinit.so force revoke

@include common-session
session    optional     pam_motd.so  motd=/run/motd.dynamic
session    optional     pam_motd.so noupdate
session    optional     pam_mail.so standard noenv
session    required     pam_limits.so
session    required     pam_env.so
session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
session [success=ok ignore=ignore module_unknown=ignore default=bad]     pam_selinux.so open

@include common-password

root@slurmctld:~# cat /etc/slurm/slurm.conf
ClusterName=magi
ControlMachine=slurmctld
SlurmUser=slurm
AuthType=auth/munge

MailProg=/usr/bin/mail
SlurmdDebug=debug

StateSaveLocation=/var/slurm
SlurmdSpoolDir=/var/slurm
SlurmctldPidFile=/var/slurm/slurmctld.pid
SlurmdPidFile=/var/slurm/slurmd.pid
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldParameters=enable_configless

AccountingStorageHost=slurmctld
JobAcctGatherType=jobacct_gather/linux
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=associations
JobRequeue=0
SlurmdTimeout=600

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU

TmpFS=/scratch

GresTypes=gpu
PriorityType="priority/multifactor"

Nodename=magi3 Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
Nodename=magi[107] Boards=1 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=92000 State=UNKNOWN
Nodename=magi[46-53] Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN

PartitionName=MISC-56c Nodes=magi107 Priority=3000 MaxTime=INFINITE State=UP
PartitionName=COMPUTE Nodes=magi[46-53] Priority=3000 MaxTime=INFINITE State=UP Default=YES
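
(One thing not visible in this excerpt: the settings the pam_slurm_adopt documentation relies on. The sketch below is what the docs call for, not a claim about what is on the live controller:)

# Creates the "extern" step that pam_slurm_adopt adopts ssh sessions into
PrologFlags=contain
# Cgroup-based process tracking, recommended for pam_slurm_adopt
ProctrackType=proctrack/cgroup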

Thank you,



--
Nicolas Greneche
USPN
Support à la recherche / RSSI
https://www-magi.univ-paris13.fr
