Hi Brian,
Presumably the user's home directory is NFS-automounted via autofs, and
therefore isn't mounted yet when the job starts.
The job_container/tmpfs plugin ought to work correctly with autofs, but
maybe this is still broken in 23.02?
/Ole
On 3/6/23 21:06, Brian Andrus wrote:
That looks like the user's home directory doesn't exist on the node.
If you are not using a shared home for the nodes, your onboarding process
should be looked at to ensure it can handle any issues that may arise.
If you are using a shared home, you should do the above and have the node
ensure the shared filesystems are mounted before allowing jobs.
-Brian Andrus
On 3/6/2023 1:15 AM, Niels Carl W. Hansen wrote:
Hi all
It seems there are still some issues with how job_container/tmpfs
interacts with autofs in Slurm 23.02.
If the required directories aren't mounted on the allocated node(s)
before the job starts, we get:
slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
An easy workaround, however, is to include this line in the Slurm prolog
on the slurmd nodes:
/usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
But perhaps there is a better way to solve the problem?
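
For illustration, a minimal sketch of a prolog fragment that does roughly the same thing as the su line above without starting a login shell: it looks up the job user's home directory and accesses it, which should be enough to make autofs mount it. The function names home_from_passwd_line and trigger_home_mount are hypothetical, and this assumes that an access by root from the prolog triggers the automount early enough for the job step to see it, which has not been verified against job_container/tmpfs.

```shell
#!/bin/sh
# Hypothetical prolog fragment (a sketch, not a tested fix): access the job
# user's home directory so that autofs mounts it before the job step starts.

# Print the home directory (6th field) of a passwd-format line.
home_from_passwd_line() {
    printf '%s\n' "$1" | cut -d: -f6
}

# Look up the user's home directory and access it to trigger the automount.
# Returns non-zero if the user is unknown or the directory is not reachable.
trigger_home_mount() {
    user="$1"
    entry=$(getent passwd "$user") || return 1
    home=$(home_from_passwd_line "$entry")
    [ -n "$home" ] || return 1
    # Resolving a path inside the directory forces autofs to mount it.
    ls -d "$home/." >/dev/null 2>&1
}

# SLURM_JOB_USER is set by Slurm in the prolog environment.
if [ -n "${SLURM_JOB_USER:-}" ]; then
    # Only log on failure; a non-zero prolog exit would drain the node.
    trigger_home_mount "$SLURM_JOB_USER" ||
        echo "prolog: could not access home of $SLURM_JOB_USER" >&2
fi
```

The fragment deliberately does not fail the prolog when the home directory is unreachable; the job then falls back to /tmp exactly as in the error messages above, rather than the node being drained.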