Hi Ward,

Thanks for replying. I tried these changes, but the error is exactly the
same (everything under "/shared" has permissions 777 and is owned by
"nobody:nogroup"):

/etc/slurm/slurm.conf
JobContainerType=job_container/tmpfs
Prolog=/shared/SlurmScripts/prejob
PrologFlags=contain

/etc/slurm/job_container.conf
#
AutoBasePath=true
BasePath=/shared/BasePath
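
In case it matters, the running config can be double-checked with
scontrol instead of trusting the files on disk, e.g.:

# show what the daemons actually loaded, not just what's in the files
scontrol show config | grep -i -e jobcontainer -e prolog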

/shared/SlurmScripts/prejob
#!/usr/bin/env bash
MY_XDG_RUNTIME_DIR=/shared/SlurmXDG
mkdir -p "$MY_XDG_RUNTIME_DIR"
echo "export XDG_RUNTIME_DIR=$MY_XDG_RUNTIME_DIR"



On Wed, May 15, 2024 at 2:28 PM Ward Poelmans via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> This is systemd, not Slurm. We've also seen it being created and removed;
> as far as I understood, it's something about systemd cleaning up the
> session. We've worked around it by adding this to the prolog:
>
> MY_XDG_RUNTIME_DIR=/dev/shm/${USER}
> mkdir -p $MY_XDG_RUNTIME_DIR
> echo "export XDG_RUNTIME_DIR=$MY_XDG_RUNTIME_DIR"
>
> (in combination with private tmpfs per job).
>
> Ward
>
> On 15/05/2024 10:14, Arnuld via slurm-users wrote:
> > I am using the latest Slurm. It runs fine for scripts, but if I give it a container, it kills the job as soon as I submit it. Is Slurm cleaning up $XDG_RUNTIME_DIR before it should? This is the log:
> >
> > [2024-05-15T08:00:35.143] [90.0] debug2: _generate_patterns: StepId=90.0 TaskId=-1
> > [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[0]=/bin/sh
> > [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[1]=-c
> > [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[2]=crun --rootless=true --root=/run/user/1000/ state slurm2.acog.90.0.-1
> > [2024-05-15T08:00:35.167] [90.0] debug:  _get_container_state: RunTimeQuery rc:256 output:error opening file `/run/user/1000/slurm2.acog.90.0.-1/status`: No such file or directory
> >
> > [2024-05-15T08:00:35.167] [90.0] error: _get_container_state: RunTimeQuery failed rc:256 output:error opening file `/run/user/1000/slurm2.acog.90.0.-1/status`: No such file or directory
> >
> > [2024-05-15T08:00:35.167] [90.0] debug:  container already dead
> > [2024-05-15T08:00:35.167] [90.0] debug3: _generate_spooldir: task:0 pattern:%m/oci-job%j-%s/task-%t/ path:/var/spool/slurmd/oci-job90-0/task-0/
> > [2024-05-15T08:00:35.167] [90.0] debug2: _generate_patterns: StepId=90.0 TaskId=0
> > [2024-05-15T08:00:35.168] [90.0] debug3: _generate_spooldir: task:-1 pattern:%m/oci-job%j-%s/ path:/var/spool/slurmd/oci-job90-0/
> > [2024-05-15T08:00:35.168] [90.0] stepd_cleanup: done with step (rc[0x100]:Unknown error 256, cleanup_rc[0x0]:No error)
> > [2024-05-15T08:00:35.275] debug3: in the service_connection
> > [2024-05-15T08:00:35.278] debug2: Start processing RPC: REQUEST_TERMINATE_JOB
> > [2024-05-15T08:00:35.278] debug2: Processing RPC: REQUEST_TERMINATE_JOB
> > [2024-05-15T08:00:35.278] debug:  _rpc_terminate_job: uid = 64030 JobId=90
> > [2024-05-15T08:00:35.278] debug:  credential for job 90 revoked
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
