2016-05-25 15:34 GMT+02:00 Robbert Eggermont <r.eggerm...@tudelft.nl>:
> Hi Mike,
>
> On 25-05-16 13:22, Mike Johnson wrote:
>
>> I am in an environment that uses NFSv4, which obviously needs
>> user credentials to grant access to filesystems.  Has anyone else
>> tackled the issue of unattended batch jobs successfully?  I'm aware of
>> AUKS.
>
> We are using Kerberised NFS4 with Slurm 15.08 and AUKS successfully. Users
> generally don't have to do anything special, and some probably forget about
> the lifetime of Kerberos tickets.
>
> It can sometimes take a day or more before jobs are run and we allow
> maximum walltimes longer than the Kerberos renewable lifetime so it's
> possible for tickets to expire before the jobs finish. We advise users to
> do a fresh login to the head nodes before submitting jobs. When users
> regularly submit jobs, the ticket stored in AUKS will have enough remaining
> lifetime to bridge a couple of days (or a long weekend). For longer jobs,
> they may have to run 'auks -a' to update the ticket until the job finishes.
>
> The only thing that could be improved is the feedback when a ticket has
> expired. Since Slurm jobs will no longer be able to write to the output
> file all further output and error messages will simply be lost. For jobs
> that get started the only clue is that Slurm immediately reports the job as
> failed but no output file is created.

Hi Robbert,

What we do to solve this problem is add a section to the Slurmctld prolog
that checks whether the user who owns the job about to start has a valid
credential in the auksd daemon. If not, we update the job with a comment
indicating that no Kerberos token is available and then suspend the job.

HTH
Matthieu

> All in all it works well.
>
> Regards,
>
> Robbert
>
> --
> Robbert Eggermont                  Intelligent Systems
> r.eggerm...@tudelft.nl             Electr.Eng., Mathematics & Comp.Science
> +31 15 27 83234                    Delft University of Technology
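
For anyone who wants to try the prolog check described above, here is a rough sketch in Python, not our actual script. It assumes the script is configured as PrologSlurmctld, that Slurm exports SLURM_JOB_ID and SLURM_JOB_UID to it, and that `auks -g -u <uid> -C <ccache>` exits non-zero when auksd holds no valid credential for that user. That last point is an assumption to verify against your AUKS version. The function names and the comment text are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile


def has_auks_credential(uid):
    """Return True if auksd hands back a credential for this uid.

    Assumption: `auks -g -u <uid> -C <ccache>` exits 0 only when a valid
    credential is stored. Check this against your AUKS version.
    """
    # Fetch into a throwaway directory so the ticket is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        ccache = os.path.join(tmp, "ccache")
        result = subprocess.run(
            ["auks", "-g", "-u", str(uid), "-C", ccache],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


def plan_actions(job_id, credential_ok):
    """Return the scontrol commands to run for this job (possibly none)."""
    if credential_ok:
        return []
    return [
        # Leave a hint for the user, then suspend the job.
        ["scontrol", "update", f"JobId={job_id}",
         "Comment=no Kerberos credential in AUKS"],
        ["scontrol", "suspend", str(job_id)],
    ]


def main(environ):
    job_id = environ["SLURM_JOB_ID"]
    uid = environ["SLURM_JOB_UID"]
    for cmd in plan_actions(job_id, has_auks_credential(uid)):
        subprocess.run(cmd, check=False)
    return 0


# Only act when started by slurmctld with the job environment set.
if __name__ == "__main__" and "SLURM_JOB_ID" in os.environ:
    sys.exit(main(os.environ))
```

The decision logic is kept separate from the commands so it can be tested without a running auksd or slurmctld. The user then sees the reason in the Comment field of `scontrol show job <jobid>`.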