2016-05-25 15:34 GMT+02:00 Robbert Eggermont <r.eggerm...@tudelft.nl>:

>
> Hi Mike,
>
> On 25-05-16 13:22, Mike Johnson wrote:
>
>> I am in an environment that uses NFSv4, which obviously needs
>> user credentials to grant access to filesystems.  Has anyone else
>> tackled the issue of unattended batch jobs successfully?  I'm aware of
>> AUKS.
>>
>
> We are using Kerberised NFS4 with Slurm 15.08 and AUKS successfully. Users
> generally don't have to do anything special, and some probably forget about
> the lifetime of Kerberos tickets.
>
> It can sometimes take a day or more before jobs are run and we allow
> maximum walltimes longer than the Kerberos renewable lifetime so it's
> possible for tickets to expire before the jobs finish. We advise users to
> do a fresh login to the head nodes before submitting jobs. When users
> regularly submit jobs, the ticket stored in AUKS will have enough remaining
> lifetime to bridge a couple of days (or a long weekend). For longer jobs,
> they may have to run 'auks -a' to update the ticket until the job finishes.
>
> The only thing that could be improved is the feedback when a ticket has
> expired. Since Slurm jobs will no longer be able to write to the output
> file, all further output and error messages will simply be lost. For jobs
> that get started the only clue is that Slurm immediately reports the job as
> failed but no output file is created.
>
>
Hi Robbert, to solve this problem we added a section to the slurmctld
prolog that checks whether the user associated with the job about to start
has a valid credential in the auksd daemon. If not, we update the job with
a comment indicating that no Kerberos ticket is available and then
suspend the job.
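
A rough sketch of such a check, in Python, for anyone who wants a starting
point. This is not our actual script. It assumes that SLURM_JOB_ID and
SLURM_JOB_UID are set in the PrologSlurmctld environment and that
'auks -g -u UID -C FILE' exits non-zero when auksd holds no usable
credential for that user; please verify both on your own installation. The
function names, the temporary file prefix and the comment text are made up
for illustration.

```python
import os
import subprocess
import tempfile


def credential_check_cmd(uid, ccache_path):
    # ASSUMPTION: this asks auksd for the ticket of `uid` and stores it in
    # ccache_path, failing if no valid credential is held for that user.
    return ["auks", "-g", "-u", str(uid), "-C", ccache_path]


def actions_for_job(job_id, has_credential):
    """Return the scontrol commands to run for this job (empty if all is fine)."""
    if has_credential:
        return []
    return [
        # Leave a visible hint for the user (shown by `scontrol show job`).
        ["scontrol", "update", f"JobId={job_id}",
         "Comment=no Kerberos ticket in AUKS"],
        # Suspend as described above; a site may prefer to hold the job instead.
        ["scontrol", "suspend", str(job_id)],
    ]


def main(env=os.environ, run=subprocess.run):
    job_id = env["SLURM_JOB_ID"]   # assumed to be set by slurmctld
    uid = env["SLURM_JOB_UID"]     # assumed to be set by slurmctld
    # Scratch file for the fetched ticket; removed when the block exits.
    with tempfile.NamedTemporaryFile(prefix="auks_check_") as tmp:
        result = run(credential_check_cmd(uid, tmp.name),
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    for cmd in actions_for_job(job_id, result.returncode == 0):
        run(cmd, check=False)
    # Always return 0 here: what a non-zero prolog exit does is site policy.
    return 0
```

A real prolog would finish with 'sys.exit(main())'. Keeping the decision in
actions_for_job() makes it possible to test the logic without a running
slurmctld or auksd.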

HTH
Matthieu


> All in all it works well.
>
> Regards,
>
> Robbert
>
> --
> Robbert Eggermont                                  Intelligent Systems
> r.eggerm...@tudelft.nl         Electr.Eng., Mathematics & Comp.Science
> +31 15 27 83234                         Delft University of Technology
>
