Hello,
Running Slurm 15.08.11 on FreeBSD 10.3-RELEASE, we're seeing issues with sacct
not reporting CPU information correctly (or I am misunderstanding what should be
reported). Where should we start digging to figure out why NCPUS and AllocCPUS
are always 0?
Regards,
Joseph
% sacct
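For reference, requesting the fields explicitly shows the same thing:

% sacct --format=JobID,JobName,NCPUS,AllocCPUS

Both NCPUS and AllocCPUS come back 0 for every job.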
Hi Thekla,
Maybe it is not a real bug in slurmd but a caching issue/race. Have you
enabled the CacheGroups option of Slurm? If yes, can you set
CacheGroups=0, restart the Slurm daemons, and tell us if the behavior
changes?
Also, I would like to see the groups that were set.
With strace you can see what the id command is doing in both cases:
1) When you call it without arguments, it internally calls getuid and
getgid and returns the groups that were set for the current process
(bash in this case). You can see the groups set in the current
shell with the groups command.
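For example (exact syscall names can vary between platforms):

% strace -e trace=getuid,getgid,getgroups id
% strace -e trace=connect id thekla

The second form resolves the group list through the name service
instead of the process credentials, so with an LDAP client like nslcd
you should see it connect to the daemon's socket.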
Hi Valanti,
Thanks a lot for your quick reply :)
When I get interactive access on a node through SLURM and type the id
command with and without arguments, the output is different.
Please see below:
[thekla@node05 ~]$ id
uid=2017(thekla) gid=5000(cstrc) groups=5000(cstrc)
[thekla@node05 ~]$
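(The second listing was cut off in the archive; the kind of mismatch
being described would look roughly like this, with a made-up secondary
group:)

[thekla@node05 ~]$ id thekla
uid=2017(thekla) gid=5000(cstrc) groups=5000(cstrc),5010(project1)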
OK Thekla, now I understand better what's going on.
It really seems to be a problem in Slurm. More specifically, slurmd on
the compute nodes, which runs as root, changes to the user's uid before
it starts the application, and during that step it should also set the
groups (including secondary groups).
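You can mimic that step by hand to separate the two possibilities, e.g.
with setpriv from util-linux (assuming it is installed on your nodes);
it calls initgroups(3) the way slurmd should:

(as root on a compute node)
# setpriv --reuid thekla --regid cstrc --init-groups id

If that prints the full group list while a job step does not, the
problem is in slurmd's group setup rather than in the LDAP client.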
Hi Mike,
On 25-05-16 13:22, Mike Johnson wrote:
I am in an environment that uses NFSv4, which obviously needs
user credentials to grant access to filesystems. Has anyone else
tackled the issue of unattended batch jobs successfully? I'm aware of
AUKS.
We are using Kerberised NFS4 with Slurm
They've been doing things like this at CERN for donkey's years - with the Andrew
File System in the past.
Look for Ticket Granting Tickets. Sorry - my memory is getting hazy.
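A minimal sketch of doing it without AUKS - grab a TGT from a per-user
keytab at the top of the job script (the keytab path and realm below
are placeholders, and the keytab has to live somewhere readable without
Kerberos):

#!/bin/bash
#SBATCH --job-name=krb5-nfs4
# get a TGT from the user's keytab (hypothetical path and principal)
kinit -kt /local/keytabs/$USER.keytab $USER@EXAMPLE.ORG
# ... work against the krb5-protected NFSv4 mount ...
kdestroy    # discard the tickets when the job is done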
-----Original Message-----
From: Mike Johnson [mailto:m.d.john...@durhamonline.org]
Sent: 25 May 2016 12:22
To:
Hi Valanti! :)
We are using nslcd on the compute nodes.
We have indeed changed the default behavior/command of salloc, but I
don't think that is the issue, because the same thing happens when we
submit jobs via sbatch. So I believe this is not related to the new
command we are using.
Hi all,
I know this is a long-standing question, but thought it was worth
asking. I am in an environment that uses NFSv4, which obviously needs
user credentials to grant access to filesystems. Has anyone else
tackled the issue of unattended batch jobs successfully? I'm aware of
AUKS. Is
Hi Thekla! :)
For me it looks like a configuration issue of the client LDAP name
service on the compute nodes. Which service are you using, nslcd or
sssd? I can see that you have changed the default behavior/command of
salloc and that the command gives you a prompt on the compute node directly.
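A quick client-side check is to resolve the membership through NSS
directly on a compute node (cstrc is the group from your output):

% getent group cstrc
% id thekla

If getent does not list the expected members there, it points at the
nslcd/sssd configuration rather than at Slurm.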