Véronique,
So that’s the culprit:
2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1
Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186 Uptime=74
CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
For a reason you have to determine, when slurmd starts
Here is what I get:
-sh-4.1$ scontrol show config|grep TmpFS
TmpFS = /local/scratch
Véronique
--
Véronique Legrand
IT engineer – scientific calculation & software development
https://research.pasteur.fr/en/member/veronique-legrand/
Cluster and computing group
IT department
I think Uwe was on the right track.
It looks to me like the problem node is somehow thinking
TmpFS=/tmp rather than /local/scratch.
That seems to be consistent with what is being reported
(TmpDisk=500).
I would check the slurm.conf/scontrol show config output
on the problem node and
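The comparison suggested above can be sketched quickly; the node names and captured values below are illustrative, not taken from the thread:

```shell
# Illustrative values; in practice you would capture them per node, e.g.
#   ssh <node> 'scontrol show config | grep TmpFS'
good="TmpFS = /local/scratch"   # output from a healthy node
bad="TmpFS = /tmp"              # hypothetical output from the problem node
if [ "$good" != "$bad" ]; then
    echo "TmpFS differs between nodes"
fi
```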
Pierre-Marie,
Here is what I have in slurmd.log on tars-XXX
-sh-4.1$ sudo cat slurmd.log
2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message aggregation
enabled: WindowMsgs=24, WindowTime=200
2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency setting
not
writes:
> Thanks!
>
> I'm probably missing something basic, but I don't see any difference by
> applying the changes you
> suggest - the signals still do not seem to take effect until after the
> grace time is over.
I may be remembering the details wrong. You could write a
Véronique,
This is not what I expected; I was thinking slurmd -C would return TmpDisk=204000,
or more probably 129186, as seen in the slurmctld log.
I suppose you already checked the slurmd logs on tars-XXX?
Regards,
Pierre-Marie Le Biot
From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
Hello Uwe,
This is already done. Please have a look at my first email. In slurm.conf I
have:
# COMPUTES
TmpFS=/local/scratch
Regards,
Véronique
--
Véronique Legrand
IT engineer – scientific calculation & software development
https://research.pasteur.fr/en/member/veronique-legrand/
Cluster
Hi,
see the man page for slurm.conf:
TmpFS
Fully qualified pathname of the file system available to user jobs for
temporary storage. This parameter is used in
establishing a node's TmpDisk space. The default value is "/tmp".
So it is using /tmp. You need to change that parameter to
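A minimal sketch of how TmpDisk follows from TmpFS; the df output below is a fabricated sample for illustration, not data from the thread:

```shell
# Sample `df -m <TmpFS>` output (fabricated for illustration).
df_out='Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sdb1         204000 12000    192000   6% /local/scratch'
# slurmd derives TmpDisk from the total size (in MB) of the TmpFS
# filesystem, which corresponds to the second column here.
tmpdisk=$(printf '%s\n' "$df_out" | awk 'NR==2 {print $2}')
echo "TmpDisk=$tmpdisk"
```

With TmpFS left at its default of /tmp, a small 500 MB /tmp filesystem would produce exactly the TmpDisk=500 reported elsewhere in the thread.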
Hello Pierre-Marie,
First, thank you for your hint.
I just tried.
>slurmd -C
NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6
ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
UpTime=0-20:50:54
The value for TmpDisk is erroneous. I do not know what could be causing this
Hi Véronique,
Did you check the result of slurmd -C on tars-XXX?
Regards,
Pierre-Marie Le Biot
From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
Sent: Tuesday, October 10, 2017 12:02 PM
To: slurm-dev
Subject: [slurm-dev] Node always going to DRAIN state with
Thanks!
I'm probably missing something basic, but I don't see any difference by
applying the changes you suggest - the signals still do not seem to take
effect until after the grace time is over.
Could it be something wrong with how my partitions are defined?
PartitionName=cheap
Hello,
I have a problem with 1 node in our cluster. It is exactly like all the other
nodes (200 GB of temporary storage).
Here is what I have in slurm.conf:
# COMPUTES
TmpFS=/local/scratch
# NODES
GresTypes=disk,gpu
ReturnToService=2
NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0
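As a sanity check on the Gres=disk:204000 definition above, one can compare it against the actual scratch size; scratch_mb is a sample value here (on a real node it would come from df):

```shell
gres_disk=204000     # from Gres=disk:204000 in slurm.conf
scratch_mb=204000    # sample; e.g. from `df -BM --output=size /local/scratch`
if [ "$scratch_mb" -ge "$gres_disk" ]; then
    echo "scratch size covers the Gres disk definition"
else
    echo "Gres disk exceeds the real scratch size"
fi
```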
Arghh,
On 10/10/2017 10:32 AM, Marcus Wagner wrote:
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test
or productive phase. Currently I am still trying to find out where to
create which files and directories, on the host or in a network
directory.
I'm
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test
or productive phase. Currently I am still trying to find out where to
create which files and directories, on the host or in a network directory.
I'm a little confused about the description in the manpage
Hello everybody,
On 10/10/17 8:25 AM, Marcus Wagner wrote:
For a quick view, manually starting the controller
slurmctld -D -vvv
Good advice; for beginners (or a tired help-seeker) a hint about "-f" might be
necessary. Without the current configuration, running the central management
daemon is
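For completeness, a sketch of the combined invocation; the config path is an assumption for a typical install, not confirmed by the thread:

```shell
# Run the controller in the foreground, verbose, with an explicit config.
# /etc/slurm/slurm.conf is a guess; substitute your site's path.
conf=/etc/slurm/slurm.conf
cmd="slurmctld -D -vvv -f $conf"
echo "$cmd"
```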
Hello,
Sorry for answering late…
Thank you to everybody!
I will try to increase NFS performance.
Best regards,
Brigitte Selch
From: John Hearns [mailto:hear...@googlemail.com]
Sent: Thursday, September 28, 2017 4:45 PM
To: slurm-dev
Subject: [slurm-dev] Re:
Dear Lachlan,
thank you very much for your suggestions. I will try to experiment a little
bit with those settings.
Have a nice day,
Emanuele
2017-10-10 0:02 GMT+02:00 Lachlan Musicman :
> On 9 October 2017 at 22:06, cyberseawolf . wrote:
>
>> Hello
For a quick view, manually starting the controller
slurmctld -D -vvv
might also help.
Best
Marcus
On 10/10/2017 01:41 AM, Christopher Samuel wrote:
On 10/10/17 07:21, Suman Sirimulla wrote:
We have installed and configured slurm on our cluster, but unable to
start the slurmctld daemon. We