[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Le Biot, Pierre-Marie
Véronique, So that's the culprit: 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186 Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null) For a reason you have to determine, when slurmd starts
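
Note that TmpDisk=129186 is almost exactly half of Memory=258373, which is what you would see if /tmp on that node were a tmpfs sized at the Linux default of 50% of RAM. A quick way to check which filesystem slurmd is actually measuring (mount points taken from this thread; adjust to your setup):

    # Sizes in MB, to compare directly against the TmpDisk figure
    df -m /tmp /local/scratch
    # If /tmp is of type tmpfs and ~129186 MB, slurmd is reading /tmp
    mount | grep -E ' /(tmp|local/scratch) '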

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Véronique LEGRAND
Here is what I get: -sh-4.1$ scontrol show config|grep TmpFS TmpFS = /local/scratch Véronique -- Véronique Legrand IT engineer – scientific calculation & software development https://research.pasteur.fr/en/member/veronique-legrand/ Cluster and computing group IT department
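
scontrol show config reports the controller's view; the slurmd on the problem node may have loaded a different (stale or local) copy of the configuration. A cross-check, assuming the conventional path /etc/slurm/slurm.conf:

    # Controller's view vs. the file the node's slurmd actually reads
    scontrol show config | grep -i TmpFS
    ssh tars-XXX 'grep -i TmpFS /etc/slurm/slurm.conf'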

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Thomas M. Payerle
I think Uwe was on the right track. It looks to me like the problem node somehow thinks TmpFS=/tmp rather than /local/scratch. That seems consistent with what is being reported (TmpDisk = 500). I would check the slurm.conf/scontrol show config output on the problem node and
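
One way to do that comparison, assuming slurm.conf lives at /etc/slurm/slurm.conf (tars-GOOD is a placeholder for any healthy node):

    # A differing checksum would point at an old or unsynced config copy
    for h in tars-GOOD tars-XXX; do ssh "$h" 'md5sum /etc/slurm/slurm.conf'; done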

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Véronique LEGRAND
Pierre-Marie, Here is what I have in slurmd.log on tars-XXX -sh-4.1$ sudo cat slurmd.log 2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message aggregation enabled: WindowMsgs=24, WindowTime=200 2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency setting not

[slurm-dev] Re: Preemption and signals

2017-10-10 Thread Bjørn-Helge Mevik
writes: > Thanks! > > I'm probably missing something basic, but I don't see any difference by > applying the changes you > suggest - the signals still do not seem to take effect until after the > grace time is over. I could be remembering the details wrong. You could write a
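
A minimal probe job for this, assuming a partition named cheap as in the thread: it timestamps each signal as it is delivered, so delivery time can be compared against GraceTime.

    #!/bin/bash
    #SBATCH --partition=cheap
    # Log the moment each preemption-related signal arrives
    trap 'echo "SIGTERM at $(date +%T)"' TERM
    trap 'echo "SIGCONT at $(date +%T)"' CONT
    echo "started at $(date +%T)"
    while true; do sleep 1; done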

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Le Biot, Pierre-Marie
Véronique, This is not what I expected; I was thinking slurmd -C would return TmpDisk=204000, or more probably 129186 as seen in the slurmctld log. I suppose you already checked the slurmd logs on tars-XXX? Regards, Pierre-Marie Le Biot From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Véronique LEGRAND
Hello Uwe, This is already done. Please have a look at my first email. In slurm.conf I have: # COMPUTES TmpFS=/local/scratch Regards, Véronique -- Véronique Legrand IT engineer – scientific calculation & software development https://research.pasteur.fr/en/member/veronique-legrand/ Cluster

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Uwe Sauter
Hi, see the man page for slurm.conf: TmpFS Fully qualified pathname of the file system available to user jobs for temporary storage. This parameter is used in establishing a node's TmpDisk space. The default value is "/tmp". So it is using /tmp. You need to change that parameter to
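
The relevant line in slurm.conf, as used elsewhere in this thread; slurmd computes TmpDisk from this path when it starts and registers, so restarting slurmd on the node is the safe way to pick up a change:

    # slurm.conf: base TmpDisk on the real scratch filesystem, not /tmp
    TmpFS=/local/scratch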

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Véronique LEGRAND
Hello Pierre-Marie, First, thank you for your hint. I just tried. >slurmd -C NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=258373 TmpDisk=500 UpTime=0-20:50:54 The value for TmpDisk is erroneous. I do not know what can be the cause of this
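
A quick way to see which filesystem the 500 MB figure corresponds to (if it matches /tmp, this slurmd is falling back to the built-in TmpFS default of /tmp, for instance because it is reading a different slurm.conf than expected):

    df -m /tmp /local/scratch   # sizes in MB, for comparison with TmpDisk=500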

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Le Biot, Pierre-Marie
Hi Véronique, Did you check the result of slurmd -C on tars-XXX ? Regards, Pierre-Marie Le Biot From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr] Sent: Tuesday, October 10, 2017 12:02 PM To: slurm-dev Subject: [slurm-dev] Node always going to DRAIN state with

[slurm-dev] Re: Preemption and signals

2017-10-10 Thread tegner
Thanks! I'm probably missing something basic, but I don't see any difference by applying the changes you suggest - the signals still do not seem to take effect until after the grace time is over. Could it be something wrong with how my partitions are defined? PartitionName=cheap
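
For reference, a minimal sketch of a two-tier preemption setup (names and values are illustrative, not taken from the thread):

    # slurm.conf: jobs in "expensive" preempt jobs in "cheap";
    # preempted jobs get 120 s between the first signal and removal
    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE
    GraceTime=120
    PartitionName=cheap     PriorityTier=1 Nodes=ALL Default=YES
    PartitionName=expensive PriorityTier=2 Nodes=ALL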

[slurm-dev] Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Véronique LEGRAND
Hello, I have a problem with 1 node in our cluster. It is configured exactly like all the other nodes (200 GB of temporary storage). Here is what I have in slurm.conf: # COMPUTES TmpFS=/local/scratch # NODES GresTypes=disk,gpu ReturnToService=2 NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0
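
For context: slurmctld drains a node with reason=Low TmpDisk when the TmpDisk value slurmd reports at registration is smaller than the TmpDisk configured for that node in slurm.conf. A hypothetical node line illustrating the expectation (values from the thread; the actual line is not shown in the message):

    # slurmctld expects 204000 MB; if slurmd registers less, the node drains
    NodeName=tars-XXX CPUs=12 RealMemory=258373 TmpDisk=204000 State=UNKNOWN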

[slurm-dev] Re: file and directory permissions

2017-10-10 Thread Marcus Wagner
Arghh, On 10/10/2017 10:32 AM, Marcus Wagner wrote: Hello, everyone. I'm also fairly new to slurm, still in a conceptual rather than a test or productive phase. Currently I am still trying to find out where to create which files and directories, on the host or in a network directory. I'm

[slurm-dev] file and directory permissions

2017-10-10 Thread Marcus Wagner
Hello, everyone. I'm also fairly new to slurm, still in a conceptual rather than a test or productive phase. Currently I am still trying to find out where to create which files and directories, on the host or in a network directory. I'm a little confused about the description in the manpage
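
A sketch of a common ownership layout (paths are examples; match them to your slurm.conf settings): StateSaveLocation is written by slurmctld running as SlurmUser, while SlurmdSpoolDir is written by slurmd, which runs as root.

    # Controller only: state directory owned by SlurmUser
    install -d -o slurm -g slurm -m 0755 /var/spool/slurmctld
    # Every compute node: spool directory for slurmd (runs as root)
    install -d -o root -g root -m 0755 /var/spool/slurmd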

[slurm-dev] Re: Camacho Barranco, Roberto <rcamachobarra...@utep.edu> ssirimu...@utep.edu

2017-10-10 Thread Benjamin Redling
Hello everybody, On 10/10/17 8:25 AM, Marcus Wagner wrote: For a quick view, manually starting the controller slurmctld -D -vvv good advice; for beginners (or a tired help-seeker) a hint to "-f" might be necessary. Without the current configuration, running the central management daemon is
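
For example (the path is the conventional default; adjust if your build uses a different sysconfdir):

    # Foreground, verbose, with an explicit config file
    slurmctld -D -vvv -f /etc/slurm/slurm.conf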

[slurm-dev] Re: MPI-Jobs on cluster - how to set batchhost

2017-10-10 Thread Selch, Brigitte (FIDF)
Hello, sorry for answering late… Thank you to everybody! I will try to increase NFS performance. Best regards, Brigitte Selch From: John Hearns [mailto:hear...@googlemail.com] Sent: Thursday, September 28, 2017 16:45 To: slurm-dev Subject: [slurm-dev] Re:

[slurm-dev] Re: peculiar resources configuration in SLURM

2017-10-10 Thread cyberseawolf .
Dear Lachlan, thank you very much for your suggestions. I will try to experiment a little bit with those settings. Have a nice day, Emanuele 2017-10-10 0:02 GMT+02:00 Lachlan Musicman: > On 9 October 2017 at 22:06, cyberseawolf . wrote: > >> Hello

[slurm-dev] Re: Camacho Barranco, Roberto <rcamachobarra...@utep.edu> ssirimu...@utep.edu

2017-10-10 Thread Marcus Wagner
For a quick view, manually starting the controller slurmctld -D -vvv might also help. Best Marcus On 10/10/2017 01:41 AM, Christopher Samuel wrote: On 10/10/17 07:21, Suman Sirimulla wrote: We have installed and configured slurm on our cluster, but are unable to start the slurmctld daemon. We
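
If the daemon is managed by systemd, the failure reason is often visible before resorting to a foreground run (a generic check, assuming a systemd-based install):

    systemctl status slurmctld
    journalctl -u slurmctld --since "1 hour ago"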