Re: [slurm-users] slurm.conf syntax checker?

2021-10-14 Thread Marcus Wagner
Mostly, our problem was that we forgot to add a node to, or remove it from, the partitions/topology file, which caused slurmctld to refuse to start. So I wrote a simple checker for that. Here is the output of a sample run: reading '../conf/rcc/slurm.conf' ... reading '../conf/rcc/nodes.conf' ... reading
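For readers who want to try something similar, here is a minimal sketch of this kind of consistency check. It is not the checker from the message above, whose source is not shown here. The function names `expand_hostlist` and `find_undefined_nodes` are made up for illustration, and the sketch assumes that keys are spelled exactly `NodeName`, `PartitionName` and `Nodes`, that each definition sits on one line, and that host expressions use only a single simple bracket range per name.

```python
import re


def expand_hostlist(expr):
    """Expand a simple host expression such as 'n[001-003,007],login1'."""
    hosts = []
    # Split on commas that are not inside a [...] range.
    for part in re.split(r",(?![^\[]*\])", expr):
        if not part:
            continue
        m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", part)
        if m is None:
            hosts.append(part)  # plain host name, nothing to expand
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo  # a single number is a range of length one
            for i in range(int(lo), int(hi) + 1):
                # Keep the zero padding of the lower bound.
                hosts.append(f"{prefix}{i:0{len(lo)}d}{suffix}")
    return hosts


def find_undefined_nodes(conf_text):
    """Return {partition: [nodes used in Nodes= but never defined by NodeName=]}."""
    defined = set()
    partitions = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "NodeName" in fields and fields["NodeName"] != "DEFAULT":
            defined.update(expand_hostlist(fields["NodeName"]))
        elif "PartitionName" in fields and fields.get("Nodes", "ALL") != "ALL":
            partitions[fields["PartitionName"]] = set(expand_hostlist(fields["Nodes"]))
    return {
        name: sorted(nodes - defined)
        for name, nodes in partitions.items()
        if nodes - defined
    }


# Hypothetical configuration fragment, made up for this example.
sample_conf = """\
NodeName=n[001-003] CPUs=4
PartitionName=batch Nodes=n[001-004] Default=YES
"""
print(find_undefined_nodes(sample_conf))  # {'batch': ['n004']}
```

The same comparison could be extended to the topology file, but its layout is not covered by this sketch.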

Re: [slurm-users] How to look for free nodes of a certain constraint efficiently

2021-10-14 Thread Ole Holm Nielsen
Hi Matt, how about this sinfo command: $ sinfo -O NodeList:30,Features:30,StateLong NODELIST AVAIL_FEATURES STATE i023 xeon2650v2,infiniband,xeon16 draining@ i[004-022,024-050] xeon2650v2,infiniband,xeon16 allocated

Re: [slurm-users] How to look for free nodes of a certain constraint efficiently

2021-10-14 Thread Carsten Beyer
Hi Matt, you could have a look at the sinfo/squeue commands with the --format / -o output options, e.g.: [root@ma1 slurm]# sinfo -t idle -o "%P %.5a %.10l %.6D %.6t %N %b" PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST ACTIVE_FEATURES compute    up    8:00:00 44   idle
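As a rough illustration of how such sinfo output could be summarised per feature, here is a small sketch. It assumes node-oriented output with one node and its comma-separated feature list per line, as produced by something like `sinfo -h -N -t idle -o "%N %f"`. The function name `idle_nodes_by_feature` and the sample text are made up for this example and do not come from the thread.

```python
from collections import defaultdict


def idle_nodes_by_feature(sinfo_text):
    """Count distinct nodes per feature from lines of the form '<node> <feat1,feat2,...>'."""
    nodes_by_feature = defaultdict(set)
    for line in sinfo_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, features = parts
        for feature in features.split(","):
            # A set removes duplicates when a node is listed once per partition.
            nodes_by_feature[feature].add(node)
    return {feature: len(nodes) for feature, nodes in nodes_by_feature.items()}


# Hypothetical sample input; node and feature names are illustrative only.
sample = """\
i004 xeon2650v2,infiniband,xeon16
i005 xeon2650v2,infiniband,xeon16
i004 xeon2650v2,infiniband,xeon16
k001 knl,infiniband
"""
for feature, count in sorted(idle_nodes_by_feature(sample).items()):
    print(feature, count)
```

Working on captured text keeps the parsing separate from the sinfo call itself, so the same function can be fed from a file or from a subprocess.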

[slurm-users] How to look for free nodes of a certain constraint efficiently

2021-10-14 Thread Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
All, I work on a cluster that uses SLURM and has various types of nodes that are controlled via --constraint flags in sbatch. Now, I started thinking "How can I figure out how many jobs are running/pending/etc on a certain type of node?". I first thought obviously "squeue

[slurm-users] missing hyperthreads on Xeon Phi in SNC4/Flat mode

2021-10-14 Thread Brice Goglin
Hello, We have four Xeon Phi (KNL) nodes with 64 cores SMT-4 each (256 hyperthreads total). They are configured in different KNL modes (SNC4/flat, SNC4/cache, All2all/flat and All2all/cache). The node that is in SNC4/flat won't let us allocate all 256 hyperthreads. Half the cores only get 2