Re: [slurm-users] cpu limit issue

2018-07-11 Thread Mahmood Naderan
> Check the Gaussian log file for mention of its using just 8 CPUs -- just because there are 12 CPUs available doesn't mean the program uses all of them. It will scale back if 12 isn't a good match to the problem, as I recall.

Well, in the log file, it says *
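
A quick way to check what the log actually reports is to grep it for the processor count. The log file name below (trimmer.log) is only assumed from the trimmer.gjf input mentioned later in the thread, and the exact wording of Gaussian's message can differ between versions:

    # look for Gaussian's report of how many shared-memory processors it decided to use
    grep -i "processors" trimmer.log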

Re: [slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-11 Thread Taras Shapovalov
Thank you, guys. Let's wait for 17.11.8. Any estimate of the release date? Best regards, Taras On Wed, Jul 11, 2018 at 12:11 AM Kilian Cavalotti <kilian.cavalotti.w...@gmail.com> wrote: > On Tue, Jul 10, 2018 at 10:34 AM, Taras Shapovalov > wrote: > > I noticed the commit that can be relat

Re: [slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-11 Thread Douglas Jacobsen
Applying patches d52d8f4f0 and f07f53fc13 to a slurm 17.11.7 source tree fixes this issue in my experience. Only requires restarting slurmctld. Doug Jacobsen, Ph.D. NERSC Computer Systems Engineer Acting Group Lead, Computational Systems Group National Energy Research Scientific Computing C
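
For reference, a minimal sketch of one way to apply those two commits, assuming a git checkout of the Slurm sources; the tag name and build steps below are assumptions, not taken from the mail:

    # assumed tag name for the 17.11.7 release; adjust to your local checkout
    git checkout -b defmempercpu-fix slurm-17-11-7-1
    git cherry-pick d52d8f4f0 f07f53fc13
    # rebuild and reinstall however you normally build Slurm, then restart only slurmctld
    systemctl restart slurmctld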

Re: [slurm-users] cpu limit issue

2018-07-11 Thread John Hearns
Mahmood, I am sure you have checked this. Try running ps -eaf --forest while a job is running. I often find the --forest option helps to understand how batch jobs are being run. On 11 July 2018 at 09:12, Mahmood Naderan wrote: > >Check the Gaussian log file for mention of its using just
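
For anyone following along, a hedged illustration of the suggestion; the grep filter is only an illustrative addition, and the slurm_script name matches what appears in the output later in this thread:

    # on the compute node while the job runs; --forest draws the parent/child process tree
    ps -eaf --forest
    # optionally narrow the output to the batch script and its children
    ps -eaf --forest | grep -A 5 slurm_script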

Re: [slurm-users] cpu limit issue

2018-07-11 Thread John Hearns
Another thought - are we getting mixed up between hyperthreaded and physical cores here? I don't see how 12 hyperthreaded cores translate to 8, though - it would be 6! On 11 July 2018 at 10:30, John Hearns wrote: > Mahmood, > I am sure you have checked this. Try running ps -eaf --fores
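
One way to settle the hyperthreading question on a given node is to compare what the hardware reports with what Slurm detects. A small sketch, assuming standard util-linux and Slurm tools are installed on the node:

    # physical sockets/cores vs. threads per core as seen by the OS
    lscpu | grep -E '^(Socket|Core|Thread|CPU\(s\))'
    # what slurmd itself would report for this node's configuration
    slurmd -C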

Re: [slurm-users] cpu limit issue

2018-07-11 Thread Mahmood Naderan
> Try running ps -eaf --forest while a job is running.

noor 30907 30903  0 Jul10 ?  00:00:00  \_ /bin/bash /var/spool/slurmd/job00749/slurm_script
noor 30908 30907  0 Jul10 ?  00:00:00      \_ g09 trimmer.gjf
noor 30909 30908 99 Jul10 ?  4-13:00:21        \_ /usr/local/chem

Re: [slurm-users] cpu limit issue

2018-07-11 Thread John Hearns
Mahmood, please please forgive me for saying this. A quick Google shows that Opteron 61xx CPUs have either eight or twelve cores. Have you checked that all the servers have 12 cores? I realise I am appearing stupid here. On 11 July 2018 at 10:39, Mahmood Naderan wrote: > >Try running ps -eaf --fores

Re: [slurm-users] cpu limit issue

2018-07-11 Thread Mahmood Naderan
My fault. One of the other nodes was on my mind! The node which is running g09 is:

[root@compute-0-3 ~]# ps aux | grep l502
root     11198  0.0  0.0  112664    968 pts/0  S+  13:31    0:00 grep --color=auto l502
nooriza+ 30909  803  1.4 21095004 947968 ?     Rl  Jul10 6752:47 /usr/local/chem/g
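
Since that listing shows the l502 worker at roughly 800% CPU, one quick way to confirm the actual thread count is to ask ps for the number of light-weight processes; the PID is taken from the output above:

    # nlwp = number of threads; pcpu = cumulative CPU usage of the process
    ps -o pid,nlwp,pcpu,comm -p 30909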

Re: [slurm-users] cpu limit issue

2018-07-11 Thread John Hearns
I bet all on here would just LOVE the AMD Fangio ;-) http://www.cpu-world.com/news_2012/2012111801_Obscure_CPUs_AMD_Opteron_6275.html Hint - quite a few of these were sold! On 11 July 2018 at 11:04, Mahmood Naderan wrote: > My fault. One of the other nodes was in my mind! > > The node which is

[slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Alexander Grund
Hi all, is it expected/intended that the env variable SLURM_NTASKS is not defined after salloc? It only gets defined after an srun command. The number of tasks appears in `scontrol -d show job ` though. So is it a bug in our installation, or expected? Thanks, Alex
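
A minimal sketch of the observation, assuming you are inside a salloc allocation that was created without an explicit task count; the grep pattern matches the NumTasks field that scontrol prints:

    # inside the allocation shell started by salloc
    echo "SLURM_NTASKS=${SLURM_NTASKS:-<not set>}"
    # scontrol still reports a task count for the same job
    scontrol -d show job "$SLURM_JOB_ID" | grep -o 'NumTasks=[0-9]*'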

Re: [slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Peter Kjellström
On Wed, 11 Jul 2018 14:10:51 +0200 Alexander Grund wrote: > Hi all, > > is it expected/intended that the env variable SLURM_NTASKS is not > defined after salloc? It only gets defined after an srun command. > The number of tasks appears in `scontrol -d show job ` though. > So is it a bug in o

Re: [slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Alexander Grund
Hi Peter, thanks for the information; you are right: SLURM_NTASKS is not set if "-n" is not passed to salloc. I am somewhat relying on what happens when I call "srun ./binary", in particular how many instances will be started. scontrol shows this information, so I could parse it. But is there an
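
If parsing scontrol turns out to be the only option, a small sketch of one way to do it; the field name is the one quoted later in the thread, so treat the output format as an assumption:

    # pull NumTasks out of the scontrol record for the current job
    ntasks=$(scontrol -d show job "$SLURM_JOB_ID" | grep -o 'NumTasks=[0-9]*' | cut -d= -f2)
    echo "srun ./binary would start ${ntasks} instances"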

Re: [slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Jeffrey Frey
SLURM_NTASKS is only unset when no task count flags are handed to salloc (no --ntasks, --ntasks-per-node, etc.). Can't you then assume that if it's not present in the environment, you've got a single task allocated to you? So in your generic starter script, instead of using SLURM_NTASKS itself, use a
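
A hedged sketch of that suggestion for the starter script, assuming a fallback to a single task is acceptable when the variable is absent:

    # use SLURM_NTASKS when salloc exported it, otherwise assume one task
    ntasks=${SLURM_NTASKS:-1}
    srun -n "$ntasks" ./binary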

Re: [slurm-users] cpu limit issue

2018-07-11 Thread Renfro, Michael
Looking at your script, there’s a chance that by only specifying ntasks instead of ntasks-per-node or a similar parameter, you might have allocated 8 CPUs on one node, and the remaining 4 on another. Regardless, I’ve dug into my Gaussian documentation, and here’s my test case for you to see wha
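
The Gaussian test case itself is cut off above, but a minimal sketch of the allocation point being made -- keeping all tasks on one node so the CPU count matches what Gaussian can actually use -- might look like this. The values and the trimmer.gjf input are taken from earlier in the thread, and %NProcShared is Gaussian's shared-memory processor directive:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=12
    # %NProcShared=12 in trimmer.gjf should match the per-node CPU count requested here
    g09 trimmer.gjf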

Re: [slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Alexander Grund
Unfortunately this will not work. Example: salloc --nodes=3 --exclusive. I'm wondering why there is a discrepancy between the environment variables and scontrol. The latter clearly shows "NumNodes=3 NumCPUs=72 NumTasks=3 CPUs/Task=1" (yes, I realize that those values are inconsistent too, but a
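
For the example above, a small sketch of how the two views can be compared side by side; the variable and field names are standard Slurm ones, shown here as an illustration of the discrepancy rather than a fix:

    salloc --nodes=3 --exclusive
    # inside the resulting shell: what the environment exports ...
    env | grep '^SLURM_NTASKS=' || echo "SLURM_NTASKS not set"
    # ... versus what scontrol reports for the same job
    scontrol -d show job "$SLURM_JOB_ID" | grep -Eo '(NumNodes|NumCPUs|NumTasks|CPUs/Task)=[^ ]+'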