Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-04-04 Thread Reuti
> The user added
>
> #SBATCH --export=none
>
> to his submission script to prevent any environment variables in the GUI's environment from being applied to his job. After making that change, his job worked as expected, so this confirmed it was an environment issue. We compared the dif…
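For reference, a minimal submission-script sketch of that fix (the job name, task count, and binary are placeholders, not from the thread):

    #!/bin/bash
    #SBATCH --job-name=mpi_test   # placeholder name
    #SBATCH --ntasks=4            # placeholder task count
    #SBATCH --export=none         # do not inherit the submitting (GUI) environment

    # With --export=none, the job starts from a clean environment, so stray
    # variables from the GUI session never reach mpiexec/mpirun.
    mpiexec ./my_app              # placeholder binary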

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Lech Nieroda
That’s correct, but let’s keep in mind that it only concerns the upgrade process and not production runtime, which has certain implications. The affected database structures were introduced in 17.11, and an upgrade is affected only when coming from version 17.02 or prior; it wouldn’t be a problem for users who ha…

[slurm-users] slurmdbd purge not working

2019-04-04 Thread Julien Rey
Hello,

Our slurm accounting database is growing bigger and bigger (more than 100 GB) and is never purged. We are running slurm 15.08.0-0pre1. I would like to upgrade to a more recent version of slurmdbd, but my fear is that it may break everything during the update of the database.
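For context, purging is driven by the Purge* options in slurmdbd.conf; a sketch with example retention periods follows (the values are illustrative, and which options exist in 15.08 should be checked against its documentation):

    # slurmdbd.conf -- example retention settings (values are illustrative)
    PurgeEventAfter=12month
    PurgeJobAfter=12month
    PurgeStepAfter=2month
    PurgeSuspendAfter=1month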

Re: [slurm-users] slurmdbd purge not working

2019-04-04 Thread Paul Edmon
We ran into this problem in the past. I know that fixes were put in to deal with large purges as a result of our problems, but I don't recall which version they ended up in; likely newer than 15.08.0. A solution that can work is to walk up the time, so that instead of one large purge you do seve…
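A sketch of that stepped approach, using PurgeJobAfter as an example (the sequence and values are illustrative):

    # slurmdbd.conf -- walk the retention window down in stages instead of
    # doing one huge purge; after each change, restart slurmdbd and let the
    # purge finish before tightening further. Example sequence:
    #   PurgeJobAfter=48month
    #   PurgeJobAfter=36month
    #   PurgeJobAfter=24month
    PurgeJobAfter=12month   # final target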

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Chris Samuel
On 4/4/19 4:07 am, Lech Nieroda wrote:
> Furthermore, upgrades shouldn’t skip more than one release, as that would lead to loss of state files and other important information, so users probably won’t upgrade from 17.02 to 19.05 directly.

If they did do that, then yes, the patch would be applicable…

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Lech Nieroda
> Upgrading more than 2 releases isn't supported, so I don't believe the 19.05 slurmdbd will have the code in it to upgrade tables from earlier than 17.11.

I haven’t found any mention of this in the upgrade section of the QuickStart guide (see https://slurm.schedmd.com/quickstart_admin.html#up…

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Prentice Bisbal
Lech,

Thanks for the explanation. Now that you've explained it like that, I understand SchedMD's decision. I was misreading the situation. I was under the impression that this affected *all* db upgrades, not just those from one old version to a slightly less old version.

Prentice

On 4/4/19 7:07…

[slurm-users] Slurm 1 CPU

2019-04-04 Thread Chris Bateson
I should start out by saying that I am extremely new to anything HPC. Our end users purchased a 20-node cluster which a vendor set up for us with Bright/Slurm. After our vendor said everything was complete and we started migrating our users' workflows to the new cluster, they discovered that they ca…

Re: [slurm-users] Slurm 1 CPU

2019-04-04 Thread Colas Rivière
Hello,

Did you try adding something like this to slurm.conf?

    NodeName=cnode001 CPUs=48

Cheers,
Colas

On 2019-04-04 17:18, Chris Bateson wrote:
> I should start out by saying that I am extremely new to anything HPC. Our end users purchased a 20-node cluster which a vendor set up for us with…

Re: [slurm-users] Slurm 1 CPU

2019-04-04 Thread Andy Riebs
In slurm.conf, on the line(s) starting "NodeName=", you'll want to add specs for sockets, cores, and threads per core.

*From:* Chris Bateson
*Sent:* Thursday, April 04, 2019 5:18 PM
*To:* Slurm-users
*Cc:*
*Subject:* [slurm-us…
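Along the lines of Andy's suggestion, a slurm.conf sketch (the node names and counts here are hypothetical; running "slurmd -C" on a compute node prints a NodeName line matching its actual hardware):

    # slurm.conf -- hypothetical 2-socket, 12-core, 2-thread nodes
    NodeName=cnode[001-020] Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN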

Re: [slurm-users] Slurm 1 CPU

2019-04-04 Thread Alex Chekholko
Hi Chris,

Re: "can't run more than 1 job per node at a time": try "scontrol show config" and grep for DefMem. IIRC, by default the memory request for any job is all the memory in a node.

Regards,
Alex

On Thu, Apr 4, 2019 at 4:01 PM Andy Riebs wrote:
> in slurm.conf, on the line(s) starting "…
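To check that, something along these lines (the DefMemPerCPU value below is only an example):

    # Show memory-related defaults; output varies by site.
    scontrol show config | grep -i defmem

    # If jobs default to a whole node's memory, a per-CPU default in
    # slurm.conf lets multiple jobs share a node, e.g.:
    #   DefMemPerCPU=4000   # in MB; example value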

Re: [slurm-users] Slurm 1 CPU

2019-04-04 Thread Sharma, M D
Also, just to add: if you are using Bright, the easiest way to fix your core count issue is to use the cmsh command, go to the node definition; roles; slurmclient, and then edit the appropriate fields (details are in the manual). This will populate slurm.conf with the corrected node definit…
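A rough cmsh session along those lines (an assumption based on Bright's usual role layout; the exact field names and prompts vary by Bright version, so check the manual):

    % cmsh
    [head]% device use cnode001                 # or a category, to change many nodes at once
    [head->device[cnode001]]% roles
    [head->device[cnode001]->roles]% use slurmclient
    [...[slurmclient]]% show                    # inspect the current fields
    [...[slurmclient]]% set corespersocket 12   # hypothetical field and value
    [...[slurmclient]]% commit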