Look for libmariadb-client. That's what's needed for slurmdbd on Debian.
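Something like this should show what you have installed versus what's
available (exact package names vary by release, so treat this as a
sketch):

$ dpkg -l | grep libmariadb
$ apt-cache search libmariadb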
On Wed, Dec 11, 2019 at 11:43 AM Dean Schulze wrote:
>
> Turns out I've already got libmariadb-dev installed:
>
> $ dpkg -l | grep maria
> ii libmariadb-dev 3.0.3-1build1
>
Inline below
On Tue, Nov 26, 2019 at 5:50 AM Loris Bennett
wrote:
>
> Hi Nigella,
>
> Nigella Sanders writes:
>
> > Thank you all for such interesting replies.
> >
> > The --dependency option is quite useful, but in practice it has some
> > inconveniences. Firstly, all 20 jobs are instantly queued
Gres has to be specified in both slurm.conf and gres.conf, and
gres.conf must be present on the node with the gres. I keep a single
cluster-wide gres.conf and copy it to all nodes, just like slurm.conf.
Also, after adding a new gres, I think both the slurmctld and the
slurmd need to be restarted.
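A minimal sketch of what that pairing looks like (node name, GPU count,
and device paths here are just examples):

In slurm.conf:
    GresTypes=gpu
    NodeName=node01 Gres=gpu:2 ...

In gres.conf, present on (or copied to) node01:
    NodeName=node01 Name=gpu File=/dev/nvidia[0-1]

then restart slurmctld and the slurmd on the affected node.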
Just FYI, I tried the shared state on NFS once, and it didn't work
well. Switched to native client glusterfs shared between the 2
controller nodes and haven't had a problem with it since.
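For reference, the setting in question is StateSaveLocation in
slurm.conf; a sketch assuming the gluster volume is mounted at /gluster
on both controllers:

    StateSaveLocation=/gluster/slurm/state

Both slurmctld hosts need read/write access to that path.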
On Tue, Jun 25, 2019 at 6:32 AM Buckley, Ronan wrote:
>
> Is there a way to diagnose if the I/O to the
> /cm
My first guess would be that the host is not listed as one of the two
controllers in the slurm.conf. Also, keep in mind that munge, and thus
Slurm, is very sensitive to clock skew between nodes. FYI, I run a
hand-built Slurm 18.08.07 on Debian 8 & 9 without issues. Haven't tried
10 yet.
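For the two-controller setup, I mean something along these lines in
slurm.conf (hostnames are examples):

    SlurmctldHost=ctl1
    SlurmctldHost=ctl2

and it's worth checking clock sync on each node with something like
timedatectl or ntpq -p.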
On Tue, Mar 19, 2019 at 8:34 AM Peter Steinbach wrote:
>
> Hi,
>
> we are struggling with a slurm 18.08.5 installation of ours. We are in a
> situation where our GPU nodes have a considerable number of cores but
> "only" 2 GPUs inside. While people run jobs using the GPUs, non-GPU jobs
> can ente
On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui wrote:
>
> Hi,
>
> I am new to slurm and want to use weight option to schedule the jobs.
> I have some machines with the same hardware configuration with GPU
> cards. I use QoS to force users to request at least 1 GPU gres when
> submitting jobs.
> The ma
sacct. Though, of course, accounting has to be turned on and working.
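Something like this should show where a job ran and how much CPU it
used (the format fields can be adjusted to taste):

    sacct -j <jobid> --format=JobID,NodeList,AllocCPUS,Elapsed,TotalCPU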
On Fri, Feb 15, 2019 at 5:08 AM hu...@sugon.com wrote:
> Dear there,
> How to view the cpu usage of history jobs at each compute node?
> However, this command (scontrol -d show job <jobid>) can only get the
> cpu usage of t
Reminds me of a follow-up question I've been meaning to ask: is it just
the slurmctlds that need access to the shared slurmdbd, or do all the
slurmds on all the nodes need access?
On Tue, Feb 12, 2019 at 7:16 AM Antony Cleave wrote:
>
> You will need to be able to connect both clusters to the sa
Looking through the slurm.conf docs and grepping around the source
code, it looks like MinJobAge might be what I need to adjust. I changed
it by 3 orders of magnitude, 300 -> 300_000, on our dev cluster. I'll
see how things go.
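i.e. in slurm.conf:

    MinJobAge=300000

followed by an scontrol reconfigure (or a slurmctld restart) to pick it
up.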
On Wed, Dec 19, 2018 at 1:14 PM Eli V wrote:
>
> Does sl
Does Slurm remove job completion info from its memory after a while?
That might explain why I'm seeing jobs getting cancelled when their
dependent predecessor step finished OK. Below is the egrep
'352209(1|2)_11' from slurmctld.log. The 3522092 job array was created
with -d aftercorr:3522091. Looks li
SelectTypeParameters=CR_LLN will do this automatically for all jobs
submitted to the cluster. Not sure if that's an acceptable solution for you.
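For example, combined with memory tracking, that would look something
like this in slurm.conf:

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN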
On Wed, Dec 12, 2018 at 11:54 AM Roger Moye wrote:
>
>
> I have a user who wants to control how job arrays are allocated to
> nodes. He wants to mimi
On Fri, Dec 7, 2018 at 7:53 AM Maik Schmidt wrote:
>
> I have found --hint=multithread, but this only works with task/affinity.
> We use task/cgroup. Are there any downsides to activating both task
> plugins at the same time?
>
> Best, Maik
>
> Am 07.12.18 um 13:33 schrieb Maik Schmidt:
> > Hi all
On Wed, Dec 5, 2018 at 5:04 PM Bjørn-Helge Mevik wrote:
>
> I don't think Slurm has any facility for soft memory limits.
>
> But you could emulate it by simply configuring the nodes in slurm.conf
> with, e.g., 15% higher RealMemory value than what is actually available
> on the node. Then a node wi
On Thu, Dec 6, 2018 at 2:08 AM Loris Bennett wrote:
>
> Eli V writes:
>
> > We run our cluster using select parms CR_Core_Memory and always
> > require a user to set the memory used when submitting a job to avoid
> > swapping our nodes to uselessness. However, since sl
We run our cluster using select parms CR_Core_Memory and always
require a user to set the memory used when submitting a job, to avoid
swapping our nodes to uselessness. However, since slurmd is pretty
vigilant about killing jobs that exceed their request, we end up with
jobs requesting more memory th
In addition to these other suggestions, keep in mind the slurmds will
talk to each other if you have more than 50 nodes (see TreeWidth in
slurm.conf), so this will require the nodes to be able to DNS-resolve
and communicate with all the other nodes as well as the slurmctlds. I
tried adding in some nod
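For what it's worth, the fanout is set in slurm.conf, and setting it at
least as large as the node count should make the slurmds talk only to
the slurmctlds, e.g. for a cluster of up to 100 nodes:

    TreeWidth=100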
On Thu, Oct 18, 2018 at 1:03 PM Daniel Letai wrote:
>
>
> Hello all,
>
>
> To solve a requirement where a large number of job arrays (~10k arrays, each
> with at most 8M elements) with same priority should be executed with minimal
> starvation of any array - we don't want to wait for each array
Don't think you need CPUs= in slurm.conf for the node def; just
Sockets=4 CoresPerSocket=4 ThreadsPerCore=1, for example, and the
slurmctld does the math for the CPU count. Also, slurmd -C on the nodes
is very useful to see what's being autodetected.
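A sketch of a node definition along those lines (names and sizes are
examples):

    NodeName=node[01-16] Sockets=4 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=64000

and on the node itself, to see what slurmd detects:

    $ slurmd -C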
On Wed, Oct 10, 2018 at 11:34 AM Noam Bernstein
wrote:
>
Have you started the munge service? The order should be roughly: start
munge, start mysql/mariadb, start slurmdbd, start slurmctld, start
slurmd. You didn't mention which distribution you're using. On recent
Debian versions the 3 slurm daemons have been split out independently
and you'll probably b
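With systemd, that order is roughly the following (unit names assume
the stock packaging; mariadb and slurmdbd only matter on the host
running the database):

    systemctl start munge
    systemctl start mariadb
    systemctl start slurmdbd
    systemctl start slurmctld   # controller
    systemctl start slurmd      # each compute node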
On Mon, Sep 24, 2018 at 12:27 PM Will Dennis wrote:
>
> Hi all,
>
> We want to add in some Gres resource types pertaining to GPUs (amount of GPU
> memory and CUDA cores) on some of our nodes. So we added the following params
> into the 'gres.conf' on the nodes that have GPUs:
>
> Name=gpu_mem Co
I'm seeing a weird issue (originally with 17.02, and still there after
upgrading to 18.08) where occasionally job arrays created with -d
aftercorr seem to be getting mixed up in the slurm controller and the
wrong jobs are getting started and cancelled. Just created a bug for
it: https://bugs.schedmd.com/sh
Sounds like you figured it out, but I mis-remembered and had the
behavior of CR_LLN backwards. Setting it spreads the jobs out across
the nodes, not filling one up first. Also, I believe it can be set per
partition as well.
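e.g. the per-partition form would look something like this (partition
and node names are examples):

    PartitionName=batch Nodes=node[01-16] LLN=YES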
On Tue, Sep 11, 2018 at 5:24 PM Felix Wolfheimer
wrote:
>
> Thanks for the input! I
I think you probably want CR_LLN set in your SelectTypeParameters in
slurm.conf. This makes it fill up a node before moving on to the next
instead of "striping" the jobs across the nodes.
On Mon, Sep 10, 2018 at 8:29 AM Felix Wolfheimer
wrote:
>
> No this happens without the "Oversubscribe" parame
Yes, I saw the same issue. The default for an unset DefMemPerCPU
changed from unlimited in earlier versions to 0. I just set it to 384
in slurm.conf so simple things run fine, and make sure users always set
a sane value on submission.
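i.e. in slurm.conf (value is in MB):

    DefMemPerCPU=384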
On Mon, Jun 11, 2018 at 6:40 PM, Roberts, John E. wrote:
> I see this