Hi Chris,
Thank you for your reply regarding OpenMPI and srun. When I try to run an MPI
program using srun I see the following:
red[036-037]
[red036.cluster.local:308110] PMI_Init [pmix_s1.c:168:s1_init]: PMI is not
initialized
[red036.cluster.local:308107] PMI_Init
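For reference, the checks I have been trying look roughly like this (a sketch only, assuming our OpenMPI was configured with --with-pmi against the Slurm PMI libraries; the program name is just a placeholder):

```shell
# List the PMI plugin types this Slurm build provides
srun --mpi=list

# If pmi2 appears in the list, request it explicitly at launch time
srun --mpi=pmi2 -N 2 -n 4 ./my_mpi_program
```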
Hi Chris,
Thank you for your comments. Yesterday I experimented with increasing the
PriorityWeightJobSize and that does appear to have quite a profound effect on
the job mix executing at any one time. Larger jobs (needing 5 nodes or above)
are now getting a decent share of the nodes in the
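For context, the sort of weights I have been experimenting with look like this (illustrative values only, not our exact production settings):

```
# slurm.conf -- multifactor priority weights (illustrative values)
PriorityType=priority/multifactor
PriorityWeightJobSize=100000
PriorityWeightFairshare=50000
PriorityWeightAge=10000
PriorityWeightQOS=10000
```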
Hello,
A colleague intimated that he thought that larger jobs were tending to get
starved out on our slurm cluster. It's not a busy time at the moment so it's
difficult to test this properly. Back in November it was not completely unusual
for a larger job to have to wait up to a week to start.
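In the meantime, the kind of commands we have been using on a login node to look for evidence of starvation:

```shell
sprio -l               # per-job priority, broken down by factor
squeue --start -t PD   # scheduler's estimated start times for pending jobs
sdiag                  # scheduler statistics, including backfill counts
```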
Hello,
Thank you for your comments on installing and using TurboVNC. I'm working on
the installation at the moment, and may get back with other questions relating
to the use of Slurm with VNC.
Best regards,
David
From: slurm-users on behalf of Daniel
Hello,
We have set up our NICE/DCV cluster and that is proving to be very popular.
There are, however, users who would benefit from using the resources offered by
our nodes with multiple GPU cards. This potentially means setting up TurboVNC,
for example. I would, if possible, like to be able
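The sort of thing I have in mind is roughly the following (a sketch only; the TurboVNC install path, display number and resource request are assumptions for illustration):

```shell
# Start a TurboVNC server in the foreground inside a GPU allocation
sbatch --gres=gpu:2 --time=04:00:00 \
  --wrap "/opt/TurboVNC/bin/vncserver -fg :1"
```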
Hello,
I wondered if someone could please help us to understand why the
PrologFlags=contain flag is causing jobs to fail and draining compute nodes. We
are, by the way, using Slurm 18.08.0. Has anyone else seen this behaviour?
I'm currently experimenting with PrologFlags=contain. I've found
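For reference, the relevant fragment of our configuration looks roughly like this. My understanding is that contain creates an "extern" step container via the ProcTrack plugin, so a cgroup-based ProctrackType seems to be needed alongside it (a sketch; exact plugin choices are site-specific):

```
# slurm.conf fragment (sketch)
PrologFlags=contain
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
```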
Best regards,
David
From: slurm-users on behalf of Chris
Samuel
Sent: 20 November 2018 20:12:20
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Excessive use of backfill on a cluster
On Tuesday, 20 November 2018 11:42:49 PM AEDT Baker D. J. wrote:
> We are running
Hi Lois
Thank you for sharing your multi priority configuration with us. I understand
what you say about the QOS factor -- I've reduced it and increased the FS factor
to see where that takes us. Our QOS factor is only there to ensure that test
jobs gain a higher priority more quickly than other
Hello,
Thank you for your reply and for the explanation. That makes sense -- your
explanation of backfill is as we expected. I think it's more that we are
surprised that almost all our jobs were being scheduled using backfill. We very
rarely see any being scheduled normally. It could be that
Hello,
We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm
appears to be using backfill scheduling excessively. In fact the vast majority
of jobs are being scheduled using backfill. So, for example, I have just
submitted a set of three serial jobs. They all started on a
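For what it's worth, these are the scheduler settings we are looking at; sdiag reports separate counts for jobs started by the main scheduler and by backfill, which is how the imbalance shows up (values illustrative, not a recommendation):

```
# slurm.conf -- main vs. backfill scheduler tuning (illustrative values)
SchedulerType=sched/backfill
SchedulerParameters=bf_interval=30,default_queue_depth=100
```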
Hello,
Thank you for your useful replies. It's certainly not anywhere near as difficult as I
initially thought. We should be able to start some tests later this week.
Best regards,
David
From: slurm-users on behalf of Roche
Ewan
Sent: 10 October 2018 08:07
To:
on behalf of Chris
Samuel
Sent: 26 September 2018 11:26
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08
On Tuesday, 25 September 2018 11:54:31 PM AEST Baker D. J. wrote:
> That will certainly work, however the slurmctld (or in th
From: slurm-users on behalf of Chris
Samuel
Sent: 25 September 2018 13:00
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08
On Tuesday, 25 September 2018 9:41:10 PM AEST Baker D. J. wrote:
> I guess that the only so
Hello,
I wondered if I could compare notes with other community members who have
upgraded slurm on their cluster. We are currently running slurm v17.02 and I
understand that the rpm mix/structure changed at v17.11. We are, by the way,
planning to upgrade to v18.08.
I gather that I should
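My current understanding of the documented order is sketched below: upgrade slurmdbd first so it can convert the accounting database, then slurmctld, then the slurmds on the compute nodes (the split slurm-slurmdbd package name is the post-17.11 rpm layout; the database dump is just a precaution):

```shell
# 1. Back up and upgrade the accounting database daemon first
systemctl stop slurmdbd
mysqldump slurm_acct_db > slurm_acct_db_backup.sql
yum upgrade slurm-slurmdbd
systemctl start slurmdbd   # performs the database conversion on first start

# 2. Then slurmctld on the management node, then slurmd on the compute nodes
```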