Re: [slurm-users] Larger jobs tend to get starved out on our cluster

2019-01-16 Thread Baker D. J.
Hi Chris, Thank you for your reply regarding OpenMPI and srun. When I try to run an MPI program using srun I find the following: red[036-037] [red036.cluster.local:308110] PMI_Init [pmix_s1.c:168:s1_init]: PMI is not initialized [red036.cluster.local:308107] PMI_Init …
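This "PMI is not initialized" error typically appears when OpenMPI expects PMIx but srun launches the tasks with a different (or no) PMI plugin. A diagnostic sketch, assuming a cluster where srun is available; the program name and task counts are illustrative, not from the thread:

```shell
# Show which PMI plugins this Slurm build actually supports
srun --mpi=list

# Launch with the PMIx plugin (requires Slurm built with PMIx support and
# an OpenMPI built against a compatible PMIx version)
srun --mpi=pmix -N 2 -n 4 ./my_mpi_program

# Alternatively, make it the default in slurm.conf:
#   MpiDefault=pmix
```

If `--mpi=list` does not show pmix, Slurm was likely built without PMIx support, which would be consistent with the failure shown above.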

Re: [slurm-users] Larger jobs tend to get starved out on our cluster

2019-01-11 Thread Baker D. J.
Hi Chris, Thank you for your comments. Yesterday I experimented with increasing PriorityWeightJobSize, and that does appear to have quite a profound effect on the job mix executing at any one time. Larger jobs (needing 5 nodes or above) are now getting a decent share of the nodes in the …
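The thread does not show the values actually used; a hypothetical slurm.conf fragment illustrating how the multifactor priority plugin can be weighted so that job size contributes meaningfully to priority:

```
# slurm.conf -- illustrative values, not the poster's actual configuration
PriorityType=priority/multifactor
PriorityWeightJobSize=100000    # larger allocations accrue more priority
PriorityWeightAge=10000
PriorityWeightFairshare=100000
PriorityWeightQOS=10000
PriorityFavorSmall=NO           # with NO, bigger jobs receive the size bonus
```

Each factor is normalised to 0..1 and multiplied by its weight, so raising PriorityWeightJobSize relative to the other weights is what shifts the mix toward larger jobs.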

[slurm-users] Larger jobs tend to get starved out on our cluster

2019-01-09 Thread Baker D. J.
Hello, A colleague intimated that he thought that larger jobs were tending to get starved out on our Slurm cluster. It's not a busy time at the moment, so it's difficult to test this properly. Back in November it was not completely unusual for a larger job to have to wait up to a week to start.
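Whether large jobs are genuinely starved can be checked from the per-factor priority breakdown of pending jobs; a sketch assuming the standard Slurm client tools:

```shell
# Per-factor priority breakdown (age, fairshare, job size, QOS) of pending jobs
sprio -l

# Pending jobs sorted by priority, showing requested node count and wait reason
squeue --state=PENDING --sort=-p -o "%.10i %.10Q %.6D %.20r"
```

If multi-node jobs sit near the top of the priority order but never start, the issue is resource reservation and backfill behaviour rather than the priority weights themselves.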

Re: [slurm-users] Visualisation -- Slurm and (Turbo)VNC

2019-01-04 Thread Baker D. J.
Hello, Thank you for your comments on installing and using TurboVNC. I'm working on the installation at the moment, and may get back with other questions relating to the use of Slurm with VNC. Best regards, David. From: slurm-users on behalf of Daniel …

[slurm-users] Visualisation -- Slurm and (Turbo)VNC

2019-01-03 Thread Baker D. J.
Hello, We have set up our NICE/DCV cluster and that is proving to be very popular. There are, however, users who would benefit from using the resources offered by our nodes with multiple GPU cards. This potentially means setting up TurboVNC, for example. I would, if possible, like to be able …
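One common pattern (not described in the thread itself) is to start the VNC server inside a batch allocation that reserves a GPU; a hypothetical sketch, assuming TurboVNC is installed under /opt/TurboVNC on the compute nodes:

```shell
#!/bin/bash
#SBATCH --job-name=vnc
#SBATCH --gres=gpu:1           # reserve one GPU card on the node
#SBATCH --time=04:00:00

# Start a TurboVNC session; the user then tunnels to <node>:<display>
/opt/TurboVNC/bin/vncserver -geometry 1920x1080
sleep infinity                 # keep the allocation alive while the session runs
```

The path, geometry, and resource requests are all assumptions for illustration; the point is that the VNC display then runs on a node whose GPU Slurm has accounted for.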

[slurm-users] PrologFlags=Contain significantly changing job activity on compute nodes

2018-12-12 Thread Baker D. J.
Hello, I wondered if someone could please help us to understand why the PrologFlags=contain flag is causing jobs to fail and draining compute nodes. We are, by the way, using Slurm 18.08.0. Has anyone else seen this behaviour? I'm currently experimenting with PrologFlags=contain. I've found …
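For context, PrologFlags=Contain makes slurmd create an "extern" step container on every allocated node at job allocation time (it is what pam_slurm_adopt relies on), so it changes how every job launches. A minimal slurm.conf sketch, assuming nothing about the poster's actual settings:

```
# slurm.conf -- illustrative
PrologFlags=Contain              # create an extern step/container per allocation
ProctrackType=proctrack/cgroup   # containment relies on the proctrack plugin
```

Because the extern step is created through the proctrack plugin, a mismatch between PrologFlags=Contain and the configured ProctrackType is one plausible source of the failures and drained nodes described here.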

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-21 Thread Baker D. J.
…regards, David. From: slurm-users on behalf of Chris Samuel, Sent: 20 November 2018 20:12:20, To: slurm-users@lists.schedmd.com, Subject: Re: [slurm-users] Excessive use of backfill on a cluster. On Tuesday, 20 November 2018 11:42:49 PM AEDT Baker D. J. wrote: > We are running …

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-21 Thread Baker D. J.
Hi Lois, Thank you for sharing your multifactor priority configuration with us. I understand what you say about the QOS factor -- I've reduced it and increased the FS factor to see where that takes us. Our QOS factor is only there to ensure that test jobs gain a higher priority more quickly than other …

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Baker D. J.
Hello, Thank you for your reply and for the explanation. That makes sense -- your explanation of backfill is as we expected. I think it's more that we are surprised that almost all our jobs were being scheduled using backfill. We very rarely see any being scheduled normally. It could be that …

[slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Baker D. J.
Hello, We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm appears to be using backfill scheduling excessively. In fact, the vast majority of jobs are being scheduled using backfill. So, for example, I have just submitted a set of three serial jobs. They all started on a …
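Backfill starting most jobs is normal when the main scheduling loop does not reach far down the queue between submissions; the relevant knobs live in slurm.conf. An illustrative fragment, with values that are assumptions rather than the poster's configuration:

```
# slurm.conf -- illustrative scheduler tuning
SchedulerType=sched/backfill
SchedulerParameters=bf_interval=30,bf_max_job_test=500,default_queue_depth=100
# default_queue_depth governs how many jobs the "main" scheduler considers on
# each pass; jobs it does not reach are left for the backfill loop, which is
# why backfill can end up starting the majority of jobs.
```

Seeing most jobs started by backfill is therefore usually a reporting artifact of which loop got there first, not a sign that priority ordering is being violated.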

Re: [slurm-users] Help with developing a lua job submit script

2018-10-10 Thread Baker D. J.
Hello, Thank you for your useful replies. It's certainly not anywhere near as difficult as I initially thought. We should be able to start some tests later this week. Best regards, David. From: slurm-users on behalf of Roche Ewan, Sent: 10 October 2018 08:07, To: …
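The thread does not include the script itself; a minimal job_submit.lua sketch showing the shape such a plugin takes (the "serial" partition name and the routing rule are hypothetical; the entry points and return value are the standard plugin interface):

```lua
-- job_submit.lua -- minimal sketch; "serial" is a hypothetical partition name
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Route single-node jobs with no explicit partition to a serial partition
    if job_desc.partition == nil and job_desc.min_nodes == 1 then
        job_desc.partition = "serial"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

The script is enabled with JobSubmitPlugins=lua in slurm.conf and is re-read by slurmctld without a daemon restart, which makes iterative testing straightforward.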

Re: [slurm-users] Upgrading Slurm on a cluster, 17.02 --> 18.08

2018-09-26 Thread Baker D. J.
…on behalf of Chris Samuel, Sent: 26 September 2018 11:26, To: slurm-users@lists.schedmd.com, Subject: Re: [slurm-users] Upgrading Slurm on a cluster, 17.02 --> 18.08. On Tuesday, 25 September 2018 11:54:31 PM AEST Baker D. J. wrote: > That will certainly work, however the slurmctld (or in th…

Re: [slurm-users] Upgrading Slurm on a cluster, 17.02 --> 18.08

2018-09-25 Thread Baker D. J.
…From: slurm-users on behalf of Chris Samuel, Sent: 25 September 2018 13:00, To: slurm-users@lists.schedmd.com, Subject: Re: [slurm-users] Upgrading Slurm on a cluster, 17.02 --> 18.08. On Tuesday, 25 September 2018 9:41:10 PM AEST Baker D. J. wrote: > I guess that the only so…

[slurm-users] Upgrading Slurm on a cluster, 17.02 --> 18.08

2018-09-25 Thread Baker D. J.
Hello, I wondered if I could compare notes with other community members who have upgraded Slurm on their cluster. We are currently running Slurm v17.02 and I understand that the RPM mix/structure changed at v17.11. We are, by the way, planning to upgrade to v18.08. I gather that I should …
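Slurm upgrades follow a documented order: slurmdbd first, then slurmctld, then slurmd on the compute nodes. A command sketch for an RPM-based install; the package names are illustrative (the RPM split did change at 17.11, so exact names depend on how the packages were built):

```shell
# 1. Back up the accounting database, then upgrade and restart slurmdbd first
systemctl stop slurmdbd
rpm -Uvh slurm-slurmdbd-18.08.*.rpm
systemctl start slurmdbd

# 2. Upgrade the controller
systemctl stop slurmctld
rpm -Uvh slurm-18.08.*.rpm slurm-slurmctld-18.08.*.rpm
systemctl start slurmctld

# 3. Upgrade slurmd on the compute nodes, rolling through them
#    (optionally draining each node first)
```

The ordering matters because newer daemons can talk to older ones within the supported version window, but not the reverse; upgrading slurmdbd last would leave it unable to understand the newer controller.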