[slurm-dev] Slurm Backfill Algorithm

2015-10-12 Thread vaibhav pol
Hi, I would like to know more about Slurm's backfill algorithm, in particular how bf_resolution and bf_window are used in it. I have been searching the mailing list archive and saw that many people have had issues with job start times changing, or with job starvation, which can be solved by tuning bf_w
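For context, the slurm.conf documentation describes bf_window as the number of minutes into the future the backfill scheduler looks (default 1440, one day) and bf_resolution as the resolution, in seconds, of the data it keeps about when jobs begin and end (default 60). It also advises a bf_window at least as long as the highest allowed time limit to prevent starvation. The toy model below is a minimal sketch of how those two knobs interact, assuming a single pool of identical nodes and jobs that run for their full time limit. The names Job and backfill are hypothetical, and this is not Slurm's implementation, which weighs many more factors.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    nodes: int          # nodes requested
    time_limit_s: int   # requested wall time, in seconds


def backfill(total_nodes, running, pending, bf_window_min=1440, bf_resolution_s=60):
    """Toy backfill planner (hypothetical, not Slurm's implementation).

    running: list of (nodes, remaining_seconds) for jobs already executing.
    pending: Jobs in priority order (highest first).
    Returns {job name: planned start in seconds from now, or None}.
    """
    # Availability is tracked only inside the window, one entry per bf_resolution slot.
    slots = (bf_window_min * 60) // bf_resolution_s
    free = [total_nodes] * slots
    for nodes, remaining in running:
        for t in range(min(slots, math.ceil(remaining / bf_resolution_s))):
            free[t] -= nodes

    plan: dict[str, Optional[int]] = {}
    for job in pending:
        length = math.ceil(job.time_limit_s / bf_resolution_s)
        start = None
        for s in range(slots):
            # Slots past the window are untracked, so they are treated as free.
            if all(free[t] >= job.nodes for t in range(s, min(slots, s + length))):
                start = s
                break
        if start is None:
            # No start found inside the window: the job gets no reservation,
            # so lower-priority jobs may take the resources it is waiting for.
            plan[job.name] = None
            continue
        # Reserve the slots (start == 0 means the job can start now), so that
        # lower-priority jobs considered later cannot delay this one.
        for t in range(start, min(slots, start + length)):
            free[t] -= job.nodes
        plan[job.name] = start * bf_resolution_s
    return plan


if __name__ == "__main__":
    running = [(2, 3600)]  # 2 of 4 nodes busy for one more hour
    pending = [Job("big", 4, 3600), Job("small", 2, 1800), Job("medium", 2, 7200)]
    print(backfill(4, running, pending))
    # -> {'big': 3600, 'small': 0, 'medium': 7200}
    print(backfill(4, running, pending, bf_window_min=60))
    # -> {'big': None, 'small': 0, 'medium': 1800}
```

With the default one-day window, "small" is backfilled immediately, while "medium" waits until after "big", because starting it at 30 minutes would push back big's reservation. With a 60-minute window, "big" cannot start inside the window, gets no reservation, and "medium" is planned at 30 minutes on nodes "big" will need: the starvation pattern the question describes. A coarser bf_resolution shrinks the slot array (less scheduler work) at the cost of rounding every job to whole slots.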

[slurm-dev] Slurmd restart without losing jobs?

2015-10-12 Thread Robbert Eggermont
Hello, Some modifications to slurm.conf require me to restart the slurmd daemons on all nodes. Is there a way to do this without losing any running jobs (and without having to drain the cluster)? Thanks, Robbert -- Robbert Eggermont Intelligent Systems r.e

[slurm-dev] Re: Slurmd restart without losing jobs?

2015-10-12 Thread Paul Edmon
You should be able to do this without losing any jobs (at least I've never lost any on any version of Slurm I have run). I do it all the time in our environment (about once a day), as our slurm.conf is in flux quite a bit. It should always preserve the running and pending state. The only i

[slurm-dev] Re: Slurmd restart without losing jobs?

2015-10-12 Thread Antony Cleave
While this is true, be very, very careful when restarting the slurmd on the controller node. It's quite easy to miss a typo in one of the config files, e.g. an unexpected comma in topology.conf, which can cause Slurm to segfault or otherwise shut down uncleanly. If this happens then the state of

[slurm-dev] Re: Slurmd restart without losing jobs?

2015-10-12 Thread Paul Edmon
I've had this happen several times, but have never lost jobs due to it. Still, one should always watch the logs on the master when restarting so you can catch typos immediately. We run a sanity check on our confs before we push them (we use Puppet for configuration control). Our post commi
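The details of that sanity check are cut off in the message. As a hypothetical illustration only, a pre-push check could catch the kind of stray comma mentioned earlier in topology.conf before any daemon is restarted. The function check_conf_lines and its rules are assumptions: it is not Slurm's own parser, it only looks at whitespace-separated key=value tokens, and it will miss many errors Slurm itself would reject (quoted values containing spaces, for example, are split naively).

```python
def check_conf_lines(text: str) -> list[str]:
    """Flag obvious problems in whitespace-separated key=value config text.

    Hypothetical pre-push check; not Slurm's parser.
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        for token in line.split():  # naive: quoted values with spaces are not handled
            key, sep, value = token.partition("=")
            if not sep or not key:
                problems.append(f"line {lineno}: '{token}' is not key=value")
            elif not value:
                problems.append(f"line {lineno}: '{key}' has an empty value")
            elif value.startswith(",") or value.endswith(",") or ",," in value:
                problems.append(f"line {lineno}: stray comma in '{token}'")
    return problems


if __name__ == "__main__":
    sample = (
        "# topology.conf\n"
        "SwitchName=s0 Nodes=tux[0-3]\n"
        "SwitchName=s1 Nodes=tux[4-7],\n"
        "SwitchName=s2 Switches=s[0-1]\n"
    )
    for problem in check_conf_lines(sample):
        print(problem)
    # -> line 3: stray comma in 'Nodes=tux[4-7],'
```

In a configuration-management workflow like the one described, a non-empty result could be used to reject the change before it reaches the controller, so the restart never sees the typo.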

[slurm-dev] Re: Slurmd restart without losing jobs?

2015-10-12 Thread Antony Cleave
I've only ever had this happen once, but by Murphy's law it didn't happen on the test system; it happened on the production system, and I was just a minute or so too slow finding the error. Antony

[slurm-dev] Problem while updating to new slurm version

2015-10-12 Thread Gasper . Kukec
Hello, Our GRID cluster has so far been using Slurm version 14.11.4 on an el6 system, and I wish to upgrade it to the newest version. The current cluster includes a master node (which is also a job execution node) and three other job execution nodes. I performed the upgrade using (with RPM built w

[slurm-dev] Re: Problem while updating to new slurm version

2015-10-12 Thread Barbara Krasovec
Can you post the output of "sacctmgr show config"? Do you get more info if you run slurmdbd in debug mode (slurmdbd -D)? Cheers, Barbara