You can get some information on that from sdiag, and there are tweaks you can 
make to backfill scheduling that affect how quickly it will get to a job.

That doesn't really answer your question, but it might help you when you are 
looking into this.


On Sep 29, 2023, at 16:10, Groner, Rob <rug...@psu.edu> wrote:


I'm not looking for a one-time answer. We run these tests any time we change 
anything related to Slurm: version, configuration, etc. We certainly run the 
test when the system comes back up after an outage, and an hour would be a long 
time to wait for that. Waiting it out is the brute-force approach, but I'm 
hoping there's a definitive way to show, through scontrol job output, that the 
job won't be preempted.
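One partial way to read this from scontrol is a sketch like the following. It assumes scontrol show job prints PreemptEligibleTime and PreemptTime as Key=Value tokens for a running job, with timestamps like 2023-09-29T11:05:00 and "None" when unset; the helper names are hypothetical. This only tells you when the job becomes eligible for preemption and whether preemption has (presumably) started; it is not proof that preemption will never happen.

```python
from __future__ import annotations

import subprocess
from datetime import datetime


def parse_scontrol_job(text: str) -> dict[str, str]:
    """Split `scontrol show job` output into a Key=Value dict.

    Naive whitespace split: a value that itself contains spaces (some
    Command= or Comment= fields) may be truncated or yield stray keys,
    which does not matter for the timestamp fields used here.
    """
    fields: dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        if sep:  # ignore tokens without '='
            fields[key] = value
    return fields


def parse_slurm_time(value: str | None) -> datetime | None:
    """Parse timestamps like 2023-09-29T11:05:00; None/Unknown/N/A become None."""
    if not value or value in ("None", "Unknown", "N/A"):
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def preemption_fields(job_id: str) -> tuple[datetime | None, datetime | None]:
    """Return (PreemptEligibleTime, PreemptTime) for a job via scontrol."""
    out = subprocess.run(
        ["scontrol", "show", "job", job_id],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = parse_scontrol_job(out)
    return (parse_slurm_time(fields.get("PreemptEligibleTime")),
            parse_slurm_time(fields.get("PreemptTime")))
```

For a hypothetical job 1234, eligible, preempted = preemption_fields("1234") would return both values. If eligible lies in the future, the job is still inside its PreemptExemptTime, so "not preempted" before that moment says nothing about the preemption configuration itself.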

True, I could set PreemptExemptTime to a smaller value, say 5 minutes instead 
of 1 hour, but there are a few issues with that.


  1.  I would then no longer be testing the system as it actually is. I want 
to test the system in its actual production configuration.
  2.  If I did lower its value, what would be a safe value? 5 minutes? Does 
running for 5 minutes guarantee that the higher-priority job had a chance to 
preempt it but didn't? Or did the scheduler never even get to it? On a test 
cluster with few jobs, you could be reasonably sure it did, but when running 
tests on the production cluster, isn't it possible the scheduler hasn't yet 
had a chance to process it, even after 5 minutes? It depends on the Slurm 
scheduler settings, I suppose.
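Point 2 is really about how long to keep watching once the job becomes eligible for preemption. The sketch below is one hypothetical way to frame that as a test verdict, given PreemptEligibleTime, PreemptTime, and a safety margin. The margin heuristic (a few backfill intervals plus the last observed backfill cycle time) is an assumption, not a documented Slurm guarantee; on a busy production queue a job may still not be reached within that window. bf_interval is the SchedulerParameters option (30 seconds by default); all function and enum names here are illustrative.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from enum import Enum


class Verdict(Enum):
    PREEMPTED = "preempted"          # PreemptTime was set: the test failed
    STILL_EXEMPT = "still exempt"    # before PreemptEligibleTime: proves nothing yet
    WAITING = "waiting"              # eligible, but the safety margin hasn't elapsed
    NOT_PREEMPTED = "not preempted"  # eligible and untouched for the whole margin


def margin_from_backfill(bf_interval_s: float = 30.0,
                         last_cycle_us: float = 0.0,
                         cycles: int = 3) -> timedelta:
    """Hypothetical heuristic: allow `cycles` full backfill passes after eligibility.

    bf_interval_s: SchedulerParameters bf_interval in seconds.
    last_cycle_us: sdiag's backfill "Last cycle" value in microseconds.
    """
    return timedelta(seconds=cycles * (bf_interval_s + last_cycle_us / 1e6))


def judge(now: datetime, eligible: datetime,
          preempt_time: datetime | None, margin: timedelta) -> Verdict:
    """Classify a test job from its PreemptEligibleTime and PreemptTime."""
    if preempt_time is not None:
        return Verdict.PREEMPTED
    if now < eligible:
        return Verdict.STILL_EXEMPT
    if now < eligible + margin:
        return Verdict.WAITING
    return Verdict.NOT_PREEMPTED
```

A test harness would poll judge() until it returns PREEMPTED or NOT_PREEMPTED. Keeping the margin as an explicit parameter makes the "how long is long enough" assumption visible instead of burying it in a fixed sleep.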

rob

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of 
Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) 
<noam.bernst...@nrl.navy.mil>
Sent: Friday, September 29, 2023 3:14 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Verifying preemption WON'T happen

On Sep 29, 2023, at 2:51 PM, Davide DelVento <davide.quan...@gmail.com> wrote:

I don't really have an answer for you other than a "hallway comment": it 
sounds like a good thing to test with a simulator, if I had one. I've been 
intrigued by (but haven't really looked much into) 
https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf

On Fri, Sep 29, 2023 at 10:05 AM Groner, Rob <rug...@psu.edu> wrote:

I could obviously let the test run for an hour to verify the lower-priority job 
was never preempted, but that's not really feasible.

Why not? Isn't it going to take longer than an hour to wait for responses to 
this post? Also, you could set the minimum time to a much smaller value, so it 
won't take as long to test.
