I agree. I brought it up with SchedMD after I spent almost an entire day trying to figure out why jobs were queued up but not running. I figured the reason column would say "reservation" if that was the issue. Instead, it provided some completely useless message, making me think the problem was elsewhere. When I confirmed it was reservation (with the help of this list/you), I wanted to break something.

Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov

On 06/15/2018 01:26 PM, Ryan Novosielski wrote:
That’s great news: this is a vFAQ at our site.

On Jun 13, 2018, at 1:37 PM, Prentice Bisbal <pbis...@pppl.gov> wrote:

Just to revisit this: jobs that are queued but prevented from running 
will show a more useful reason in 18.08, which addresses one of my issues 
with reservation collisions.
https://bugs.schedmd.com/show_bug.cgi?id=5138
https://bugs.schedmd.com/show_bug.cgi?id=4987

On 05/11/2018 10:36 AM, Douglas Jacobsen wrote:
A feature that many Slurm users might like is sbatch --time-min.  Using both 
--time-min and --time, a user can specify a range of acceptable wall time 
limits.  This can make it much easier to keep jobs running right up to the 
maintenance reservation, e.g.:

sbatch --time-min=30:00 --time=48:00:00 script.sh

would allow the job to be scheduled into any time slot between 30 minutes and 
2 days in length.  If the user has some mechanism for job chaining or similar, 
this lets them make the most of backfill opportunities.
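
Slurm's time-limit strings pack several formats into one syntax: per the sbatch man page, --time and --time-min accept "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds". So 30:00 above is 30 minutes and 48:00:00 is 48 hours, i.e. 2 days. A submission framework that wants to compare such limits against a reservation's start time first needs them in seconds. The helper below is only a sketch of that conversion; the name slurm_time_to_seconds is hypothetical, and it deliberately does not handle values such as UNLIMITED.

```bash
#!/usr/bin/env bash
# Hypothetical helper: slurm_time_to_seconds SPEC
# Converts a Slurm time-limit string (the formats listed in the sbatch
# man page for --time/--time-min) to seconds. Returns 1 on input it does
# not understand, e.g. "UNLIMITED".
slurm_time_to_seconds() {
    local spec=$1 days=0 h=0 m=0 s=0 rest
    [[ $spec =~ ^[0-9:-]+$ ]] || return 1
    if [[ $spec == *-* ]]; then
        # days-hours[:minutes[:seconds]]
        days=${spec%%-*}
        rest=${spec#*-}
        IFS=: read -r h m s <<< "$rest"
        h=${h:-0}; m=${m:-0}; s=${s:-0}
    else
        local -a parts
        IFS=: read -ra parts <<< "$spec"
        case ${#parts[@]} in
            1) m=${parts[0]} ;;                                   # minutes
            2) m=${parts[0]}; s=${parts[1]} ;;                    # minutes:seconds
            3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;     # hours:minutes:seconds
            *) return 1 ;;
        esac
    fi
    # 10# forces base 10 so fields like "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

With this, slurm_time_to_seconds 30:00 prints 1800 and slurm_time_to_seconds 48:00:00 prints 172800, matching the 30-minute to 2-day range described above.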

-Doug

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center
dmjacob...@lbl.gov


On Fri, May 11, 2018 at 7:27 AM Paul Edmon <ped...@cfa.harvard.edu> wrote:
In the past we used the Lua job submit plugin to block jobs that would
intersect maintenance reservations.  I would look at that.

-Paul Edmon-


On 05/11/2018 08:19 AM, Bill Wichser wrote:
The problem is that a reservation can exist yet have no effect on a
submitted job if the job would finish before the reservation starts.
You can pull the start times with something like this

scontrol show res -o | awk '{print $2}'

with output

StartTime=2018-06-12T06:00:00
StartTime=2018-06-12T06:00:00

You'd need more code around that, obviously, to determine whether this
start time might hold up the job.
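
To flesh out the "more code around that" step, here is a minimal sketch, assuming GNU date and the one-line `scontrol -o show res` record format with StartTime= and EndTime= fields. The function name would_hit_reservation is hypothetical. It treats a job as colliding with a reservation when the job's window (now until now plus its time limit) overlaps the reservation's window. It ignores which nodes or partitions a reservation covers, so it can report a collision that would not actually affect a given job.

```bash
#!/usr/bin/env bash
# Hypothetical helper: would_hit_reservation WALLTIME_SECONDS [NOW_EPOCH]
# Reads one-line reservation records (as printed by `scontrol -o show res`)
# on stdin. Returns 0 if any reservation's StartTime..EndTime window
# overlaps a job that starts at NOW and runs for WALLTIME_SECONDS,
# 1 otherwise. Assumes GNU date for timestamp parsing; node and partition
# coverage of the reservation is NOT checked.
would_hit_reservation() {
    local walltime=$1
    local now=${2:-$(date +%s)}
    local job_end=$(( now + walltime ))
    local line res_start res_end
    while IFS= read -r line; do
        # Skip lines that are not reservation records, e.g.
        # "No reservations in the system".
        [[ $line =~ StartTime=([^[:space:]]+) ]] || continue
        res_start=$(date -d "${BASH_REMATCH[1]}" +%s) || continue
        [[ $line =~ EndTime=([^[:space:]]+) ]] || continue
        res_end=$(date -d "${BASH_REMATCH[1]}" +%s) || continue
        # Two intervals overlap when each starts before the other ends.
        if (( res_start < job_end && res_end > now )); then
            return 0
        fi
    done
    return 1
}
```

For a job requesting two days, `scontrol -o show res | would_hit_reservation 172800` would return 0 when a reservation could start before the job finishes. The optional NOW argument exists only so the check can be exercised with fixed timestamps.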

Bill


On 05/10/2018 04:23 PM, Prentice Bisbal wrote:
Dear Slurm Users,

We've started using maintenance reservations. As you would expect,
this caused some confusion for users who were wondering why their
jobs were queuing up and not running. Some of my users provide a
public service of sorts that automatically submits jobs to our
cluster. They would like to have their submission framework
automatically detect if there's a reservation that may interfere with
their jobs, and act accordingly.

What is the best way to do this? Typically, in my shell scripts, I
run some command that tests something, and then check the exit code
it returns. For example, to check if my name is in file
'foo.txt', I'd do something like this:

grep -iq prentice foo.txt
retval=$?
if [ $retval -eq 0 ]; then
      echo "Prentice found"
else
      echo "Prentice not found"
fi
unset retval

Or something like that. I was also thinking this might work, too:

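# Count only reservation records: with no reservations, scontrol prints
# "No reservations in the system", which wc -l would count as 1.
num_res=$(scontrol -o show res | grep -c '^ReservationName=')
if [ "$num_res" -eq 0 ]; then
      echo "No reservations found"
else
      echo "$num_res reservation(s) found"
fi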

Are there any better or other ways that you would recommend? Also, if
there's more than one reservation, are they listed in any kind of order
in the scontrol or sinfo output (soonest first, soonest last, etc.)? From
the man page, it looks like 'scontrol show reservation' doesn't
provide any sorting.
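
On the ordering question, one way to avoid depending on scontrol's output order at all is to sort the records yourself. This is a minimal sketch, assuming every StartTime= value uses the same YYYY-MM-DDTHH:MM:SS layout shown earlier in this thread, so that a plain string sort is also a chronological sort; sort_reservations_by_start is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sort_reservations_by_start
# Reads `scontrol -o show res` output on stdin and prints the reservation
# records ordered by StartTime, soonest first. Assumes all timestamps share
# the YYYY-MM-DDTHH:MM:SS layout, so lexical order == chronological order.
sort_reservations_by_start() {
    local line
    while IFS= read -r line; do
        # Ignore lines without a StartTime (e.g. "No reservations in the system").
        [[ $line =~ StartTime=([^[:space:]]+) ]] || continue
        # Prefix each record with its start time as a tab-separated sort key.
        printf '%s\t%s\n' "${BASH_REMATCH[1]}" "$line"
    done | LC_ALL=C sort -t $'\t' -k1,1 | cut -f2-
}
```

Piping through `head -n 1`, as in `scontrol -o show res | sort_reservations_by_start | head -n 1`, would then give the soonest reservation.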

Prentice
