I agree. I brought it up with SchedMD after I spent almost an entire day
trying to figure out why jobs were queued up but not running. I figured
the reason column would say "reservation" if that was the issue.
Instead, it provided some completely useless message, making me think
the problem was elsewhere. When I confirmed it was a reservation (with the
help of this list/you), I wanted to break something.
Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov
On 06/15/2018 01:26 PM, Ryan Novosielski wrote:
That’s great news. This is a vFAQ at our site.
On Jun 13, 2018, at 1:37 PM, Prentice Bisbal <pbis...@pppl.gov> wrote:
Just to revisit this: jobs that are queued but prevented from running
will show a more useful reason in 18.08, which will address one of my issues
with reservation collisions.
https://bugs.schedmd.com/show_bug.cgi?id=5138
https://bugs.schedmd.com/show_bug.cgi?id=4987
Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov
On 05/11/2018 10:36 AM, Douglas Jacobsen wrote:
A feature that many Slurm users might like is sbatch --time-min. Using both
--time-min and --time, a user can specify the range of acceptable wall time
limits. This can make it much easier to keep jobs running right up to the
maintenance reservation. For example:
sbatch --time-min=30:00 --time=48:00:00 script.sh
would allow the job to schedule for any time-slot between 30 minutes and 2 days
in length. If the user has some mechanism for job chaining or similar, this
can allow them to make the most of backfill opportunities.
-Doug
----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center
dmjacob...@lbl.gov
------------- __o
---------- _ '\<,_
----------(_)/ (_)__________________________
On Fri, May 11, 2018 at 7:27 AM Paul Edmon <ped...@cfa.harvard.edu> wrote:
In the past we used the Lua job submit plugin to block jobs that would
intersect maintenance reservations. I would look at that.
-Paul Edmon-
On 05/11/2018 08:19 AM, Bill Wichser wrote:
The problem is that reservations can exist yet have no effect on
a submitted job if the job would finish before the reservation
starts. One can pull the start time using something like this:
scontrol show res -o | awk '{print $2}'
with output
StartTime=2018-06-12T06:00:00
StartTime=2018-06-12T06:00:00
You'd need more code around that, obviously, to determine if this
start time might hold up the job.
Bill
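As a rough illustration of the "more code around that" Bill mentions, the sketch below compares each reservation's StartTime and EndTime with the window a job would occupy if it started right now. The function name res_blocks_job and its arguments are hypothetical, and it assumes GNU date (for `date -d`) and the one-line `scontrol show res -o` format shown above. It does not check which nodes or partitions a reservation covers, so it can report a collision for a reservation that would never touch the job.

```shell
# res_blocks_job WALLTIME_SECONDS [NOW_EPOCH]   (hypothetical helper)
# Reads `scontrol show res -o` style lines on stdin, one reservation per line.
# Exit status 0: some reservation overlaps the window [now, now + walltime].
# Exit status 1: no overlap found.
# Assumes GNU date; ignores which nodes the reservation actually covers.
res_blocks_job() {
    local walltime now job_end line start end start_s end_s
    walltime=$1
    now=${2:-$(date +%s)}
    job_end=$(( now + walltime ))

    while IFS= read -r line; do
        # Pull the StartTime= and EndTime= values out of the record.
        start=$(printf '%s\n' "$line" | sed -n 's/.*StartTime=\([^ ]*\).*/\1/p')
        end=$(printf '%s\n' "$line" | sed -n 's/.*EndTime=\([^ ]*\).*/\1/p')
        # Skip lines that are not reservation records.
        [ -n "$start" ] && [ -n "$end" ] || continue
        # Convert both timestamps to epoch seconds; skip unparsable values.
        start_s=$(date -d "$start" +%s 2>/dev/null) || continue
        end_s=$(date -d "$end" +%s 2>/dev/null) || continue
        if [ "$start_s" -lt "$job_end" ] && [ "$end_s" -gt "$now" ]; then
            return 0
        fi
    done
    return 1
}
```

A submission framework could then run something like `scontrol show res -o | res_blocks_job 7200` and hold or shorten a two-hour job when the exit status is 0.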
On 05/10/2018 04:23 PM, Prentice Bisbal wrote:
Dear Slurm Users,
We've started using maintenance reservations. As you would expect,
this caused some confusion for users who were wondering why their
jobs were queuing up and not running. Some of my users provide a
public service of sorts that automatically submits jobs to our
cluster. They would like to have their submission framework
automatically detect if there's a reservation that may interfere with
their jobs, and act accordingly.
What is the best way to do this? Typically, in my shell scripts, I
have some command that tests something, and then I check the exit code
returned by the command. For example, to check if my name is in the file
'foo.txt', I'd do something like this:
grep -iq prentice foo.txt
retval=$?
if [ $retval -eq 0 ]; then
echo "Prentice found"
else
echo "Prentice not found"
fi
unset retval
Or something like that. I was also thinking this might work, too:
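# scontrol may print a "No reservations" message line when none exist, so
# count only lines that describe a reservation rather than using wc -l.
num_res=$(scontrol -o show res | grep -c '^ReservationName=')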
if [ $num_res -eq 0 ]; then
echo "No reservations found"
else
echo "$num_res reservation(s) found"
fi
Are there any better or other ways that you would recommend? Also, if
there's more than one, are they listed in any kind of order in the
scontrol or sinfo output (soonest first, soonest last, etc.)? From
the man page, it looks like 'scontrol show reservation' doesn't
provide any sorting.
Prentice
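On the ordering question, one way to avoid depending on whatever order scontrol happens to use is to sort the records explicitly. The sketch below is only an illustration. The function name sort_res_by_start is hypothetical, and it assumes the one-line `scontrol show res -o` format with ReservationName= and StartTime= fields. It relies on the fact that timestamps in YYYY-MM-DDTHH:MM:SS form sort chronologically when sorted as plain text.

```shell
# sort_res_by_start   (hypothetical helper)
# Reads `scontrol show res -o` style lines on stdin and prints
# "StartTime ReservationName" pairs, soonest reservation first.
sort_res_by_start() {
    awk '{
        name = ""; start = ""
        for (i = 1; i <= NF; i++) {
            # Keep the value after the "=" for the two fields of interest.
            if ($i ~ /^ReservationName=/) name  = substr($i, index($i, "=") + 1)
            if ($i ~ /^StartTime=/)       start = substr($i, index($i, "=") + 1)
        }
        # Lines without both fields (e.g. informational messages) are skipped.
        if (name != "" && start != "") print start, name
    }' | LC_ALL=C sort
}
```

For example, `scontrol show res -o | sort_res_by_start | head -n 1` would print the earliest reservation under these assumptions.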