This solves a performance problem with the existing planner. The problem is that with a large installation, and a big queue, a full plan can take a long time to prepare. (In our current installation, perhaps as long as half an hour.) Any resource which becomes free during one plan run cannot be allocated to a new job until the next plan run starts. This means resources (test machines) are often sitting around idle.
Fix this by restarting the planning process as soon as any new
resource becomes free.  This means that jobs at the front of the queue
get a chance to allocate it right away, so it will probably be
allocated soon.  If it is only interesting to jobs later in the queue,
then there may be a delay in reallocating it, but presumably the
resource is not much in demand and those later jobs will allocate it
when they get a bit closer to the head.

But there is a problem with this: it means that the plan is generally
never completed.  So we no longer have any overview of which flights
will finish when, or of what the overall queue is like.  We solve this
problem by running a second instance of the planner algorithm, all the
way to completion, in a `dummy' mode where no actual resource
allocation takes place.  This second `projection' instance comes into
being whenever the main `plan' instance is restarted, and it inherits
the planning state from the main `plan' instance.

Global livelock (where we keep restarting the plan but never manage to
allocate anything) is not possible, because each restart involves a
new resource becoming free.  If nothing gets allocated because we
can't get that far before being restarted, then eventually there will
be nothing left allocated to become newly free.

Starvation, of a form, is possible: a late-in-queue job which wants a
resource available right now might have difficulty allocating it,
because the planner is spending its effort rescheduling early-in-queue
jobs which want resources that are in greater demand, so that the
late-in-queue job never gets its turn.  Arguably this is an
appropriate allocation of planning time.

With this arrangement we can generate two reports: a `plan' report
containing the short-term plan which was used for actual resource
allocation, and which is frequently restarted and therefore not
necessarily complete; and a `projection' report which contains a
complete plan for all work the system is currently aware of, but which
is less frequently updated.

Because planner clients do not contain the planning algorithm state,
the only client change needed is the ability to run in a `dummy' mode
without actual allocation; this is the `noalloc' feature earlier in
this series.

The main work is in ms-queuedaemon.  We have prepared the ground for
multiple instances of the planning algorithm; from the point of view
of ms-queuedaemon, an instance of the planning algorithm is mainly a
walk over the job queue, so we call them `walkers'.

Therefore, what we do here is introduce a new `projection' walker, as
follows:

 * Add `projection' to the global list of possible walkers.

 * Invent a new section of code, the `restarter', which is responsible
   for managing the relationship between the two walkers.  (It uses
   direct knowledge of the queue state data structures, etc., to avoid
   having to invent a complete formal interface to a walker.)

 * If we ever finish the plan walker's queue, we update both the
   projection report output and the plan report output, from the same
   plan.

 * Finishing the projection walker's queue means we have a complete
   projection, but we don't touch the plan.

In principle the plan walker might overtake the projection walker and
then complete, writing out a complete and up-to-date plan as the
projection; the projection walker would then complete and overwrite
the projection with less up-to-date information.  We don't explicitly
exclude this.  Of course such a result will be rectified soon enough
by another planning run.
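As an aside, the restart-and-hand-over idea can be illustrated by a
minimal, self-contained Tcl sketch.  This is purely hypothetical (the
names here are invented and are not part of the patch below; the real
implementation is the `restarter' code in ms-queuedaemon):

    # Toy model: when resources become newly free mid-plan, an idle
    # projection walker inherits the unfinished plan queue and the
    # live plan is abandoned, so a fresh run can start from the head
    # of the queue.
    set plan_queue {job1 job2 job3}

    proc restart-plan {newly_free} {
        global plan_queue projection_queue
        if {![llength $newly_free]} return
        if {![info exists projection_queue]} {
            # projection walker idle: hand it the unfinished plan
            set projection_queue $plan_queue
        }
        # abandon the live plan; a new run will start from the top
        set plan_queue {}
    }

    restart-plan {host-a}
    puts "plan=($plan_queue) projection=($projection_queue)"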
The restarter can ask the database for the list of currently-available
resources, and can therefore detect when any resources become newly
free.  The rest of the code remains largely ignorant of the operation
of the restarter.  There are a few hooks:

runneeded-perhaps-start notifies the restarter when we start the plan;
this is used by the restarter to record the set of free resources at
the start of a planning run, so that it can see later whether any
/new/ resources have become free.

restarter-maybe-provoke-restart is called when we get notification
from the owner daemon that resources may have become idle.  We look
for newly-idle resources, and if there are any, and we are running the
plan walker, we directly edit the plan walker's queue to put RESTART
at the front.

queuerun-perhaps-step spots the special entry RESTART in its queue and
calls back into the restarter when it finds it.  This deferred
approach is necessary because we can't do the restart operation while
a client is thinking (because we would have to change that client's
cogitation from the `live, can allocate' mode to the `dummy, cannot
allocate' mode; and because that would make the code more complex).

The main work is done in the restarter-restart-now hook.  It reports
the current (incomplete) plan, and then checks to see if a projection
walker is running; if it is, it leaves it alone, and simply abandons
the current plan run and arranges for a new run to be started.  If a
projection walker is not running, it copies all the plan walker's
state (including the data-plan.pl disk file containing the
plan-in-progress) to the projection walker, and sets the projection
walker going.

We update .gitignore to ignore data-plan.* and data-projection.*.

Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>
---
v2: Update .gitignore too.
    Use `walker-globals' not `walker-runvars' (which does not exist).
    Remove wrap damage `#' from comment.
    Fix typo in commit message.
    Fix several silly bugs in for-free-resources
    Fix three silly bugs relating to handling of $newly_free
    Fix a wrong bracket syntax error in restarter-maybe-provoke-restart
    Properly return from queuerun-perhaps-step on RESTART;
      restarter-restart-now has taken the flow of control.
    Reorder operations in restarter-restart-now so as to make it work
    Correct some wrong log messages in restarter-restart-now
    Add a log message when we restart planning
    Minor code layout changes
    In notify-to-think, process feature-noalloc properly
---
 .gitignore     |    4 +--
 README.planner |    8 +++++
 ms-queuedaemon |  104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/.gitignore b/.gitignore
index 07b039d..bccf488 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,8 +18,8 @@ publish-lock
 bisecting-sticky-branch
 [tu].*
 [tu]
-data-plan.pl
-data-plan.pl.new
+data-plan.*
+data-projection.*
 data-plan-debug-*.txt
 data-tree-lock
 tree-bisect
diff --git a/README.planner b/README.planner
index 52f757b..b1bacd4 100644
--- a/README.planner
+++ b/README.planner
@@ -76,6 +76,14 @@ that newly-freed resources are properly offered first to the tasks at
 the front of the queue.  ms-ownerdaemon sets all idle resources to
 allocatable at the start of each planning cycle.
 
+The planner actually sometimes runs two planning cycles: if resources
+become free while the planner is running, it will restart the planning
+cycle in an effort to get those resources into service.  But, it will
+leave the existing planning run going in a projection-only mode (where
+no resources actually get allocated), so that there is a report for
+the administrator showing an idea of what the system thinks may happen
+in the more distant future.
+
 ms-ownerdaemon and `ownd' tasks
 -------------------------------
 
diff --git a/ms-queuedaemon b/ms-queuedaemon
index 811f0ee..d48715e 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -21,7 +21,7 @@
 
 source ./tcl/daemonlib.tcl
 
-set walkers {plan}
+set walkers {plan projection}
 
 proc walker-globals {w} {
     # introduces queue_running, thinking[_after] for the specific walker
@@ -169,12 +169,19 @@ proc runneeded-perhaps-start {} {
     log "runneeded-perhaps-start starting cleaned=$cleaned"
 
     runneeded-2-requeue
+    restarter-starting-plan-hook
     queuerun-start plan
 }
 
 proc queuerun-finished/plan {} {
     runneeded-ensure-will 0
     report-plan plan plan
+    report-plan plan projection
+}
+
+proc queuerun-finished/projection {} {
+    runneeded-ensure-will 0
+    report-plan projection projection
 }
 
 proc runneeded-ensure-polling {} {
@@ -255,6 +262,12 @@ proc queuerun-perhaps-step {w} {
     }
 
     set next [lindex $queue_running 0]
+    if {![string compare RESTART $next]} {
+        lshift queue_running
+        restarter-restart-now
+        return
+    }
+
     set already [we-are-thinking $next]
     if {[llength $already]} {
         # $already will wake us via walkers-perhaps-queue-steps
@@ -378,9 +391,95 @@ proc cmd/unwait {chan desc} {
     puts-chan $chan "OK unwait $res"
 }
 
+#---------- special magic for restarting the plan ----------
+
+proc for-free-resources {varname body} {
+    jobdb::transaction resources {
+        pg_execute -array free_resources_row dbh {
+            SELECT (restype || '/' || resname || '/' || shareix) AS r
+              FROM resources
+             WHERE NOT (SELECT live FROM tasks WHERE taskid=owntaskid)
+             ORDER BY restype, resname
+        } {
+            uplevel 1 [list set $varname $free_resources_row(r)]
+            uplevel 1 $body
+        }
+    }
+}
+
+proc restarter-starting-plan-hook {} {
+    global wasfree
+    catch { unset wasfree }
+    for-free-resources freeres {
+        set wasfree($freeres) 1
+    }
+}
+
+proc restarter-maybe-provoke-restart {} {
+    set newly_free {}
+    global wasfree
+    for-free-resources freeres {
+        if {[info exists wasfree($freeres)]} continue
+        lappend newly_free $freeres
+        set wasfree($freeres) 1
+    }
+    if {![llength $newly_free]} {
+        log-event "restarter-maybe-provoke-restart nothing"
+        return
+    }
+
+    walker-globals plan
+
+    if {!([info exists queue_running] && [llength $queue_running])} {
+        log-event "restarter-maybe-provoke-restart not-running ($newly_free)"
+        return
+    }
+
+    log-event "restarter-maybe-provoke-restart provoked ($newly_free)"
+
+    if {[string compare RESTART [lindex $queue_running 0]]} {
+        set queue_running [concat RESTART $queue_running]
+    }
+    after idle queuerun-perhaps-step plan
+}
+
+proc restarter-restart-now {} {
+    # We restart the `plan' walker.  Well, actually, if the
+    # `projection' walker is not running, we transfer the `plan'
+    # walker to it.  At this stage the plan walker is not thinking so
+    # there are no outstanding callbacks to worry about.
+
+    log-event restarter-restart-now
+
+    global projection/queue_running
+    global plan/queue_running
+
+    if {![info exists projection/queue_running]} {
+        log-event "restarter-restart-now projection-idle continue-as"
+        set projection/queue_running [set plan/queue_running]
+        file copy -force data-plan.pl data-projection.pl
+        after idle queuerun-perhaps-step projection
+    } else {
+        log-event "restarter-restart-now projection-running"
+    }
+
+    report-plan plan plan
+
+    unset plan/queue_running
+    runneeded-ensure-will 2
+}
+
 proc notify-to-think {w thinking} {
     for-chan $thinking {
-        puts-chan $thinking "!OK think"
+        set noalloc [chan-get-info $thinking {$info(feature-noalloc)} {}]
+        switch -glob $w.$noalloc {
+            plan.* { puts-chan $thinking "!OK think" }
+            projection.1 { puts-chan $thinking "!OK think noalloc" }
+            projection.* {
+                # oh well, can't include it in the projection; too bad
+                queuerun-step-done $w "!feature-noalloc"
+            }
+        }
     }
 }
 
@@ -519,6 +618,7 @@ proc await-endings-notified {} {
             error "$owndchan eof"
         }
         runneeded-ensure-will 2
+        restarter-maybe-provoke-restart
     }
 }
 
-- 
1.7.10.4