[web2py] scheduler new feature: task dependencies

Niphlod Tue, 05 Aug 2014 01:51:53 -0700

Hi @all,
   we have another new feature in trunk for the scheduler: Jobs (i.e. task
dependencies).


Directly from https://github.com/niphlod/w2p_scheduler_tests/ (which has
been updated to include an explanation of the new feature...)


What are "Jobs", you ask? Well, they're a way to coordinate a set of tasks
that have dependencies (what Celery calls "Canvas").

As always, the Scheduler sticks to the basics. Every Job is considered to
be a DAG (a Directed Acyclic Graph
<http://en.wikipedia.org/wiki/Directed_acyclic_graph>).
Without going into too much detail: every task can have one or more
dependencies, but of course tasks can't depend on each other mutually.
If a "job" can't be represented as a DAG, it can't be processed in its
entirety. You can still queue it, but it will never complete (i.e. you
could get a complete stall at the very first task, or just one task out of
100 left queued forever...)
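To make the "mutual dependency" problem concrete, here is a small standalone sketch (plain Python, not scheduler code) that looks for a cycle with a depth-first search. The dependency-map format (task -> set of tasks it waits on) and the function name find_cycle are my own, purely for illustration.

```python
from typing import Dict, List, Optional, Set


def find_cycle(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task names, or None if it's a DAG.

    `deps` maps each task to the set of tasks that must complete before it
    (a hypothetical format, chosen only for this illustration).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GREY
        path.append(task)
        for prereq in sorted(deps.get(task, ())):
            state = color.get(prereq, WHITE)
            if state == GREY:
                # prereq is already on the current path: we walked back into it
                return path[path.index(prereq):] + [prereq]
            if state == WHITE:
                found = visit(prereq)
                if found:
                    return found
        path.pop()
        color[task] = BLACK
        return None

    for task in sorted(deps):
        if color.get(task, WHITE) == WHITE:
            found = visit(task)
            if found:
                return found
    return None


# 'a' waits on 'b' and 'b' waits on 'a': neither can ever start
print(find_cycle({'a': {'b'}, 'b': {'a'}}))             # ['a', 'b', 'a']
print(find_cycle({'tie': {'shirt'}, 'shirt': set()}))   # None
```

Any task sitting on such a cycle (and anything that waits on it) stays queued, while unrelated tasks can still run.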

So... what can you do?
Let's take a trivial example (there are a few others based on mathematics,
map/reduce, etc... but hey, this is an example!!!)
Suppose you need to create a job that describes what is needed to get
dressed (thanks to http://hansolav.net/sql/graphs.html)...

We have a few items to wear, and there's an "order" to respect...
Items are: watch, jacket, shirt, tie, pants, undershorts, belt, shoes, socks

Now, we can't put on the tie without wearing the shirt first, etc...

<http://yuml.me/995413d6>
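If you want to sanity-check the dressing order outside web2py, Python's standard-library graphlib.TopologicalSorter (Python 3.9+) can compute one valid sequence. This is only an offline check, not something the scheduler uses; the prerequisites dict is my transcription of the diagram (the same dependencies declared with add_deps further down), with each item mapped to what must be worn before it.

```python
from graphlib import TopologicalSorter

# item -> items that must be put on first (transcribed from the diagram)
prerequisites = {
    'tie': {'shirt'},
    'belt': {'shirt', 'pants'},
    'jacket': {'tie', 'belt'},
    'shoes': {'pants', 'undershorts', 'socks'},
    'pants': {'undershorts'},
    'watch': set(),  # no dependencies at all: can go on at any time
}

# static_order() yields every node once, each after all of its prerequisites
order = list(TopologicalSorter(prerequisites).static_order())
print(order)  # one valid order; with sets involved it may differ between runs
```

Several orders are valid (the watch can go on whenever you like), which is exactly the freedom the scheduler has when handing independent tasks to workers.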



Suppose we have queued these tasks in a controller (for example's sake, they
all use the same function, just with different task_name values)...
# s is the Scheduler instance (e.g. s = Scheduler(db) in a model) and fname
# is the task function; queue_task returns a row whose .id is the new
# task's id, which we'll need later to declare the dependencies
watch = s.queue_task(fname, task_name='watch')
jacket = s.queue_task(fname, task_name='jacket')
shirt = s.queue_task(fname, task_name='shirt')
tie = s.queue_task(fname, task_name='tie')
pants = s.queue_task(fname, task_name='pants')
undershorts = s.queue_task(fname, task_name='undershorts')
belt = s.queue_task(fname, task_name='belt')
shoes = s.queue_task(fname, task_name='shoes')
socks = s.queue_task(fname, task_name='socks')


Now, there's a helper class, JobGraph, to construct and validate a "job".
First, let's declare a job named "job_1":


from gluon.scheduler import JobGraph
myjob = JobGraph(db, 'job_1')



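Next, we need to declare the dependencies: add_deps(parent, child) means
that the child task must be COMPLETED before the parent task can run.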


# before the tie, comes the shirt
myjob.add_deps(tie.id, shirt.id)
# the shirt also comes before the belt
myjob.add_deps(belt.id, shirt.id)
# before the jacket, comes the tie
myjob.add_deps(jacket.id, tie.id)
# before the belt, come the pants
myjob.add_deps(belt.id, pants.id)
# before the shoes, come the pants
myjob.add_deps(shoes.id, pants.id)
# before the pants, come the undershorts
myjob.add_deps(pants.id, undershorts.id)
# before the shoes, come the undershorts
myjob.add_deps(shoes.id, undershorts.id)
# before the jacket, comes the belt
myjob.add_deps(jacket.id, belt.id)
# before the shoes, come the socks
myjob.add_deps(shoes.id, socks.id)



Then, we can ask JobGraph whether the job we described can actually be
completed (i.e. whether it is a DAG):

myjob.validate('job_1')

And voilà, job done! If it's not a DAG, an exception will be raised and the
tasks won't be committed (and, of course, neither will their dependencies).
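The check validate() performs is essentially a topological sort. Below is a rough, pure-Python approximation of the idea with no database and no commit/rollback: MiniJobGraph is a hypothetical stand-in, not web2py's code, that keeps the same argument order as add_deps(parent, child) above (the child must complete first). It peels off "layers" of tasks with no pending dependencies and raises ValueError if something is left over, i.e. if the graph has a cycle.

```python
from typing import Dict, List, Set


class MiniJobGraph:
    """Hypothetical in-memory stand-in for JobGraph, for illustration only."""

    def __init__(self, job_name: str) -> None:
        self.job_name = job_name
        self.deps: Dict[int, Set[int]] = {}  # parent -> children it waits on

    def add_deps(self, task_parent: int, task_child: int) -> None:
        # Same argument order as in the post: the child must complete first
        self.deps.setdefault(task_parent, set()).add(task_child)
        self.deps.setdefault(task_child, set())

    def validate(self) -> List[Set[int]]:
        """Return tasks grouped in runnable "layers"; raise ValueError on a cycle."""
        pending = {task: set(children) for task, children in self.deps.items()}
        layers: List[Set[int]] = []
        while pending:
            # tasks whose dependencies are all satisfied can run now
            ready = {task for task, children in pending.items() if not children}
            if not ready:
                raise ValueError('cyclic dependency among tasks %s' % sorted(pending))
            layers.append(ready)
            # drop the ready tasks and mark them as satisfied for everyone else
            pending = {task: children - ready
                       for task, children in pending.items() if task not in ready}
        return layers


# task ids as queue_task might hand them out (1=shirt, 2=tie, 3=jacket)
job = MiniJobGraph('job_1')
job.add_deps(2, 1)  # before the tie (2), comes the shirt (1)
job.add_deps(3, 2)  # before the jacket (3), comes the tie (2)
print(job.validate())  # [{1}, {2}, {3}]

job.add_deps(1, 3)  # now the shirt waits on the jacket: a cycle
try:
    job.validate()
except ValueError as exc:
    print(exc)  # cyclic dependency among tasks [1, 2, 3]
```

The real JobGraph reads the dependencies from the database and, as described above, won't commit anything when the check fails; the layers returned here simply show which tasks could run in parallel.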

How does it work under the hood?

There's a new table called scheduler_task_deps that holds the job_name, a
reference to the parent task, a reference to the child task and a boolean
that marks the "path" (an arrow in the graph) as "visitable".
To be fair, the job name isn't that important: you can have task
dependencies spanning different jobs, it's just harder to verify at a later
stage that the Job is still a DAG.
If a path is "visitable", the graph can be "walked" in that direction.
Every time a task gets COMPLETED, the paths starting from it are updated to
be "visitable". The algorithm that picks up tasks has been updated to fetch
only tasks that have no dependencies, or whose dependencies have already
been satisfied (i.e. tasks that depend on nothing, or only on tasks that are
already COMPLETED).
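To see how the "visitable" flag drives task selection, here is a rough simulation using the standard-library sqlite3 module. The schema is simplified and partly assumed: the post only says the table holds the job name, parent, child and a boolean, so the column name can_visit and the cut-down scheduler_task table are illustrative, and the real scheduler query has many more conditions (start/stop times, repeats, enabled, ...). A QUEUED task is runnable when it is not the parent of any path that is still not visitable; completing a task flips the rows where it is the child.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
    CREATE TABLE scheduler_task (id INTEGER PRIMARY KEY, task_name TEXT, status TEXT);
    CREATE TABLE scheduler_task_deps (
        job_name TEXT, task_parent INTEGER, task_child INTEGER,
        can_visit BOOLEAN DEFAULT 0  -- assumed name for the "visitable" flag
    );
""")

tasks = {1: 'shirt', 2: 'tie', 3: 'jacket'}
db.executemany("INSERT INTO scheduler_task VALUES (?, ?, 'QUEUED')", tasks.items())
# (parent, child): the child must be COMPLETED before the parent can run
db.executemany(
    "INSERT INTO scheduler_task_deps (job_name, task_parent, task_child) "
    "VALUES ('job_1', ?, ?)",
    [(2, 1), (3, 2)],
)


def runnable():
    """QUEUED tasks that are not the parent of any not-yet-visitable path."""
    rows = db.execute("""
        SELECT task_name FROM scheduler_task
        WHERE status = 'QUEUED'
          AND id NOT IN (SELECT task_parent FROM scheduler_task_deps
                         WHERE can_visit = 0)
        ORDER BY id
    """).fetchall()
    return [name for (name,) in rows]


def complete(task_id):
    """Mark a task COMPLETED and make the paths starting from it visitable."""
    db.execute("UPDATE scheduler_task SET status = 'COMPLETED' WHERE id = ?", (task_id,))
    db.execute("UPDATE scheduler_task_deps SET can_visit = 1 WHERE task_child = ?", (task_id,))


print(runnable())  # ['shirt']
complete(1)
print(runnable())  # ['tie']
complete(2)
print(runnable())  # ['jacket']
```

Note that nothing here needs to know about the job as a whole: each pick only looks at the paths touching a single task, which is why the job name matters mostly for validation.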


Let me know what you think, and if you spot bugs.



-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
