[web2py] Re: scheduler new feature: task dependencies

Limedrop Tue, 05 Aug 2014 13:31:14 -0700

Thanks Niphlod, that's brilliant! I've been using a workaround to 
schedule dependent jobs, and this will help to tidy things up.


On Tuesday, August 5, 2014 8:51:25 PM UTC+12, Niphlod wrote:
>
> Hi @all,
>    we have another feature in trunk for the scheduler... Jobs (i.e. task 
> dependencies)
>
> Directly from https://github.com/niphlod/w2p_scheduler_tests/ (which has 
> been updated to accommodate the explanation of the new feature...)
>
>
> What are "Jobs", you ask ? Well, it's a way to coordinate a set of tasks 
> that have dependencies (what in Celery is called "Canvas").
>
> As always, the Scheduler sticks to the basics. Every Job is considered to 
> be a DAG (a  Directed Acyclic Graph 
> <http://en.wikipedia.org/wiki/Directed_acyclic_graph>).
> Without going into too much detail, every task can have one or more 
> dependencies, but of course you can't have mutual dependencies among the 
> same tasks.
> If a "job" can't be represented as a DAG, then it can't be processed in 
> its entirety. You can still queue it, but it won't ever complete (i.e. you 
> could have a complete stall at the first task, or just one task out of 100 
> left queued...)
>
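
To make the "mutual dependencies" point concrete, here is a minimal sketch of a cycle check in plain Python. The has_cycle helper and the dict layout are assumptions made up for illustration; this is not the code the scheduler uses.

```python
# Hypothetical helper, not part of web2py: depth-first search for a cycle.
def has_cycle(deps):
    """deps maps each task name to the set of tasks it depends on."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(task):
        state[task] = GREY
        for parent in deps.get(task, ()):
            colour = state.get(parent, WHITE)
            if colour == GREY:  # we walked back onto the current path
                return True
            if colour == WHITE and visit(parent):
                return True
        state[task] = BLACK
        return False

    return any(state.get(t, WHITE) == WHITE and visit(t) for t in list(deps))


ok_job = {'tie': {'shirt'}, 'jacket': {'tie'}}
stalled_job = {'tie': {'shirt'}, 'shirt': {'tie'}}  # mutual dependency
print(has_cycle(ok_job))       # False
print(has_cycle(stalled_job))  # True
```

A job shaped like stalled_job is the kind that could be queued but would never complete.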
> So... what can you do ?
> Let's take a trivial example (there are a few based on mathematics, 
> map/reduce, etc... but hey, this is an example!!!)
> Suppose you need to create a job that describes what is needed to get 
> dressed ( thanks to http://hansolav.net/sql/graphs.html )...
>
> We have a few items to wear, and there's an "order" to respect...
> Items are: watch, jacket, shirt, tie, pants, undershorts, belt, shoes, 
> socks
>
> Now, we can't put on the tie without wearing the shirt first, etc...
>
> <http://yuml.me/995413d6>
>
>
>
> Suppose we have those tasks queued in a controller (for example's sake, 
> the same function, with different task_name(s))...
> watch = s.queue_task(fname, task_name='watch')
> jacket = s.queue_task(fname, task_name='jacket')
> shirt = s.queue_task(fname, task_name='shirt')
> tie = s.queue_task(fname, task_name='tie')
> pants = s.queue_task(fname, task_name='pants')
> undershorts = s.queue_task(fname, task_name='undershorts')
> belt = s.queue_task(fname, task_name='belt')
> shoes = s.queue_task(fname, task_name='shoes')
> socks = s.queue_task(fname, task_name='socks')
>
>
> Now, there's a helper class to construct and validate a "job".
> First, let's declare a job named "job_1"
>
>
> from gluon.scheduler import JobGraph
> myjob = JobGraph(db, 'job_1')
>
>
>
> Next, we'd need to establish dependencies
>
>
> # before the tie, comes the shirt
> myjob.add_deps(tie.id, shirt.id)
> # before the belt, too, comes the shirt
> myjob.add_deps(belt.id, shirt.id)
> # before the jacket, comes the tie
> myjob.add_deps(jacket.id, tie.id)
> # before the belt, come the pants
> myjob.add_deps(belt.id, pants.id)
> # before the shoes, come the pants
> myjob.add_deps(shoes.id, pants.id)
> # before the pants, comes the undershorts
> myjob.add_deps(pants.id, undershorts.id)
> # before the shoes, comes the undershorts
> myjob.add_deps(shoes.id, undershorts.id)
> # before the jacket, comes the belt
> myjob.add_deps(jacket.id, belt.id)
> # before the shoes, comes the socks
> myjob.add_deps(shoes.id, socks.id)
>
>
>
> Then we can ask JobGraph whether the job we described can actually be completed:
>
> myjob.validate('job_1')
>
> And voilà, job done! If it's not a DAG, then an exception will be raised 
> and the jobs won't be committed (and of course their dependencies won't be 
> committed either).
>
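
As a rough illustration of what "accomplishable" means for the dressing example, the sketch below groups the items into rounds where everything in a round only depends on earlier rounds. It is standalone Python under my own assumptions (the layers_of name and the dict of prerequisites are hypothetical) and does not touch the scheduler or the database.

```python
def layers_of(deps):
    """Group tasks into rounds; each task's prerequisites sit in earlier rounds.

    deps maps a task to the set of tasks that must come before it.
    Raises ValueError if the dependencies contain a cycle.
    """
    remaining = {task: set(parents) for task, parents in deps.items()}
    # Tasks that only ever appear as prerequisites have no dependencies.
    for parents in deps.values():
        for parent in parents:
            remaining.setdefault(parent, set())
    layers = []
    while remaining:
        ready = {task for task, parents in remaining.items() if not parents}
        if not ready:
            raise ValueError('cyclic dependency among %r' % sorted(remaining))
        layers.append(ready)
        remaining = {task: parents - ready
                     for task, parents in remaining.items()
                     if task not in ready}
    return layers


# Same dependencies as the add_deps() calls in the post.
dressing = {
    'watch': set(),
    'tie': {'shirt'},
    'belt': {'shirt', 'pants'},
    'jacket': {'tie', 'belt'},
    'shoes': {'pants', 'undershorts', 'socks'},
    'pants': {'undershorts'},
}
for layer in layers_of(dressing):
    print(sorted(layer))
# ['shirt', 'socks', 'undershorts', 'watch']
# ['pants', 'tie']
# ['belt', 'shoes']
# ['jacket']
```

Adding a reverse dependency such as 'shirt': {'jacket'} makes layers_of raise, which is the situation the validation step is meant to catch.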
> How does it work under the hood?
>
> There's a new table called scheduler_task_deps that holds a reference to 
> the job_name, the task parent, the task child and
> a boolean to mark the "path" (the arrows in the graph) as "visitable".
> To be fair, the job name isn't that important: you can have task 
> dependencies amongst
> different jobs, but then it's not that easy to verify that the Job is a DAG at 
> a later stage.
> If a path is "visitable" it means that the DAG can be "walked" in 
> that direction.
> Every time a task gets "COMPLETED", the "paths" get updated to be 
> "visitable". The algorithm that picks up tasks has been updated
> to fetch only tasks that have no dependencies, or whose dependencies 
> have already been satisfied (i.e. tasks that depend
> on nothing, or tasks that depend on tasks that are already COMPLETED).
>
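
The "visitable" bookkeeping can be sketched with in-memory data, assuming that completing a task opens the paths leaving it. The dict keys ('parent', 'child', 'visitable') and the pickable/complete helpers are illustrative stand-ins of my own, not the real column names or scheduler functions.

```python
# Each dict mimics one dependency record: a path from a parent task to a child.
paths = [
    {'parent': 'shirt', 'child': 'tie', 'visitable': False},
    {'parent': 'tie', 'child': 'jacket', 'visitable': False},
]
status = {'shirt': 'QUEUED', 'tie': 'QUEUED', 'jacket': 'QUEUED'}


def pickable(status, paths):
    """Queued tasks with no incoming path that is still closed."""
    blocked = {p['child'] for p in paths if not p['visitable']}
    return sorted(t for t, s in status.items()
                  if s == 'QUEUED' and t not in blocked)


def complete(task, status, paths):
    """Mark a task COMPLETED and open every path that leaves it."""
    status[task] = 'COMPLETED'
    for p in paths:
        if p['parent'] == task:
            p['visitable'] = True


order = []
while pickable(status, paths):
    task = pickable(status, paths)[0]
    complete(task, status, paths)
    order.append(task)
print(order)  # ['shirt', 'tie', 'jacket']
```

With a mutual dependency in paths, pickable() would return an empty list straight away and the tasks would simply stay QUEUED, which matches the stall described earlier in the post.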
>
> Let me know what you think, and if you spot bugs.
>
>
>
>

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)