[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-25 Thread Massimo Di Pierro
At 5am. This is the formula:

next_run_time = task.last_run_time + timedelta(seconds=task.period)
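
A quick worked check against Martín's scenario below (a minimal sketch; it
assumes, as the answer above implies, that last_run_time is stamped when the
task starts, not when it finishes):

from datetime import datetime, timedelta

last_run_time = datetime(2011, 8, 25, 5, 0)   # the task started at 5am and runs for 1 hour
period = 86400                                # one day, in seconds
next_run_time = last_run_time + timedelta(seconds=period)
print(next_run_time)                          # 2011-08-26 05:00:00 -> 5am again, not 6am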



On Aug 25, 12:29 pm, Martín Mulone  wrote:
> I have a doubt: suppose I have a task I have to run every day, the task takes
> one hour to complete, and suppose I scheduled it to start at 5am every day. The
> first time it runs at 5am; the next day does it run at 5am or at 6am? Because
> I set period=86400 sec and the prior run takes 1 hour to complete.
>
> 2011/8/8 Massimo Di Pierro 
>
> > ## preamble
>
> > I have been working on porting django-celery to web2py-celery.
> >http://code.google.com/p/web2py-celery
> > There are a few issues to resolve and I am working on it.
>
> > Yet I found it to be overkill for most users. It has lots of
> > dependencies (for example RabbitMQ) and it is not easy to manage. If
> > you do not need a huge number of worker nodes there may be a better
> > solution.
>
> > So  I added this to trunk:
>
> > gluon/scheduler.py
>
> > This email is a request for comments as I think this should replace the
> > current cron mechanism.
>
> > ## What is it?
> > It is a lightweight replacement for celery that uses the database
> > instead of queues to schedule tasks and uses the default web2py admin
> > interface to allow you to schedule tasks. It consists of a single file
> > and has no dependencies.
>
> > ## How does it work?
>
> > For any existing
> > app
>
> > Create File: app/models/scheduler.py
> > ==
> > from gluon.scheduler import Scheduler
> >
> > def demo1(*args,**vars):
> >     print 'you passed args=%s and vars=%s' % (args, vars)
> >     return 'done!'
> >
> > def demo2():
> >     1/0
> >
> > scheduler = Scheduler(db,dict(demo1=demo1,demo2=demo2))
> > =
> >
> > Create File: app/modules/scheduler.py
> > ==
> > scheduler.worker_loop()
> > =
>
> > ## run worker nodes with:
> > python web2py.py -S app -M -N -R applications/app/modules/scheduler.py
> >
> > ## schedule jobs using
> > http://127.0.0.1:8000/scheduler/appadmin/insert/db/task_scheduled
> >
> > ## monitor scheduled jobs
> > http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_sche...
> >
> > ## view completed jobs
> > http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_run
>
> > Compared to celery it lacks the ability to bind tasks and workers,
> > remotely interrupt tasks, and set timeouts, yet these features can be
> > added easily and I will do so eventually.
>
> > Please let me know what you think.
>
> > Massimo
>
> --
>  http://martin.tecnodoc.com.ar


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-13 Thread Massimo Di Pierro
http://hartshorne.ca/2009/02/16/sqlite_database_is_locked/

On 13 Ago, 08:31, Niphlod  wrote:
> clearly with multiple threads maybe the sqlite api is going to "wait",
> but with multiple processes trying to insert data at the same time I'm
> starting to think it can't keep up
>
> BTW, there's no way to get around the problem of tasks executed
> multiple times if I'm trying to assign workers in the same process
> that fetches and executes them.
> Any ideas?


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-13 Thread Niphlod
clearly with multiple threads maybe the sqlite api is going to "wait",
but with multiple processes trying to insert data at the same time I'm
starting to think it can't keep up

BTW, there's no way to get around the problem of tasks executed
multiple times if I'm trying to assign workers in the same process
that fetches and executes them.
Any ideas?


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-13 Thread Massimo Di Pierro
Added some try/except, but this should not be happening. If the database
is locked, it should just wait.
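
For what it's worth, one hedged possibility for such a flag (an assumption on
my part: that this DAL version forwards driver_args to sqlite3.connect, whose
timeout argument controls how long a writer waits on a lock before raising):

db = DAL('sqlite://storage.sqlite',
         driver_args={'timeout': 30})  # wait up to 30s instead of raising 'database is locked'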

Massimo

On 13 Ago, 07:07, Niphlod  wrote:
> I'm testing with postgres, anyway in sqlite this is the traceback from
> the workers
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
>     self._target(*self._args, **self._kwargs)
>   File "/home/niphlod/Scrivania/web2py/gluon/shell.py", line 211, in run
>     exec(python_code, _env)
>   File "<string>", line 1, in <module>
>   File "/home/niphlod/Scrivania/web2py/gluon/scheduler.py", line 358, in worker_loop
>     while self.run_next_task(group_names=group_names): pass
>   File "/home/niphlod/Scrivania/web2py/gluon/scheduler.py", line 271, in run_next_task
>     result=dumps(result))
>   File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 5521, in update
>     return self.db._adapter.update(tablename,self.query,fields)
>   File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 1024, in update
>     self.execute(sql)
>   File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 1276, in execute
>     return self.log_execute(*a, **b)
>   File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 1271, in log_execute
>     ret = self.cursor.execute(*a,**b)
> OperationalError: database is locked


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-13 Thread Niphlod
I'm testing with postgres, anyway in sqlite this is the traceback from
the workers

Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/home/niphlod/Scrivania/web2py/gluon/shell.py", line 211, in run
    exec(python_code, _env)
  File "<string>", line 1, in <module>
  File "/home/niphlod/Scrivania/web2py/gluon/scheduler.py", line 358, in worker_loop
    while self.run_next_task(group_names=group_names): pass
  File "/home/niphlod/Scrivania/web2py/gluon/scheduler.py", line 271, in run_next_task
    result=dumps(result))
  File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 5521, in update
    return self.db._adapter.update(tablename,self.query,fields)
  File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 1024, in update
    self.execute(sql)
  File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 1276, in execute
    return self.log_execute(*a, **b)
  File "/home/niphlod/Scrivania/web2py/gluon/dal.py", line 1271, in log_execute
    ret = self.cursor.execute(*a,**b)
OperationalError: database is locked


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Massimo Di Pierro
Please check if the current trunk solves some of the problems.
The operational errors are probably due to failures to commit.
If you could report some tracebacks I can add some

try:
...
except: db.rollback()

in the right places.
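
For illustration, a hedged sketch of the kind of guard meant here (the helper
name and its placement are assumptions, not the actual scheduler code):

def commit_or_rollback(db):
    # keep the worker alive: if the commit fails (e.g. 'database is
    # locked'), roll the transaction back so the next poll starts clean
    try:
        db.commit()
        return True
    except Exception:
        db.rollback()
        return False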




On 12 Ago, 18:32, Niphlod  wrote:
> @Massimo: I switched over to postgres before today's tests; I was facing
> too many OperationalErrors with workers>4 and tasks queued>100
>
> PS: nice italian translation of the online book!


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Massimo Di Pierro
P.S. I always assumed people would run different nodes on different
machines, and therefore different hostnames, but even in that case they
may get a hostname equal to 127.0.0.1, so there must be a way to specify
a worker_name.

On 12 Ago, 18:35, Massimo Di Pierro 
wrote:
> I see the problem. All your workers have the same name because the
> worker_name defaults to hostname:port of the web2py instance.
> The API has a way to specify a hostname but the command line function
> does not.
>
> There are two simple solutions:
> - specify a worker_name using shell arguments
> - have the worker pick up a UUID worker_name (this may create
> problems)
>
> meanwhile you can start the workers passing a different -p port and this
> will fake different worker_names
>
> python web2py.py -K a0 -p 9001
> python web2py.py -K a0 -p 9002
> python web2py.py -K a0 -p 9003
>
> this should solve the problems.
> Your suggested fix below should go in trunk but it does not solve
> the problem. It only makes it rarer.
>
> On 12 Ago, 18:26, Niphlod  wrote:
>
> > I'm trying some variations but it seems that the culprit is assigning
> > and retrieving task_scheduled in the same process.
>
> > I don't know dal internals with transactions, locking and commits... a
> > hint though (my 2 cents): I added, just to check, a line after line 245
>
> > ...
> > if task:
> >     if task.assigned_worker_name != self.worker_name:
> >         logging.info('Someone stole my task!')
> >         return False
> >     logging.info('running task %s' % task.name)
> > ...
>
> > and it never gets actually printed.
> > So it's not a problem of "I assigned a task to me, and before it gets
> > executed another one picked that task", at least I think.
>
> > Right now it seems to work ok only if
>
> > def assign_next_task_new(self, group_names=['main']):
> >     """
> >     find next task that needs to be executed
> >     """
> >     db = self.db
> >     row = db(db.task_scheduled.assigned_worker_name==self.worker_name).select(limitby=(0,1)).first()
> >     return row
>
> > I don't know if it's viable to run a single "assigner" and several
> > workers. In python maybe the "assigner" could fetch the list of
> > task_scheduled records in waiting state (with a sane limitby clause)
> > and split the list evenly, assigning it to the alive workers
>
> >http://stackoverflow.com/questions/312443/how-do-you-split-a-list-int...
>
> > For sql maniacs a simple script can be run as the assigner (it works
> > only with window functions, so check if your database supports them)
> >
> > for postgres this works like a charm:
>
> > update task_scheduled
> > set assigned_worker_name = worker.name,
> >     status = 'running',
> >     last_run_time = now()
> > from
> >     (
> >     select ntile((select count(*)::int from worker_heartbeat)) OVER (order by id) as t_id, *
> >     from task_scheduled
> >     WHERE 1 = 1
> >     AND status = 'queued'
> >     AND ((assigned_worker_name IS NULL) OR (assigned_worker_name = ''))
> >     ) sched,
> >     (
> >     select ntile((select count(*)::int from worker_heartbeat)) OVER (order by id) as w_id, name
> >     from worker_heartbeat
> >     ) worker
> > WHERE worker.w_id = sched.t_id
> > and sched.id = task_scheduled.id
>
> > PS: I noticed another "fixable" aspect: worker_heartbeat gets
> > polluted; it would be good to delete the record when ctrl+c is
> > pressed (or the process is killed gracefully)


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Massimo Di Pierro
I see the problem. All your workers have the same name because the
worker_name defaults to hostname:port of the web2py instance.
The API has a way to specify a hostname but the command line function
does not.

There are two simple solutions:
- specify a worker_name using shell arguments
- have the worker pick up a UUID worker_name (this may create
problems)

meanwhile you can start the workers passing a different -p port and this
will fake different worker_names

python web2py.py -K a0 -p 9001
python web2py.py -K a0 -p 9002
python web2py.py -K a0 -p 9003

this should solve the problems.
Your suggested fix below should go in trunk but it does not solve
the problem. It only makes it rarer.
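
For reference, a hedged sketch of the UUID option (a sketch only; the
scheduler's actual naming code may differ):

import socket
import uuid

# host identity plus a random suffix, so several workers started on the same
# machine (or all resolving to 127.0.0.1) never share a worker_name
worker_name = '%s#%s' % (socket.gethostname(), uuid.uuid4())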


On 12 Ago, 18:26, Niphlod  wrote:
> I'm trying some variations but it seems that the culprit is assigning
> and retrieving task_scheduled in the same process.
>
> I don't know dal internals with transactions, locking and commits... a
> hint though (my 2 cents): I added, just to check, a line after line 245
>
> ...
> if task:
>     if task.assigned_worker_name != self.worker_name:
>         logging.info('Someone stole my task!')
>         return False
>     logging.info('running task %s' % task.name)
> ...
>
> and it never gets actually printed.
> So it's not a problem of "I assigned a task to me, and before it gets
> executed another one picked that task", at least I think.
>
> Right now it seems to work ok only if
>
> def assign_next_task_new(self, group_names=['main']):
>     """
>     find next task that needs to be executed
>     """
>     db = self.db
>     row = db(db.task_scheduled.assigned_worker_name==self.worker_name).select(limitby=(0,1)).first()
>     return row
>
> is used as a replacement for assign_next_task.
>
> I don't know if it's viable to run a single "assigner" and several
> workers. In python maybe the "assigner" could fetch the list of
> task_scheduled records in waiting state (with a sane limitby clause)
> and split the list evenly, assigning it to the alive workers
>
> http://stackoverflow.com/questions/312443/how-do-you-split-a-list-int...
>
> For sql maniacs a simple script can be run as the assigner (it works
> only with window functions, so check if your database supports them)
>
> for postgres this works like a charm:
>
> update task_scheduled
> set assigned_worker_name = worker.name,
>     status = 'running',
>     last_run_time = now()
> from
>     (
>     select ntile((select count(*)::int from worker_heartbeat)) OVER (order by id) as t_id, *
>     from task_scheduled
>     WHERE 1 = 1
>     AND status = 'queued'
>     AND ((assigned_worker_name IS NULL) OR (assigned_worker_name = ''))
>     ) sched,
>     (
>     select ntile((select count(*)::int from worker_heartbeat)) OVER (order by id) as w_id, name
>     from worker_heartbeat
>     ) worker
> WHERE worker.w_id = sched.t_id
> and sched.id = task_scheduled.id
>
> PS: I noticed another "fixable" aspect: worker_heartbeat gets
> polluted; it would be good to delete the record when ctrl+c is
> pressed (or the process is killed gracefully)


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Niphlod
@Massimo: I switched over to postgres before today's tests; I was facing
too many OperationalErrors with workers>4 and tasks queued>100

PS: nice italian translation of the online book!


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Niphlod
I'm trying some variations but it seems that the culprit is assigning
and retrieving task_scheduled in the same process.

I don't know dal internals with transactions, locking and commits... a
hint though (my 2 cents): I added, just to check, a line after line 245

...
if task:
    if task.assigned_worker_name != self.worker_name:
        logging.info('Someone stole my task!')
        return False
    logging.info('running task %s' % task.name)
...

and it never actually gets printed.
So it's not a problem of "I assigned a task to myself, and before it gets
executed, another worker picked that task", at least I think.

Right now it seems to work ok only if

def assign_next_task_new(self, group_names=['main']):
    """
    find next task that needs to be executed
    """
    db = self.db
    row = db(db.task_scheduled.assigned_worker_name==self.worker_name).select(limitby=(0,1)).first()
    return row

is used as a replacement for assign_next_task.

I don't know if it's viable to run a single "assigner" and several
workers. In python maybe the "assigner" could fetch the list of
task_scheduled records in waiting state (with a sane limitby clause)
and split the list evenly, assigning it to the alive workers

http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python

For sql maniacs a simple script can be run as the assigner (it works
only with window functions, so check if your database supports them)

for postgres this works like a charm:

update task_scheduled
set assigned_worker_name = worker.name,
    status = 'running',
    last_run_time = now()
from
    (
    select ntile((select count(*)::int from worker_heartbeat)) OVER (order by id) as t_id, *
    from task_scheduled
    WHERE 1 = 1
    AND status = 'queued'
    AND ((assigned_worker_name IS NULL) OR (assigned_worker_name = ''))
    ) sched,
    (
    select ntile((select count(*)::int from worker_heartbeat)) OVER (order by id) as w_id, name
    from worker_heartbeat
    ) worker
WHERE worker.w_id = sched.t_id
and sched.id = task_scheduled.id

PS: I noticed another "fixable" aspect: worker_heartbeat gets
polluted; it would be good to delete the record when ctrl+c is
pressed (or the process is killed gracefully)


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Massimo Di Pierro
This is strange. This must be a problem with transactions in sqlite.
Will run more tests tomorrow.

On 12 Ago, 15:11, Niphlod  wrote:
> Actually 2 workers end up fetching the same task_scheduled even
> with the new logic.
>
> Reproducing is as simple as, in a controller:
>
> def submit_work():
>     from gluon.contrib.simplejson import loads,dumps
>     db(db.task_scheduled.id>0).delete() #cleanup, we want "unique" values in a
>     for a in range(1000):
>         id = scheduler.db.task_scheduled.insert(
>             name = 'a',
>             func = 'demo1',
>             args = dumps(['test', a]),
>             vars = dumps({'test': 'test2'})
>         )
>     return '%s' % (id)
>
> def verify_work_done():
>     count = db.task_run.id.count()
>     result = db().select(db.task_run.output, count,
>                          groupby=db.task_run.output, having=count>1)
>     return dict(res=result)
>
> with testing app a0 as in the examples, where
>
> def demo1(*args,**vars):
>     print 'you passed args=%s and vars=%s' % (args, vars)
>     return 'done!'
>
> Hit submit_work(), start 2 or more workers, wait for the task_scheduled
> queue to finish, then hit verify_work_done().
>
> Several records returned; not good :P


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Niphlod
Actually 2 workers end up fetching the same task_scheduled even
with the new logic.

Reproducing is as simple as, in a controller:

def submit_work():
    from gluon.contrib.simplejson import loads,dumps
    db(db.task_scheduled.id>0).delete() #cleanup, we want "unique" values in a
    for a in range(1000):
        id = scheduler.db.task_scheduled.insert(
            name = 'a',
            func = 'demo1',
            args = dumps(['test', a]),
            vars = dumps({'test': 'test2'})
        )
    return '%s' % (id)


def verify_work_done():
    count = db.task_run.id.count()
    result = db().select(db.task_run.output, count,
                         groupby=db.task_run.output, having=count>1)
    return dict(res=result)

with testing app a0 as in the examples, where

def demo1(*args,**vars):
    print 'you passed args=%s and vars=%s' % (args, vars)
    return 'done!'

Hit submit_work(), start 2 or more workers, wait for the task_scheduled
queue to finish, then hit verify_work_done().

Several records returned; not good :P


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-12 Thread Niphlod
I'm continuing to report bugs here if that's ok... tell me if I have to
stop or go somewhere else...

if one starts different processes, say,

python web2py.py -K a0
python web2py.py -K a0
python web2py.py -K a0
python web2py.py -K a0
python web2py.py -K a0

the guess_worker method actually returns the same value for each, so the
worker_names collide and the new "logic" doesn't apply. Better to pick a
worker name that differs if one is already found running in
worker_heartbeat... or assign it a uuid directly on start.

Another small problem: fix_failures() raises an error if ids == []. A
simple len()>0 check before the subsequent update fixes the problem.
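
A hedged sketch of that guard (the update shown is illustrative, not the real
body of fix_failures):

def fix_failures(db, ids):
    # ids == [] made the original update raise; skip it when there is
    # nothing to fix
    if len(ids) > 0:
        db(db.task_scheduled.id.belongs(ids)).update(status='failed')
        db.commit()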

/me goes to testing.


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-11 Thread pbreit
Are you referring to Web2py's new scheduler?

Since many of us are already set up to use SQLite or MySQL/Postgres, could 
those be used in place of Redis?


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-11 Thread TheSweetlink
Long time lurker here.  First, thank you all for this incredible
product and community.  I have learned a great deal in a very short
time thanks to web2py and this mailing list.

I may have the opportunity to finally contribute back meaningfully.

Regarding background message/task queuing I have had great success
with a very simple combination that has many extra side benefits.  All
dependencies (BSON, redis-py, hotqueue) are available via pip for easy
installation.  Some pros and cons listed below.

The short version:
Serialize your task/message into a dict, list of dicts, whatever via
BSON (Binary JSON = FAST),
push it to a queue kept in Redis,
background worker pulls from queue and processes it/inserts into db/
whatever you like.

Not only messages but I have successfully tested this as a queue for
tasks as well.  Use a k,v pair in a dict describing your {'action':
'data'}.  It's easy to get creative with it.
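
A minimal sketch of that pattern using hotqueue over redis-py (the queue name,
host and task format are my assumptions; hotqueue pickles items by default, so
the BSON step is left out here):

from hotqueue import HotQueue

queue = HotQueue('tasks', host='localhost', port=6379, db=0)

# producer side, e.g. inside a web2py action:
queue.put({'action': 'send_mail', 'data': {'to': 'someone@example.com'}})

# consumer side, a background worker; the decorator wraps handle() in an
# endless consume loop that blocks until items arrive
@queue.worker
def handle(task):
    print('processing %s' % task)

handle()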

Cons:
- More dependencies if you don't already run Redis.
- Time to familiarize yourself with Redis.  (15 - 20 minutes maybe)
- No scheduling built-in but uwsgi decorators can provide cron
capabilities for those running uwsgi.  I haven't played with them yet
but intend to over the next month or two.

Benefits:
- Still fewer dependencies than celery.  Redis becomes the task/
message queue as well as the back end storage and web2py workers
process them.
- Multiple dbs in Redis can be organized into different queues.
- Flexible - Can be adapted to your project fairly easily.
- Also can act as a cache.  Redis has some really useful data types
which you can use to do a lot of your processing in memory.
- Has a pub/sub system which can also be incorporated into your
queue.  Perhaps as a central logging server.
- All data in Redis is kept in memory but persisted to disk = blazing
fast performance with durability.
- Possible simplified application architecture if you use Redis for
your task queue, async message/db insert queue, pub/sub system

I will be happy to create a slice detailing the process with some
abstract examples if anyone has interest in my solution.

Thanks again for all your work.  I intend to contribute back to web2py
as much as I can and all open source because I have learned and built
so much with it.

-David Bloom


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-11 Thread Massimo Di Pierro
PS. The fix also allows scheduling a one-time task to run on a
specific worker, which is useful in its own right.

On Aug 11, 7:33 pm, Massimo Di Pierro 
wrote:
> You found a major bug. I think it is now fixed in trunk (more or less
> as you suggested). Please check it.
>
> The database-locked error should not be raised; perhaps there is a sqlite
> flag.
>
> Massimo
>
> On Aug 11, 6:33 pm, Niphlod  wrote:
>
> > Really totally insane rambling.
> > The reference in task_run doesn't block worker two from executing the
> > function...
>
> > So, another rambling:
> > In order to avoid it, at the cost of firing two queries on the
> > task_scheduled table, wouldn't it be better to "sign" the task_scheduled
> > record with, say, worker_heartbeat.id and then check, after the first
> > commit, if the same record has been taken up by another worker?
>
> > Something like:
>
> > 1. worker 1 fetches task_scheduled.id=1 and updates the record with
> > worker_id=1 and status=running
> > 2. commit()
> > 3. worker 1 re-fetches task_scheduled.id=1 and checks if worker_id is
> > still == 1
> > 4. task_run record is inserted
> > 5. db.commit()
>
> > and so on...
>
> > would this kind of approach seal the deal, or is it a "chicken and egg"
> > problem where we'd need to separate delicate processes "intelligently"
> > using task_scheduled.group_name with only one worker handling those?


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-11 Thread Massimo Di Pierro
You found a major bug. I think it is now fixed in trunk (more or less
as you suggested). Please check it.

The database-locked error should not be raised; perhaps there is a sqlite
flag.

Massimo

On Aug 11, 6:33 pm, Niphlod  wrote:
> Really totally insane rambling.
> The reference in task_run doesn't block worker two from executing the
> function...
>
> So, another rambling:
> In order to avoid it, at the cost of firing two queries on the
> task_scheduled table, wouldn't it be better to "sign" the task_scheduled
> record with, say, worker_heartbeat.id and then check, after the first
> commit, if the same record has been taken up by another worker?
>
> Something like:
>
> 1. worker 1 fetches task_scheduled.id=1 and updates the record with
> worker_id=1 and status=running
> 2. commit()
> 3. worker 1 re-fetches task_scheduled.id=1 and checks if worker_id is
> still == 1
> 4. task_run record is inserted
> 5. db.commit()
>
> and so on...
>
> would this kind of approach seal the deal, or is it a "chicken and egg"
> problem where we'd need to separate delicate processes "intelligently"
> using task_scheduled.group_name with only one worker handling those?


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-11 Thread Niphlod
Really totally insane rambling.
The reference in task_run doesn't block worker two from executing the
function...

So, another rambling:
In order to avoid it, at the cost of firing two queries on the
task_scheduled table, wouldn't it be better to "sign" the task_scheduled
record with, say, worker_heartbeat.id and then check, after the first
commit, if the same record has been taken up by another worker?

Something like:

1. worker 1 fetches task_scheduled.id=1 and updates the record with
worker_id=1 and status=running
2. commit()
3. worker 1 re-fetches task_scheduled.id=1 and checks if worker_id is
still == 1
4. task_run record is inserted
5. db.commit()

and so on...
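
A hedged sketch of steps 1-3 (the idea only, as it might look inside
run_next_task; field names follow this thread, the real scheduler internals
may differ):

# steps 1-2: sign a queued task and commit, so the claim is visible to others
task = db(db.task_scheduled.status=='queued').select(limitby=(0,1)).first()
db(db.task_scheduled.id==task.id).update(
    assigned_worker_name=self.worker_name, status='running')
db.commit()

# step 3: re-fetch and verify the signature survived; if another worker
# overwrote it between our update and its commit, back off
row = db.task_scheduled(task.id)
if row.assigned_worker_name != self.worker_name:
    return False  # task was stolen, do not run it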

would this kind of approach seal the deal, or is it a "chicken and egg"
problem where we'd need to separate delicate processes "intelligently"
using task_scheduled.group_name with only one worker handling those?


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-11 Thread Niphlod
scheduler.py is a super-arci-mega-ultra-extra nice 400 lines of code
that I like...

right now I'm beginning to test it, and the basics are quite
understandable at a first glance.

Before doing anything advanced, I fired up 8 workers and wrote a
controller queuing an insanely high number of small functions... I
noticed a little hang in next_task() and I think I found the fix,
which doesn't need a big change... line 224

return db(query).select(orderby=db.task_scheduled.next_run_time).first()

selects all rows, serializes them and then takes only the first one. With
10 records in the task_scheduled table, and 8 workers polling, it
becomes a little heavy. Transforming that line into

return db(query).select(limitby=(0,1),
                        orderby=db.task_scheduled.next_run_time).first()

made it run (obviously) smoother.

I'll be happy to fully test this new scheduler because right now I'm
forced to launch a relatively long function in a controller with an
ajax call, which is really suboptimal.

Supposedly I'd like to have several tasks with no repetitions, so
several task_scheduled records are going to create a lot of task_run
records. I need to save some data to the application database in the
function itself, so task_run isn't really needed.
I'd need to use the cleanup functions in scheduler.py, but I didn't
get how to add them using the

scheduler = Scheduler(db,dict(demo1=demo1,demo2=demo2))

syntax.

Also, I didn't get how to start scheduler.py in "standalone" mode,
specifically creating the tasks.py file... any hint?

PS: on line 47 of the usage docs, "id =
scheduler.db.tast_scheduler.insert()" should really be "id =
scheduler.db.task_scheduled.insert()"

One question (here in Italy I'd say "non voglio rompere le uova nel
paniere"; it seems the closest english translation is "I don't
want to cut the ground from under")...
using sqlite is a mess of "operational error: database is locked" with
many workers, so I went to test it with Postgres.
One thing that blocked me from writing an async queue and the related
worker on a database was that, in scheduler.py terms, next_task()
fired from different workers fetches the same record.
While this particular occurrence is rare, for some operations (e.g.
sending out a single mail once a day to a user, or having to schedule a
function that takes a loong time to execute) it is a pain in the ass.
With one worker all is going fine, obviously.
That's why redis or rabbitMQ (just to name the first two coming up in
google searches) are used to store scheduled tasks: they are designed
to pull the record off the task_scheduled table instantly, assuring
that the task is effectively completed only once.
So, here's what I thought (and I'll try to reproduce as soon as
possible):

1. next_task() fetches a task_scheduled record
2. then a record is inserted into task_run
3. then task_scheduled record gets updated
4. there's a db.commit()
5. function runs
etc etc

following updates and commits are not important to this matter.
From 1. to just before 4., if another worker pulls a task with
next_task(), the task fetched is the same.
Relational databases come to the rescue here: smart Massimo put a
reference in task_run to task_scheduled, so, for example, if two
workers fetch the same record, they are not allowed at the database
level to both start working, because the db.commit() in 4. works for
the first but not for the second worker.
The second worker will crash, and it'll stop working.

If this assumption is right, and this is not totally insane rambling,
would it be safer to catch that exception and continue working,
fetching another record, instead of crashing?

I also came up with different ideas for a workaround, but I'll stop
here if the above-mentioned part is actually insane :D


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-10 Thread Massimo Di Pierro
simplejson can be extended and it can call, for example, np.save. The
problem is that the json protocol does not allow for custom de-
serialization of arbitrary types. I think it is best to serialize
before passing the data to json and have the tasks do the de-
serialization.
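
A hedged sketch of that approach, using np.save into a buffer plus base64 so
the result is an ordinary JSON-safe string (the function names are
illustrative, not part of scheduler.py):

import base64
from io import BytesIO
import numpy as np

def array_to_str(a):
    buf = BytesIO()
    np.save(buf, a)   # .npy binary format, dtype and shape included
    return base64.b64encode(buf.getvalue()).decode('ascii')

def str_to_array(s):
    return np.load(BytesIO(base64.b64decode(s)))

payload = array_to_str(np.arange(6).reshape(2, 3))  # store this in the task's json args
restored = str_to_array(payload)                    # the task de-serializes on its side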

Currently scheduler.py does not have a hook to pass a link to
extensions but it is on my todo list. It should not be an issue.

On Aug 10, 2:40 pm, G  wrote:
> Massimo,
> The scheduler.py is just what I needed for my application. The one
> problem I have is that in most of my work, the data type I pass around
> everywhere is a NumPy array, which I have found cannot be serialized
> by SimpleJSON. Numpy has its own serialization which works very well.
> It seems like it might be easiest to extend SimpleJSON to first
> serialize a numpy array to a binary string using np.save and then pass
> the string to the JSON encoder. Will this work for the binary strings?
> The other issue is how to decide on the receiving end if the string
> should be passed to np.load.
>
> I appreciate any ideas.
> Thanks,
> G
>
> On Aug 8, 7:28 am, Massimo Di Pierro 
> wrote:
>
> > ## preamble
>
> > I have been working on porting django-celery to web2py-celery.
> > http://code.google.com/p/web2py-celery
> > There are a few issues to resolve and I am working on it.
>
> > Yet I found it to be overkill for most users. It has lots of
> > dependencies (for example RabbitMQ) and it is not easy to manage. If
> > you do not need a huge number of worker nodes there may be a better
> > solution.
>
> > So  I added this to trunk:
>
> > gluon/scheduler.py
>
> > This email is a request for comments as I think this should replace the
> > current cron mechanism.
>
> > ## What is it?
> > It is a lightweight replacement for celery that uses the database
> > instead of queues to schedule tasks and uses the default web2py admin
> > interface to allow you to schedule tasks. It consists of a single file
> > and has no dependencies.
>
> > ## How does it work?
>
> > For any existing
> > app
>
> > Create File: app/models/scheduler.py
> > ==
> > from gluon.scheduler import Scheduler
> >
> > def demo1(*args,**vars):
> >     print 'you passed args=%s and vars=%s' % (args, vars)
> >     return 'done!'
> >
> > def demo2():
> >     1/0
> >
> > scheduler = Scheduler(db,dict(demo1=demo1,demo2=demo2))
> > =
> >
> > Create File: app/modules/scheduler.py
> > ==
> > scheduler.worker_loop()
> > =
>
> > ## run worker nodes with:
> > python web2py.py -S app -M -N -R applications/app/modules/scheduler.py
> >
> > ## schedule jobs using
> > http://127.0.0.1:8000/scheduler/appadmin/insert/db/task_scheduled
> >
> > ## monitor scheduled jobs
> > http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_sche...
> >
> > ## view completed jobs
> > http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_run
>
> > Compared to celery it lacks the ability to bind tasks and workers,
> > remotely interrupt tasks, and set timeouts, yet these features can be
> > added easily and I will do so eventually.
>
> > Please let me know what you think.
>
> > Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-10 Thread G
Massimo,
The scheduler.py is just what I needed for my application. The one
problem I have is that in most of my work, the data type I pass around
everywhere is a NumPy array, which I have found cannot be serialized
by SimpleJSON. Numpy has its own serialization which works very well.
It seems like it might be easiest to extend SimpleJSON to first
serialize a numpy array to a binary string using np.save and then pass
the string to the JSON encoder. Will this work for the binary strings?
The other issue is how to decide on the receiving end if the string
should be passed to np.load.

I appreciate any ideas.
Thanks,
G

On Aug 8, 7:28 am, Massimo Di Pierro 
wrote:
> ## preamble
>
> I have been working on porting django-celery to web2py-celery.
> http://code.google.com/p/web2py-celery
> There are a few issues to resolve and I am working on it.
>
> Yet I found it to be overkill for most users. It has lots of
> dependencies (for example RabbitMQ) and it is not easy to manage. If
> you do not need a huge number of worker nodes there may be a better
> solution.
>
> So  I added this to trunk:
>
> gluon/scheduler.py
>
> This email is a request for comments as I think this should replace the
> current cron mechanism.
>
> ## What is it?
> It is a lightweight replacement for celery that uses the database
> instead of queues to schedule tasks and uses the default web2py admin
> interface to allow you to schedule tasks. It consists of a single file
> and has no dependencies.
>
> ## How does it work?
>
> For any existing
> app
>
> Create File: app/models/scheduler.py
> ==
> from gluon.scheduler import Scheduler
>
> def demo1(*args,**vars):
>     print 'you passed args=%s and vars=%s' % (args, vars)
>     return 'done!'
>
> def demo2():
>     1/0
>
> scheduler = Scheduler(db,dict(demo1=demo1,demo2=demo2))
> =
>
> Create File: app/modules/scheduler.py
> ==
> scheduler.worker_loop()
> =
>
> ## run worker nodes with:
> python web2py.py -S app -M -N -R applications/app/modules/scheduler.py
>
> ## schedule jobs using
> http://127.0.0.1:8000/scheduler/appadmin/insert/db/task_scheduled
>
> ## monitor scheduled jobs
> http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_sche...
>
> ## view completed jobs
> http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_run
>
> Compared to celery it lacks the ability to bind tasks and workers,
> remotely interrupt tasks, and set timeouts, yet these features can be
> added easily and I will do so eventually.
>
> Please let me know what you think.
>
> Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-09 Thread Massimo Di Pierro
On Aug 8, 9:07 pm, Andrew  wrote:
> I don't know anything about Celery, but I am interested in scheduling
> functionality for web2py.  When you mention it doesn't have
> dependencies, I assume you mean the code itself.
>
> One of my next tasks was to look at open source schedulers,
> particularly the ability to do job/task dependencies, in that a
> "start" job is kicked off by the cron, which then initiates a stream of
> work with multiple jobs, sometimes parallel, sometimes serial.  Is
> this possible?

Yes. Basically that is what it is for.
You create a task_scheduled record which tells web2py which function
you want to run, its arguments in json, and when it should run (once,
twice, 1000 times, starting now, starting later, every day, etc.), and
worker nodes will pick up the tasks when scheduled and run them. Right
now it uses admin as a web interface and it works fine but we could
come up with something sleeker.
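
For instance, a hedged sketch of queuing such a record from app code (field
names are as used elsewhere in this thread; defaults and extra fields may
differ):

from gluon.contrib.simplejson import dumps

scheduler.db.task_scheduled.insert(
    name='nightly_report',          # illustrative task name
    func='demo1',                   # key in the dict passed to Scheduler()
    args=dumps(['report']),         # positional arguments, json-encoded
    vars=dumps({'verbose': True}),  # keyword arguments, json-encoded
    period=86400,                   # run once a day
    repeats=30)                     # for thirty days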

It is really easy to try. I will try to make a video about it.

>
> I see a database driven approach as a good thing.  It may perhaps be a
> little slower but the scheduler overhead can be such a small component
> of the overall workload.  (My target app is Data Integration).
>
> Would love to know more and to give it a try.
>
> thanks
> Andrew
>
> On Aug 9, 12:12 pm, Massimo Di Pierro 
> wrote:
>
> > All you need is to follow the example and run the worker task (the
> > background task).
>
> > Whatever web server you use, your web2py apps will be able to queue
> > and schedule tasks using admin.
>
> > BTW... download it again as I added some stuff.
>
> > On Aug 8, 5:46 pm, David J  wrote:
>
> > > How do I use this from uwsgi? Currently I am launching the web2py app using
> > > the web2py uwsgi handler
> > > On Aug 8, 2011 6:35 PM, "Massimo Di Pierro" 
> > > wrote:
>
> > > > for now just set an insanely large value for repeats. We could agree to
> > > > use -1.
>
> > > > On Aug 8, 12:12 pm, Marin Pranjic  wrote:
> > > >> It looks good.
> > > >> How to add a task which will repeat infinite times?
> > > >> What are Start time and Stop time used for? Just to clarify...
>
> > > >> On Mon, Aug 8, 2011 at 4:28 PM, Massimo Di Pierro
>
> > > >>  wrote:
>
> > > >> > Please let me know what you think.
>
> > > >> > Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Andrew
I don't know anything about Celery, but I am interested in scheduling
functionality for web2py.  When you mention it doesn't have
dependencies, I assume you mean the code itself.

One of my next tasks was to look at open source schedulers,
particularly the ability to do job/task dependencies, in that a
"start" job is kicked off by the cron, which then initiates a stream of
work with multiple jobs, sometimes parallel, sometimes serial.  Is
this possible?

I see a database driven approach as a good thing.  It may perhaps be a
little slower but the scheduler overhead can be such a small component
of the overall workload.  (My target app is Data Integration).

Would love to know more and to give it a try.

thanks
Andrew



On Aug 9, 12:12 pm, Massimo Di Pierro 
wrote:
> All you need is to follow the example and run the worker task (the
> background task).
>
> Whatever web server you use, your web2py apps will be able to queue
> and schedule tasks using admin.
>
> BTW... download it again as I added some stuff.
>
> On Aug 8, 5:46 pm, David J  wrote:
>
> > How do I use this from uwsgi? Currently I am launching the web2py app using
> > the web2py uwsgi handler
> > On Aug 8, 2011 6:35 PM, "Massimo Di Pierro" 
> > wrote:
>
> > > for now just set an insanely large value for repeats. We could agree to
> > > use -1.
>
> > > On Aug 8, 12:12 pm, Marin Pranjic  wrote:
> > >> It looks good.
> > >> How to add a task which will repeat infinite times?
> > >> What are Start time and Stop time used for? Just to clarify...
>
> > >> On Mon, Aug 8, 2011 at 4:28 PM, Massimo Di Pierro
>
> > >>  wrote:
>
> > >> > Please let me know what you think.
>
> > >> > Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Massimo Di Pierro
All you need is to follow the example and run the worker task (the
background task).

Whatever web server you use, your web2py apps will be able to queue
and schedule tasks using admin.

BTW... download it again as I added some stuff.

On Aug 8, 5:46 pm, David J  wrote:
> How do I use this from uwsgi? Currently I am launching the web2py app using
> the web2py uwsgi handler
> On Aug 8, 2011 6:35 PM, "Massimo Di Pierro" 
> wrote:
>
> > for now just set an insanely large value for repeats. We could agree to
> > use -1.
>
> > On Aug 8, 12:12 pm, Marin Pranjic  wrote:
> >> It looks good.
> >> How to add a task which will repeat infinite times?
> >> What are Start time and Stop time used for? Just to clarify...
>
> >> On Mon, Aug 8, 2011 at 4:28 PM, Massimo Di Pierro
>
> >>  wrote:
>
> >> > Please let me know what you think.
>
> >> > Massimo


Re: [web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread David J
How do I use this from uwsgi? Currently I am launching the web2py app using
the web2py uwsgi handler
On Aug 8, 2011 6:35 PM, "Massimo Di Pierro" 
wrote:
> for now just set an insanely large value for repeats. We could agree to
> use -1.
>
> On Aug 8, 12:12 pm, Marin Pranjic  wrote:
>> It looks good.
>> How to add a task which will repeat infinite times?
>> What are Start time and Stop time used for? Just to clarify...
>>
>> On Mon, Aug 8, 2011 at 4:28 PM, Massimo Di Pierro
>>
>>  wrote:
>>
>> > Please let me know what you think.
>>
>> > Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Massimo Di Pierro
for now just set an insanely large value for repeats. We could agree to
use -1.

On Aug 8, 12:12 pm, Marin Pranjic  wrote:
> It looks good.
> How to add a task which will repeat infinite times?
> What are Start time and Stop time used for? Just to clarify...
>
> On Mon, Aug 8, 2011 at 4:28 PM, Massimo Di Pierro
>
>  wrote:
>
> > Please let me know what you think.
>
> > Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Massimo Di Pierro
On Aug 8, 11:55 am, pbreit  wrote:
> I definitely like the idea of something simpler. Even though Celery is
> pitched as somewhat easy, I could never make heads or tails of it. I look
> forward to giving this a try.
>
> What are the web2py dependencies? Do you foresee bundling DAL and whatever
> to make it standalone?

It only needs dal.py and globals.py. It could be used standalone; it
would need a main() and I may build it later today. It should not take
much.

> Is SQLite a reasonable DB or will this likely need something that works
> better with concurrency?

If running the tasks takes longer than retrieving them (as it
should, else there is no reason for using this), the db access is
not an issue.

> What is the mechanism to start the scheduler, start on reboot, monitor it,
> etc?

You just need to start web2py and start the background process. There
is nothing else to do. There are some differences from celery.

In celery the celerybeat daemon pushes tasks to the celeryd services
(workers). In gluon/scheduler.py the background processes (workers)
pull the tasks from the database. There is no daemon dealing with
scheduling.
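
A caricature of the pull model (not the real worker_loop, just its shape; the
poll interval is an assumption):

import time

def worker_loop(scheduler, poll_interval=5):
    while True:
        # each worker polls the shared database directly; no broker involved
        if not scheduler.run_next_task():
            time.sleep(poll_interval)  # nothing due: back off briefly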

There are three tables.
* task_scheduled stores a list of tasks: when you want each to run
(next_run_time), how often (period), how many times (repeats,
times_run), within what time frame (start_time, stop_time), the max
timeout, etc.
* task_run stores the output of each task run. One task_scheduled with
repeats=10 will generate 10 task_run records.
* worker_heartbeat stores the heartbeat of the workers, i.e. the time
when they poll for tasks. Each task_scheduled can be:
- queued (waiting to be picked up)
- running (task was picked by a worker)
- completed (was run as many times as requested)
- failed (the task failed and will not be run again)
- overdue (the task has not reported; probably a worker has died in
the middle of it; should not happen under normal conditions.)
A task that does not fail and is scheduled to run 3 times will go
through:
queued -> running -> queued -> running -> queued -> running ->
completed
They only run if they are queued and are due to run.
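
As a rough sketch of that schema (field names are reconstructed from this
thread and may not match gluon/scheduler.py exactly):

db.define_table('task_scheduled',
    Field('name'),
    Field('func'),                       # key into the dict passed to Scheduler()
    Field('args', 'text'),               # json-encoded arguments
    Field('vars', 'text'),
    Field('status', default='queued'),   # queued/running/completed/failed/overdue
    Field('next_run_time', 'datetime'),
    Field('period', 'integer'),          # seconds between runs
    Field('repeats', 'integer'),
    Field('times_run', 'integer', default=0),
    Field('start_time', 'datetime'),
    Field('stop_time', 'datetime'),
    Field('assigned_worker_name'))

db.define_table('task_run',
    Field('task_scheduled', db.task_scheduled),  # one row per run of a task
    Field('output', 'text'))

db.define_table('worker_heartbeat',
    Field('name'),                       # the worker_name, e.g. hostname:port
    Field('last_heartbeat', 'datetime')) # hypothetical timestamp field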

This design allows you to do what you normally do with cron but with
some differences:
- cron is at the web2py level; gluon/scheduler.py is at the app level
(although some apps may share a scheduler)
- cron spawns a process for each task and this created problems for
some users. gluon/scheduler.py runs tasks sequentially in a fixed
number of processes (one in the example).
- tasks can be managed from the admin interface (schedule, start,
stop, restart, change input, read output, etc).
- the same task cannot overlap with itself therefore it is easier to
manage
- tasks are not executed exactly when due, but as close as possible,
in a FIFO order based on the requested schedule and the workload and
resources available. More like celery than cron.

Hope this makes sense.

gluon/scheduler.py is 170 lines of code and you may want to take a
look at what it does.

Massimo




[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Ross Peoples
Great questions from pbreit. I also have a quick one: Would it be easy for 
plugins to use the Scheduler? Imagine how many cool plugins would be 
possible (mail queues, session cleanups, statistical analysis, image 
manipulation, etc.) with a standardized web2py scheduler.

[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread pbreit
I definitely like the idea of something simpler. Even though Celery is 
pitched as somewhat easy, I could never make heads or tails of it. I look 
forward to giving this a try.

What are the web2py dependencies? Do you foresee bundling DAL and whatever 
to make it standalone?

Is SQLite a reasonable DB or will this likely need something that works 
better with concurrency?

What is the mechanism to start the scheduler, start on reboot, monitor it, 
etc?


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Massimo Di Pierro
On Aug 8, 9:59 am, Fran  wrote:
> >  It has lots of dependencies (for example RabbitMQ)
>
> This is a little unfair - whilst RabbitMQ is the recommended Broker for
> Production systems, there are many others supported, and porting
> django-kombu to web2py-kombu shouldn't be hard.

I agree.
Our private discussions on the topic helped me understand better how
celery works and what the needs of some users are.
celery is great and django-celery (ported to web2py) is fantastic
too. Yet I found out that IF the communication between broker and
workers is not a bottleneck (and often it is not) and if you want to use
django (or in our case web2py) to schedule tasks, tasks have to be stored
in the database. So for a small number of workers it is not a big
overhead if they pick the tasks directly from the database. This is an
enormous simplification.

Yet it is so much simpler that it may actually scale well for many
practical apps where you may have a few tasks per minute or less.

The db used for the scheduler does not need to be the same as the
main db, thus reducing the load.

Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Massimo Di Pierro
P.S. I added task groups and I added task timeout

On Aug 8, 10:00 am, Ross Peoples  wrote:
> I certainly like the minimalistic one file approach. Talking to the tasks
> seems like a pretty important feature. I would certainly love to see this in
> action. Will the scheduler be started automatically, like the built-in cron,
> or would you need to start it manually (like from external cron)?

currently, my understanding is that celery does not allow talking to
the tasks, only stopping and killing them. This is implemented using OS
signals. It is possible to add this feature in the current
implementation by sending and catching os signals. I will wait until
this is tested more before I implement it.

Massimo


[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Ross Peoples
I certainly like the minimalistic one file approach. Talking to the tasks 
seems like a pretty important feature. I would certainly love to see this in 
action. Will the scheduler be started automatically, like the built-in cron, 
or would you need to start it manually (like from external cron)?

[web2py] Re: IMPORTANT on cron jobs, scheduled jobs and delayed jobs

2011-08-08 Thread Fran
Looks very interesting :)

This appears to be an easier way to meet most of the needs for which I was 
investigating Celery, since I'm not looking at massively-scalable systems, 
but rather just a way to have longer-running requests (e.g. onaccepts) 
pushed asynchronously, to give users a more responsive system and avoid 
browser timeouts. I also want a way to easily build a graphical scheduler 
which works even in Win32 service mode (which the current cron doesn't).

>  It has lots of dependencies (for example RabbitMQ)

This is a little unfair - whilst RabbitMQ is the recommended Broker for 
Production systems, there are many others supported, and porting 
django-kombu to web2py-kombu shouldn't be hard.

>Compared to celery it lacks the ability to bind tasks and workers,

This isn't important to me

> remotely interrupt tasks and set timeout

These would indeed be useful in time :)

> Please let me know what you think.

I'll give it a try :)

Many thanks,
Fran.