[web2py] Re: A basic problem about threading and time consuming function...
Debian/nginx/systemd deployment: I made the scheduler work with the help of https://groups.google.com/d/msg/web2py/eHXwines4o0/i3WqDlKjCQAJ and https://groups.google.com/d/msg/web2py/jFWNnz5cl9U/UpBSkxf4_2kJ

Thank you very much Niphlod, Michael M, Brian M
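For readers landing here for the Debian/nginx/systemd part: the post does not include the unit file, so the following is a hypothetical sketch of running the scheduler workers as a systemd service. The app name `myapp`, paths, and user are placeholders; `python web2py.py -K myapp` is the stock command for starting scheduler workers for an app.

```ini
# /etc/systemd/system/web2py-scheduler.service  (sketch; adjust paths and user)
[Unit]
Description=web2py scheduler workers for myapp
After=network.target

[Service]
User=www-data
WorkingDirectory=/home/www-data/web2py
ExecStart=/usr/bin/python web2py.py -K myapp
Restart=always

[Install]
WantedBy=multi-user.target
```

After `systemctl daemon-reload` and `systemctl enable --now web2py-scheduler`, the workers restart with the machine, independently of nginx and the WSGI processes.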
Yes.
I run with the scheduler already. It is really nice and great!
Moving away from the ajax solution was easy and there were almost no problems. (I have very simple parameters for the task and I return nothing; I just save into the db.)
The resulting code is cleaner (one task-queueing call instead of rendering a hidden html element + js reading from it + ajax call + parsing args).

Maybe my previous mistake (I mean my message here in this thread) will be helpful for others who want to go with the scheduler.

What I need to do now is deployment for the scheduler (on Debian and nginx).

PS: It was quick but important to
- find where I can see code errors (in the scheduler's db tables),
- learn how to set a timeout (in the queue_task call).

Here is the code example - controller and models/scheduler.py:

    def find():
        def onvalidation(form):
            form.vars.asked = datetime.datetime.utcnow()
        form = SQLFORM(db.question)
        if form.process(onvalidation=onvalidation).accepted:
            scheduler.queue_task(task_catalogize,
                pvars={'question_id': form.vars.id,
                       'question': form.vars.question,
                       'asked': str(form.vars.asked)},  # str() so the datetime is JSON-serializable
                timeout=300)
        return dict(form=form)

    import datetime
    from gluon.scheduler import Scheduler

    def task_catalogize(question_id, question, asked):
        asked = datetime.datetime.strptime(asked, '%Y-%m-%d %H:%M:%S.%f')  # deserialize the datetime
        inserted = some_db_actions(question)
        db.question[question_id] = {
            'duration': round((datetime.datetime.utcnow() - asked).total_seconds(), 0),  # same/similar to what the scheduler db tables record
            'inserted': inserted}
        db.commit()

    scheduler = Scheduler(db)
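A note on the `str()` / `strptime` pair in the code example: `queue_task` passes `pvars` through JSON, which is why the datetime travels as a string. The round trip can be checked in plain Python (nothing web2py-specific), and it has one gotcha worth knowing:

```python
import datetime

# The task args go through JSON, so the datetime is passed as str()
# and parsed back inside the task with the matching format string.
asked = datetime.datetime(2016, 5, 5, 11, 6, 4, 123456)
wire = str(asked)                                   # '2016-05-05 11:06:04.123456'
back = datetime.datetime.strptime(wire, '%Y-%m-%d %H:%M:%S.%f')
assert back == asked

# Caveat: when microseconds are exactly 0, str() drops the '.%f' part
# and the strptime above would raise ValueError. A fixed strftime() on
# the way out always emits the fractional part, so it is the safer choice:
wire_safe = asked.strftime('%Y-%m-%d %H:%M:%S.%f')
assert datetime.datetime.strptime(wire_safe, '%Y-%m-%d %H:%M:%S.%f') == asked
```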
NP: as with everything, it's not a silver bullet, but with the redis incarnation I'm sure you can achieve less than 3 seconds (if you tune the heartbeat, even less than 1 second) from when the task gets queued to when it gets processed.
Hi, Niphlod.

After reading a bit about the scheduler,
I am definitely sorry for my previous notes,
and of course I choose the web2py scheduler.

It will be my first use of it (with a much older, ~3-year-old web2py app I have only used cron),
so it will take some time to learn the scheduler. But it is surely worth redesigning the app this way.

Thanks for being patient with me.
Mirek
You are right.
At this time it works well for me via ajax, and I will watch carefully for problems.
If any appear, I will move to the scheduler.

I see this is exactly what Massimo(?) writes at the bottom of the Ajax chapter of the book.

PS: about times:
On a notebook with a mobile connection it takes 20-40s. So it could be dangerous.
On a cloud server with SSD it takes 2-10s. This will be my case. And I feel better when the user can have a typical response in 3s instead of 8s.
the statement "I don't need to use the scheduler, because I want to start it as soon as possible" is flaky at best. If your "fetching" varies from 2 to 20 seconds and COULD extend further to 60 seconds, waiting a few seconds for the scheduler to start the process is, uhm... debatable.
Of course, relying on ajax, if your "fetching" can be killed in the process, is the only other way.

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thanks for info and tips, 6 years later. What I try to do is a form with single input, where user gives a query string and then data about (usually ~300) books will be retrieved via z39 and marc protocol/format, parsed and saved into local database. Of course this will take a time (2? 5? 20? seconds) and I decided not to show the result immediately, but show the same form with possibility to enter the next query + there is a list of pending queries (and their status - via ajax testing every 5 seconds) So my idea was to provide a return from the controller fast and before the return to start a new thread to retrieve/parse/save/commit data. >From this discussion I understand that open new thread isn't best idea. I think it could be still possible, because if my new thread could be killed 60s later from the web server together with the original thread - such possibility is not fatal problem for me here. However when (as I read here) this would be a little wild technology, and because other technologies mentioned here: https://en.wikipedia.org/wiki/Comet_(programming) -paragraph Aternatives, are too difficult for me, and because I don't want use a scheduler, because I need to start as soon as possible, I will solve it so, that I will make 2 http accesses from my page: one with submit (will validate/save the query to database) and one with ajax/javascript (onSubmit from the old page or better: onPageLoaded from the next page where I give the query in .html DOM as some hidden value), which will start the z39 protocol/retrieve/parse/save data. This will be much better, because web2py in the ajax call will prepare the db variable with proper db model for me (which otherwise I must handle myselves in the separate thread). Callback from this ajax call should/could be some dummy javascript function, because it is not sure, and not important, if the page still exists when the server job will finish. 
So, if somebody is interesting and will read this very old thread, maybe this can give him some idea for time consumming actions. And maybe somebody will add other important hints or comments (thanks in advance). Dne středa 26. května 2010 0:33:02 UTC+2 Giuseppe Luca Scrofani napsal(a): > > Hi all, as promised I'm here to prove you are patient and nice :) > I' have to make this little app where there is a function that read > the html content of several pages of another website (like a spider) > and if a specified keyword is found the app refresh a page where there > is the growing list of "match". > Now, the spider part is already coded, is called search(), it uses > twill to log in the target site, read the html of a list of pages, > perform some searching procedures and keep adding the result to a > list. I integrated this in a default.py controller and make a call in > def index(): > This make the index.html page loading for a long time, because now it > have to finish to scan all pages before return all results. > What I want to achieve is to automatically refresh index every 2 > second to keep in touch with what is going on, seeing the list of > match growing in "realtime". Even better, if I can use some sort of > ajax magic to not refresh the entire page... but this is not vital, a > simple page refresh would be sufficient. > Question is: I have to use threading to solve this problem? > Alternative solutions? > I have to made the list of match a global to read it from another > function? It would be simpler if I made it write a text file, adding a > line for every match and reading it from the index controller? If I > have to use thread it will run on GAE? 
> > Sorry for the long text and for my bad english :)
> > gls

--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups "web2py-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to web2py+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [web2py] Re: A basic problem about threading and time consuming function...
I started looking at this a bit. You can find the specs for the Comet protocol, such as it is, at http://svn.cometd.com/trunk/bayeux/bayeux.html It's built on top of JSON but isn't quite JSON-RPC; it's more of a publish/subscribe model. The latest version of 'cometd' was just released as a beta, available at http://download.cometd.org/, and includes a JavaScript library for both dojo and jquery (well, it's written in/for dojo, but has a jquery-style interface as well). I started poking at it a bit, but I haven't ever done any jquery so it will probably be slow. My plan is to require basic auth, put the client ID into the session, put any data on subscribed channels into that connection, and then just use long-polling to keep it open. Since there's no real way for a web2py app to be notified of internal state changes, I'm not sure, long term, how I would handle actually looking for anything to send out over the long poll. Though I've had some thoughts of writing a scheduler for web2py with granularity of a second or so. On Tue, May 25, 2010 at 7:24 PM, Allard wrote: > Comet is a nice way to get this done but I wonder how to implement > comet efficiently in web2py. Massimo, does web2py use a threadpool > under the hood? For comet you would then quickly run out of threads. > If you'd try to do this with a thread per connection things would get > out of hand pretty quickly so the best way is doing the work > asynchronously like Orbited. Alternatives would be using one of the > contemporary Python asynchronous libraries. These libraries provide > monkey patching of synchronous calls like your url fetching. Some > suggestions: > > Gevent: now with support of Postgress, probably the fastest out there > Eventlet: used at Lindenlab / Second Life > Concurrence: with handy async mysql interface > Tornado: full async webserver in Python > > Massimo: what do you think of an asynchronous model for web2py? It'd > be great to to have asynchronous capabilities. 
I am writing an app > that will require quite a bit of client initiated background > processing (sending emails, resizing images) which I would rather hand > off to a green thread and not block one the web2py threads. Curious > about your thoughts. > > BTW - my first post here. Started to use for web2py for a community > site and enjoy working in it a lot! Great work. > > On May 25, 9:39 pm, Candid wrote: >> Well, actually there is a way for the server to trigger an action in >> the browser. It's called comet. Of course under the hood it's >> implemented on top of http, so it's browser who initiates request, but >> from the developer perspective it looks like there is dual channel >> connection between the browser and the server, and they both can send >> messages to each other asynchronously. There are several >> implementation of comet technology. I've used Orbited (http:// >> orbited.org/) and it worked quite well for me. >> >> On May 25, 9:00 pm, mdipierro wrote: >> >> > I would use a background process that does the work and adds the items >> > to a database table. The index function would periodically refresh or >> > pull an updated list via ajax from the database table. there is no way >> > for te server to trigger an action in the browser unless 1) the >> > browser initiates it or 2) the client code embeds an ajax http server. >> > I would stay away from 1 and 2 and >> > use reload of ajax. >> >> > On May 25, 5:33 pm, Giuseppe Luca Scrofani >> > wrote: >> >> > > Hi all, as promised I'm here to prove you are patient and nice :) >> > > I' have to make this little app where there is a function that read >> > > the html content of several pages of another website (like a spider) >> > > and if a specified keyword is found the app refresh a page where there >> > > is the growing list of "match". 
>> > > Now, the spider part is already coded, is called search(), it uses >> > > twill to log in the target site, read the html of a list of pages, >> > > perform some searching procedures and keep adding the result to a >> > > list. I integrated this in a default.py controller and make a call in >> > > def index(): >> > > This make the index.html page loading for a long time, because now it >> > > have to finish to scan all pages before return all results. >> > > What I want to achieve is to automatically refresh index every 2 >> > > second to keep in touch with what is going on, seeing the list of >> > > match growing in "realtime". Even better, if I can use some sort of >> > > ajax magic to not refresh the entire page... but this is not vital, a >> > > simple page refresh would be sufficient. >> > > Question is: I have to use threading to solve this problem? >> > > Alternative solutions? >> > > I have to made the list of match a global to read it from another >> > > function? It would be simpler if I made it write a text file, adding a >> > > line for every match and reading it from the index controller? If I >> > > have to use thread it will run on GAE? >>
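The long-polling plan sketched earlier in this message (client ID in the session, per-client channels, a request that blocks until data arrives) can be illustrated with a minimal stdlib sketch. The names `subscribe`, `publish`, and `long_poll` are hypothetical, and a real Bayeux/cometd deployment negotiates the client ID during its handshake rather than via basic auth.

```python
import queue

# One message queue per subscribed client ID.
channels = {}

def subscribe(client_id):
    channels[client_id] = queue.Queue()

def publish(client_id, message):
    """A server-side event source pushes data onto the client's channel."""
    channels[client_id].put(message)

def long_poll(client_id, timeout=30):
    """The handler the browser keeps re-issuing: block up to `timeout`
    seconds waiting for a message, so the connection stays open."""
    try:
        return channels[client_id].get(timeout=timeout)
    except queue.Empty:
        return None  # nothing arrived; the client simply re-polls
```

The browser issues the next `long_poll` request as soon as each one returns, which is what keeps the channel effectively always open.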
Re: [web2py] Re: A basic problem about threading and time consuming function...
Thanks to all for answering, friends! I've extracted some good info from this discussion, and the solution proposed by Massimo works well :)
[web2py] Re: A basic problem about threading and time consuming function...
I won't have time to work out an async proof of concept right now. I hope to get to this after some more real-world profiling with my web2py app, though. To give you an idea of how an async web framework could feel as natural in programming style as web2py (e.g. no callbacks all over the place), have a look at the Concurrence documentation if you're interested: http://opensource.hyves.org/concurrence/index.html To implement async for web2py is probably for the most part straightforward (monkey patching all the IO). The trouble will be with external libraries that block and can't be monkey patched, for example db drivers. Maybe those blocking calls are best dealt with in a thread pool and queue. The idea of Comet is to keep the connection open to the client and flow data as it becomes available: http://en.wikipedia.org/wiki/Comet_%28programming%29 It saves the overhead of a client polling at intervals and establishing the connection each time. In a thread-per-connection model you would need to keep a thread available per client. A thread per client can get expensive quickly and does not scale nicely: after a few hundred connections most servers slow down dramatically because of thread context switching. See also: http://www.kegel.com/c10k.html For most web apps a thread per connection (from a threadpool) won't be a problem, but for things like Ajax email applications or chat/IM it does get troublesome. On May 25, 10:59 pm, mdipierro wrote: > On May 25, 9:24 pm, Allard wrote: > > > Comet is a nice way to get this done but I wonder how to implement > > comet efficiently in web2py. > > I have never used comet but I do not see any major problem > > > Massimo, does web2py use a threadpool > > under the hood? For comet you would then quickly run out of threads. > > The web server creates a thread pool. for stand alone web2py that > would be Rocket. > You do not run out of them any more than any other web app. 
> > > > > If you'd try to do this with a thread per connection things would get > > out of hand pretty quickly so the best way is doing the work > > asynchronously like Orbited. Alternatives would be using one of the > > contemporary Python asynchronous libraries. These libraries provide > > monkey patching of synchronous calls like your url fetching. Some > > suggestions: > > > Gevent: now with support of Postgress, probably the fastest out there > > Eventlet: used at Lindenlab / Second Life > > Concurrence: with handy async mysql interface > > Tornado: full async webserver in Python > > > Massimo: what do you think of an asynchronous model for web2py? It'd > > be great to to have asynchronous capabilities. I am writing an app > > that will require quite a bit of client initiated background > > processing (sending emails, resizing images) which I would rather hand > > off to a green thread and not block one the web2py threads. Curious > > about your thoughts. > > I do not think we can use async IO with web2py. async IO as far as I > understand would require a different programming style. > Anyway, if you have a working proof of concept I would like to see it. > > Massimo > > > > > BTW - my first post here. Started to use for web2py for a community > > site and enjoy working in it a lot! Great work. > > > On May 25, 9:39 pm, Candid wrote: > > > > Well, actually there is a way for the server to trigger an action in > > > the browser. It's called comet. Of course under the hood it's > > > implemented on top of http, so it's browser who initiates request, but > > > from the developer perspective it looks like there is dual channel > > > connection between the browser and the server, and they both can send > > > messages to each other asynchronously. There are several > > > implementation of comet technology. I've used Orbited (http:// > > > orbited.org/) and it worked quite well for me. 
> > > > On May 25, 9:00 pm, mdipierro wrote: > > > > > I would use a background process that does the work and adds the items > > > > to a database table. The index function would periodically refresh or > > > > pull an updated list via ajax from the database table. there is no way > > > > for te server to trigger an action in the browser unless 1) the > > > > browser initiates it or 2) the client code embeds an ajax http server. > > > > I would stay away from 1 and 2 and > > > > use reload of ajax. > > > > > On May 25, 5:33 pm, Giuseppe Luca Scrofani > > > > wrote: > > > > > > Hi all, as promised I'm here to prove you are patient and nice :) > > > > > I' have to make this little app where there is a function that read > > > > > the html content of several pages of another website (like a spider) > > > > > and if a specified keyword is found the app refresh a page where there > > > > > is the growing list of "match". > > > > > Now, the spider part is already coded, is called search(), it uses > > > > > twill to log in the target site, read the html of a list of pages, > > > > > perform some searching procedures and keep a
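The suggestion above of pushing un-patchable blocking calls (such as db drivers) into a thread pool and queue might look like this minimal sketch; `blocking_db_call` is a stand-in for a real driver call, not an actual API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A pool reserved for calls that block and cannot be monkey patched
# (a db driver, say); an async loop would submit work here and resume
# when the future resolves, instead of tying up a request thread.
blocking_pool = ThreadPoolExecutor(max_workers=4)

def blocking_db_call(sql):
    """Stand-in for a driver that blocks on network I/O."""
    time.sleep(0.05)
    return [('row', sql)]

def run_in_pool(sql):
    future = blocking_pool.submit(blocking_db_call, sql)
    return future.result()  # an async framework would await this instead
```

The pool bounds the number of concurrently blocked threads, so a flood of slow queries queues up rather than exhausting the server's threads.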
[web2py] Re: A basic problem about threading and time consuming function...
On May 25, 9:24 pm, Allard wrote: > Comet is a nice way to get this done but I wonder how to implement > comet efficiently in web2py. I have never used comet but I do not see any major problem. > Massimo, does web2py use a threadpool > under the hood? For comet you would then quickly run out of threads. The web server creates a thread pool; for stand-alone web2py that would be Rocket. You do not run out of them any more than in any other web app. > If you'd try to do this with a thread per connection things would get > out of hand pretty quickly so the best way is doing the work > asynchronously like Orbited. Alternatives would be using one of the > contemporary Python asynchronous libraries. These libraries provide > monkey patching of synchronous calls like your url fetching. Some > suggestions: > > Gevent: now with support of Postgress, probably the fastest out there > Eventlet: used at Lindenlab / Second Life > Concurrence: with handy async mysql interface > Tornado: full async webserver in Python > > Massimo: what do you think of an asynchronous model for web2py? It'd > be great to to have asynchronous capabilities. I am writing an app > that will require quite a bit of client initiated background > processing (sending emails, resizing images) which I would rather hand > off to a green thread and not block one the web2py threads. Curious > about your thoughts. I do not think we can use async IO with web2py. Async IO, as far as I understand, would require a different programming style. Anyway, if you have a working proof of concept I would like to see it. Massimo > > BTW - my first post here. Started to use for web2py for a community > site and enjoy working in it a lot! Great work. > > On May 25, 9:39 pm, Candid wrote: > > > Well, actually there is a way for the server to trigger an action in > > the browser. It's called comet. 
Of course under the hood it's > > implemented on top of http, so it's browser who initiates request, but > > from the developer perspective it looks like there is dual channel > > connection between the browser and the server, and they both can send > > messages to each other asynchronously. There are several > > implementation of comet technology. I've used Orbited (http:// > > orbited.org/) and it worked quite well for me. > > > On May 25, 9:00 pm, mdipierro wrote: > > > > I would use a background process that does the work and adds the items > > > to a database table. The index function would periodically refresh or > > > pull an updated list via ajax from the database table. there is no way > > > for te server to trigger an action in the browser unless 1) the > > > browser initiates it or 2) the client code embeds an ajax http server. > > > I would stay away from 1 and 2 and > > > use reload of ajax. > > > > On May 25, 5:33 pm, Giuseppe Luca Scrofani > > > wrote: > > > > > Hi all, as promised I'm here to prove you are patient and nice :) > > > > I' have to make this little app where there is a function that read > > > > the html content of several pages of another website (like a spider) > > > > and if a specified keyword is found the app refresh a page where there > > > > is the growing list of "match". > > > > Now, the spider part is already coded, is called search(), it uses > > > > twill to log in the target site, read the html of a list of pages, > > > > perform some searching procedures and keep adding the result to a > > > > list. I integrated this in a default.py controller and make a call in > > > > def index(): > > > > This make the index.html page loading for a long time, because now it > > > > have to finish to scan all pages before return all results. > > > > What I want to achieve is to automatically refresh index every 2 > > > > second to keep in touch with what is going on, seeing the list of > > > > match growing in "realtime". 
Even better, if I can use some sort of > > > > ajax magic to not refresh the entire page... but this is not vital, a > > > > simple page refresh would be sufficient. > > > > Question is: I have to use threading to solve this problem? > > > > Alternative solutions? > > > > I have to made the list of match a global to read it from another > > > > function? It would be simpler if I made it write a text file, adding a > > > > line for every match and reading it from the index controller? If I > > > > have to use thread it will run on GAE? > > > > > Sorry for the long text and for my bad english :) > > > > > gls
[web2py] Re: A basic problem about threading and time consuming function...
Comet is a nice way to get this done, but I wonder how to implement comet efficiently in web2py. Massimo, does web2py use a threadpool under the hood? For comet you would then quickly run out of threads. If you tried to do this with a thread per connection, things would get out of hand pretty quickly, so the best way is doing the work asynchronously, like Orbited. Alternatives would be using one of the contemporary Python asynchronous libraries. These libraries provide monkey patching of synchronous calls like your url fetching. Some suggestions:

- Gevent: now with support of Postgres, probably the fastest out there
- Eventlet: used at Lindenlab / Second Life
- Concurrence: with handy async mysql interface
- Tornado: full async webserver in Python

Massimo: what do you think of an asynchronous model for web2py? It'd be great to have asynchronous capabilities. I am writing an app that will require quite a bit of client-initiated background processing (sending emails, resizing images) which I would rather hand off to a green thread than block one of the web2py threads. Curious about your thoughts. BTW - my first post here. Started to use web2py for a community site and enjoy working in it a lot! Great work. On May 25, 9:39 pm, Candid wrote: > Well, actually there is a way for the server to trigger an action in > the browser. It's called comet. Of course under the hood it's > implemented on top of http, so it's browser who initiates request, but > from the developer perspective it looks like there is dual channel > connection between the browser and the server, and they both can send > messages to each other asynchronously. There are several > implementation of comet technology. I've used Orbited (http:// > orbited.org/) and it worked quite well for me. > > On May 25, 9:00 pm, mdipierro wrote: > > > I would use a background process that does the work and adds the items > > to a database table. 
The index function would periodically refresh or > > pull an updated list via ajax from the database table. there is no way > > for te server to trigger an action in the browser unless 1) the > > browser initiates it or 2) the client code embeds an ajax http server. > > I would stay away from 1 and 2 and > > use reload of ajax. > > > On May 25, 5:33 pm, Giuseppe Luca Scrofani > > wrote: > > > > Hi all, as promised I'm here to prove you are patient and nice :) > > > I' have to make this little app where there is a function that read > > > the html content of several pages of another website (like a spider) > > > and if a specified keyword is found the app refresh a page where there > > > is the growing list of "match". > > > Now, the spider part is already coded, is called search(), it uses > > > twill to log in the target site, read the html of a list of pages, > > > perform some searching procedures and keep adding the result to a > > > list. I integrated this in a default.py controller and make a call in > > > def index(): > > > This make the index.html page loading for a long time, because now it > > > have to finish to scan all pages before return all results. > > > What I want to achieve is to automatically refresh index every 2 > > > second to keep in touch with what is going on, seeing the list of > > > match growing in "realtime". Even better, if I can use some sort of > > > ajax magic to not refresh the entire page... but this is not vital, a > > > simple page refresh would be sufficient. > > > Question is: I have to use threading to solve this problem? > > > Alternative solutions? > > > I have to made the list of match a global to read it from another > > > function? It would be simpler if I made it write a text file, adding a > > > line for every match and reading it from the index controller? If I > > > have to use thread it will run on GAE? > > > > Sorry for the long text and for my bad english :) > > > > gls
[web2py] Re: A basic problem about threading and time consuming function...
It seems like Comet would be hard to implement in web2py. Does web2py use a threadpool internally? If so, I can see you running out of threads pretty quickly. Ideally you would like to solve these kinds of problems with an asynchronous model (think Gevent, Eventlet, Concurrence, Tornado). I am working on a project which requires a lot of slow processing (image resizing, sending emails) based on client-initiated calls. Massimo, have you considered an asynchronous model within web2py? Curious about your thoughts on it. I would much rather handle the long-running tasks in a green thread than block a complete thread. My first post here, and I just started to work with web2py on a social site. Great work, Massimo! Batteries included but still light. On May 25, 9:39 pm, Candid wrote: > Well, actually there is a way for the server to trigger an action in > the browser. It's called comet. Of course under the hood it's > implemented on top of http, so it's browser who initiates request, but > from the developer perspective it looks like there is dual channel > connection between the browser and the server, and they both can send > messages to each other asynchronously. There are several > implementation of comet technology. I've used Orbited (http:// > orbited.org/) and it worked quite well for me. > > On May 25, 9:00 pm, mdipierro wrote: > > > I would use a background process that does the work and adds the items > > to a database table. The index function would periodically refresh or > > pull an updated list via ajax from the database table. there is no way > > for te server to trigger an action in the browser unless 1) the > > browser initiates it or 2) the client code embeds an ajax http server. > > I would stay away from 1 and 2 and > > use reload of ajax. 
> > > On May 25, 5:33 pm, Giuseppe Luca Scrofani > > wrote: > > > > Hi all, as promised I'm here to prove you are patient and nice :) > > > I' have to make this little app where there is a function that read > > > the html content of several pages of another website (like a spider) > > > and if a specified keyword is found the app refresh a page where there > > > is the growing list of "match". > > > Now, the spider part is already coded, is called search(), it uses > > > twill to log in the target site, read the html of a list of pages, > > > perform some searching procedures and keep adding the result to a > > > list. I integrated this in a default.py controller and make a call in > > > def index(): > > > This make the index.html page loading for a long time, because now it > > > have to finish to scan all pages before return all results. > > > What I want to achieve is to automatically refresh index every 2 > > > second to keep in touch with what is going on, seeing the list of > > > match growing in "realtime". Even better, if I can use some sort of > > > ajax magic to not refresh the entire page... but this is not vital, a > > > simple page refresh would be sufficient. > > > Question is: I have to use threading to solve this problem? > > > Alternative solutions? > > > I have to made the list of match a global to read it from another > > > function? It would be simpler if I made it write a text file, adding a > > > line for every match and reading it from the index controller? If I > > > have to use thread it will run on GAE? > > > > Sorry for the long text and for my bad english :) > > > > gls
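Handing client-initiated slow work (image resizing, sending email) to a green thread, as discussed above, can be approximated with the stdlib instead of Gevent/Eventlet. This is only an analogy under assumptions: `resize_image` is a hypothetical stand-in for the real blocking job, and `asyncio.to_thread` (Python 3.9+) uses a worker thread where a green-thread library would use a monkey-patched coroutine.

```python
import asyncio
import time

def resize_image(name):
    """Stand-in for blocking work such as an image resize or SMTP send."""
    time.sleep(0.05)
    return name + '-thumbnail'

async def handle_upload(name):
    # Hand the slow job to a worker thread so the event loop (playing
    # the role of the green-thread scheduler) keeps serving requests.
    return await asyncio.to_thread(resize_image, name)
```

The request handler returns control to the loop at the `await`, which is the property the poster wants: one slow job no longer pins down a whole server thread for its full duration.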
[web2py] Re: A basic problem about threading and time consuming function...
Well, actually there is a way for the server to trigger an action in the browser. It's called comet. Of course under the hood it's implemented on top of http, so it's the browser that initiates the request, but from the developer's perspective it looks like there is a dual-channel connection between the browser and the server, and they both can send messages to each other asynchronously. There are several implementations of comet technology. I've used Orbited (http://orbited.org/) and it worked quite well for me. On May 25, 9:00 pm, mdipierro wrote: > I would use a background process that does the work and adds the items > to a database table. The index function would periodically refresh or > pull an updated list via ajax from the database table. there is no way > for te server to trigger an action in the browser unless 1) the > browser initiates it or 2) the client code embeds an ajax http server. > I would stay away from 1 and 2 and > use reload of ajax. > > On May 25, 5:33 pm, Giuseppe Luca Scrofani > wrote: > > > > > Hi all, as promised I'm here to prove you are patient and nice :) > > I' have to make this little app where there is a function that read > > the html content of several pages of another website (like a spider) > > and if a specified keyword is found the app refresh a page where there > > is the growing list of "match". > > Now, the spider part is already coded, is called search(), it uses > > twill to log in the target site, read the html of a list of pages, > > perform some searching procedures and keep adding the result to a > > list. I integrated this in a default.py controller and make a call in > > def index(): > > This make the index.html page loading for a long time, because now it > > have to finish to scan all pages before return all results. > > What I want to achieve is to automatically refresh index every 2 > > second to keep in touch with what is going on, seeing the list of > > match growing in "realtime". 
Even better, if I can use some sort of > > ajax magic to not refresh the entire page... but this is not vital, a > > simple page refresh would be sufficient. > > Question is: I have to use threading to solve this problem? > > Alternative solutions? > > I have to made the list of match a global to read it from another > > function? It would be simpler if I made it write a text file, adding a > > line for every match and reading it from the index controller? If I > > have to use thread it will run on GAE? > > > Sorry for the long text and for my bad english :) > > > gls
[web2py] Re: A basic problem about threading and time consuming function...
I would use a background process that does the work and adds the items to a database table. The index function would periodically refresh or pull an updated list via ajax from the database table. There is no way for the server to trigger an action in the browser unless 1) the browser initiates it or 2) the client code embeds an ajax http server. I would stay away from 1 and 2 and use reload or ajax. On May 25, 5:33 pm, Giuseppe Luca Scrofani wrote: > Hi all, as promised I'm here to prove you are patient and nice :) > I' have to make this little app where there is a function that read > the html content of several pages of another website (like a spider) > and if a specified keyword is found the app refresh a page where there > is the growing list of "match". > Now, the spider part is already coded, is called search(), it uses > twill to log in the target site, read the html of a list of pages, > perform some searching procedures and keep adding the result to a > list. I integrated this in a default.py controller and make a call in > def index(): > This make the index.html page loading for a long time, because now it > have to finish to scan all pages before return all results. > What I want to achieve is to automatically refresh index every 2 > second to keep in touch with what is going on, seeing the list of > match growing in "realtime". Even better, if I can use some sort of > ajax magic to not refresh the entire page... but this is not vital, a > simple page refresh would be sufficient. > Question is: I have to use threading to solve this problem? > Alternative solutions? > I have to made the list of match a global to read it from another > function? It would be simpler if I made it write a text file, adding a > line for every match and reading it from the index controller? If I > have to use thread it will run on GAE? > > Sorry for the long text and for my bad english :) > > gls
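The pattern described above (a background process appending matches to a table, with the index action polling it via ajax) reduces to something like this sqlite sketch. In web2py the DAL would replace the raw sqlite3 calls; the table and column names here are illustrative only.

```python
import sqlite3

# The background search() process appends matches here; the index
# action just renders whatever has accumulated so far.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE matches (id INTEGER PRIMARY KEY, keyword TEXT, url TEXT)')

def record_match(keyword, url):
    """Called by the background spider for every hit."""
    conn.execute('INSERT INTO matches (keyword, url) VALUES (?, ?)', (keyword, url))
    conn.commit()

def current_matches():
    """What the ajax-refreshed index view would render every 2 seconds."""
    return conn.execute('SELECT keyword, url FROM matches ORDER BY id').fetchall()
```

The database is the only shared state between the spider and the web request, which is why no globals, threads-in-the-request, or text files are needed: the reader simply sees the list grow between polls.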