Hello everyone,

I've read through the message archive and the advice there seems fairly
clear: don't use the multiprocessing module within web2py.

However, I'm hoping I might have a use case that's a bit different...

I've got an app that basically does analytics on moderately large datasets.
 I've got a number of controller methods that look like the following:

def my_method():
    # Note: all data of interest has previously been loaded
    # into 'session.data'
    results = []
    d = local_import('analysis')
    results += d.my_1st_analysis_method(session)
    results += d.my_2nd_analysis_method(session, date=date)
    results += d.my_3rd_analysis_method(session)
    results += d.my_4th_analysis_method(session, date=date)
    results += d.my_5th_analysis_method(session, date=date)
    return dict(results=results)

The problem I have is that all of the methods in my 'analysis' module, when
run in sequence as per the above, simply take too long to execute and give
me a browser timeout.  I can mitigate this to some extent by extending the
timeout on my browser, but I need to be able to use an iPad's Safari browser
and it appears to be impossible to increase the browser timeout on the iPad.
 Even if it can be done, that approach seems pretty ugly and I'd rather not
have to do it.  What I really want to do is run all of these analysis
methods *simultaneously*, capturing the results of each analysis_method into
a single variable once they've finished.

All of the methods within the 'analysis' module are designed to run
concurrently - although they reference session variables, I've consciously
avoided updating any session variables within any of these methods.  While
all the data is stored in a database, it's loaded into a session variable
(session.data) before my_method is called; this data never gets changed as
part of the analysis.

Is it reasonable to replace the above code with something like this:

def my_method():
    import multiprocessing
    d = local_import('analysis')

    tasks = [
        {'job': d.my_1st_analysis_method, 'kwargs': {}},
        {'job': d.my_2nd_analysis_method, 'kwargs': {'date': date}},
        {'job': d.my_3rd_analysis_method, 'kwargs': {}},
        {'job': d.my_4th_analysis_method, 'kwargs': {'date': date}},
        {'job': d.my_5th_analysis_method, 'kwargs': {'date': date}},
    ]

    result_queue = multiprocessing.Queue()

    # Each worker runs one analysis method and puts its list of
    # results on the shared queue.
    def worker(job, kwargs):
        result_queue.put(job(session, **kwargs))

    # Start one process per task.  (I gather this relies on fork, i.e.
    # Unix; on Windows the target would need to be picklable.)
    workers = []
    for t in tasks:
        p = multiprocessing.Process(target=worker,
                                    args=(t['job'], t['kwargs']))
        p.start()
        workers.append(p)

    # Collect one result list per task, then wait for the processes.
    results = []
    for t in tasks:
        results += result_queue.get()
    for p in workers:
        p.join()

    return dict(results=results)

Note: I haven't tried anything using the multiprocessing module before, so
if you've got any suggestions as to how to improve the above code, I'd
greatly appreciate it...
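In case it helps make the pattern concrete, here's a self-contained toy
version of what I'm after, using multiprocessing.Pool instead of raw
queues.  The analysis_a/analysis_b functions are stand-ins for my real
analysis methods (which would come via local_import), and the data list
stands in for session.data:

```python
import multiprocessing

# Stand-ins for the real analysis methods; each takes the (read-only)
# data and returns a list of results.
def analysis_a(data):
    return [('a', sum(data))]

def analysis_b(data):
    return [('b', max(data))]

def run_job(task):
    # Unpack one (function, data) pair and run it in a worker process.
    func, data = task
    return func(data)

def run_all(data):
    tasks = [(analysis_a, data), (analysis_b, data)]
    pool = multiprocessing.Pool(processes=len(tasks))
    try:
        # pool.map blocks until all jobs finish; results come back
        # in the same order as tasks.
        partial = pool.map(run_job, tasks)
    finally:
        pool.close()
        pool.join()
    # Concatenate the per-task result lists, as my sequential code does.
    results = []
    for r in partial:
        results += r
    return results

if __name__ == '__main__':
    print(run_all([1, 2, 3]))  # → [('a', 6), ('b', 3)]
```

If Pool is safe to use inside a web2py controller, it would handle the
queue plumbing for me and give me the results back in task order.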

Is introducing multiprocessing as I've outlined above a reasonable way to
optimise code in this scenario, or is there something in web2py that makes
this a bad idea?  If it's a bad idea, do you have any suggestions what else
I could try?

Thanks in advance

David Mitchell
