You are correct that the solution you proposed would work. My only concern is that some existing applications call URL(app, controller, function) instead of URL(r=request, f=function). That is valid. When they try to deploy behind WSGI in a subfolder, they will find that the links break. I agree with you that they could change the URL(...) arguments, but they should not have to. I like your fix, but it would be nice if we could come up with a way that does not break any existing app.
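For context, one generic way to make hard-coded URL(app, controller, function) links survive a subfolder mount is to have the URL helper consult the WSGI SCRIPT_NAME. A minimal sketch with a hypothetical url() helper (this is not web2py's actual URL() implementation, just an illustration of the idea):

```python
def url(application, controller, function, environ=None):
    # Hypothetical helper (not web2py's real URL()): prepend whatever
    # prefix the WSGI server mounted the app under, taken from
    # SCRIPT_NAME, so absolute links keep working in a subfolder deploy.
    prefix = (environ or {}).get('SCRIPT_NAME', '').rstrip('/')
    return '%s/%s/%s/%s' % (prefix, application, controller, function)

# Mounted at the root, links are unchanged; behind a subfolder the same
# call produces a prefixed path without touching application code.
print(url('app', 'default', 'index'))
print(url('app', 'default', 'index', {'SCRIPT_NAME': '/myapps'}))
```

Because the prefix comes from the WSGI environment rather than from the URL(...) call site, existing applications would not need to change their arguments.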
On Sep 4, 5:53 am, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> The WSGI specification defines an optional extension referred to as
> wsgi.file_wrapper.
>
> http://www.python.org/dev/peps/pep-0333/#optional-platform-specific-f...
>
> The intent of this is that WSGI hosting mechanisms can provide a
> better-performing way of responding with content from a file within a
> WSGI application. At the moment, web2py doesn't use this feature.
>
> The patches for it are reasonably simple:
>
> *** globals.py.dist 2009-09-04 13:47:34.000000000 +1000
> --- globals.py 2009-09-04 13:44:10.000000000 +1000
> ***************
> *** 168,174 ****
>               self.headers['Content-Length'] = os.stat(filename)[stat.ST_SIZE]
>           except OSError:
>               pass
> !         self.body = streamer(stream, chunk_size)
>           return self.body
>
>       def download(self, request, db, chunk_size = 10 ** 6):
> --- 168,177 ----
>               self.headers['Content-Length'] = os.stat(filename)[stat.ST_SIZE]
>           except OSError:
>               pass
> !         if request and request.env.wsgi_file_wrapper:
> !             self.body = request.env.wsgi_file_wrapper(stream, chunk_size)
> !         else:
> !             self.body = streamer(stream, chunk_size)
>           return self.body
>
>       def download(self, request, db, chunk_size = 10 ** 6):
>
> *** streamer.py.dist 2009-09-04 14:34:24.000000000 +1000
> --- streamer.py 2009-09-04 20:06:43.000000000 +1000
> ***************
> *** 80,87 ****
>           stream.seek(part[0])
>           headers['Content-Range'] = 'bytes %i-%i/%i' % part
>           headers['Content-Length'] = '%i' % bytes
> !         raise HTTP(206, streamer(stream, chunk_size=chunk_size,
> !                                  bytes=bytes), **headers)
>       else:
>           try:
>               stream = open(static_file, 'rb')
> --- 80,91 ----
>           stream.seek(part[0])
>           headers['Content-Range'] = 'bytes %i-%i/%i' % part
>           headers['Content-Length'] = '%i' % bytes
> !         if request and request.env.wsgi_file_wrapper:
> !             raise HTTP(200, request.env.wsgi_file_wrapper(stream, chunk_size),
> !                        **headers)
> !         else:
> !             raise HTTP(206, streamer(stream, chunk_size=chunk_size,
> !                                      bytes=bytes), **headers)
>       else:
>           try:
>               stream = open(static_file, 'rb')
> ***************
> *** 91,95 ****
>       else:
>           raise HTTP(404)
>       headers['Content-Length'] = fsize
> !     raise HTTP(200, streamer(stream, chunk_size=chunk_size),
> !                **headers)
> --- 95,103 ----
>       else:
>           raise HTTP(404)
>       headers['Content-Length'] = fsize
> !     if request and request.env.wsgi_file_wrapper:
> !         raise HTTP(200, request.env.wsgi_file_wrapper(stream, chunk_size),
> !                    **headers)
> !     else:
> !         raise HTTP(200, streamer(stream, chunk_size=chunk_size),
> !                    **headers)
>
> As an example of the performance gain one can expect on Apache/mod_wsgi,
> the streamer code from web2py can be used in a simple WSGI application
> outside of web2py. I.e.,
>
>     def streamer(stream, chunk_size=10 ** 6, bytes=None):
>         offset = 0
>         while bytes == None or offset < bytes:
>             if bytes != None and bytes - offset < chunk_size:
>                 chunk_size = bytes - offset
>             data = stream.read(chunk_size)
>             length = len(data)
>             if not length:
>                 break
>             else:
>                 yield data
>             if length < chunk_size:
>                 break
>             offset += length
>
>     def application(environ, start_response):
>         status = '200 OK'
>         output = 'Hello World!'
>
>         response_headers = [('Content-type', 'text/plain')]
>         start_response(status, response_headers)
>
>         file = open('/usr/share/dict/words', 'rb')
>         return streamer(file)
>
> With Apache/mod_wsgi using embedded mode, one can get on a recent
> MacBook Pro with a 2.4MB file a rate of about 175 requests/sec
> serialised.
>
> If one uses wsgi.file_wrapper from Apache/mod_wsgi instead, i.e.,
>
>     def application(environ, start_response):
>         status = '200 OK'
>         output = 'Hello World!'
>
>         response_headers = [('Content-type', 'text/plain')]
>         start_response(status, response_headers)
>
>         file = open('/usr/share/dict/words', 'rb')
>         return environ['wsgi.file_wrapper'](file)
>
> With Apache/mod_wsgi using embedded mode, and the same size file, one can
> get a rate of 250 requests/sec.
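For readers outside web2py, the pattern Graham's patches implement (prefer the server's wsgi.file_wrapper when present, otherwise fall back to a chunked generator) can be sketched as one standalone WSGI app. This is a sketch only; the dictionary-file path is just the example path used above:

```python
def streamer(stream, chunk_size=64 * 1024):
    # Portable fallback: yield the file in fixed-size chunks.
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data

def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    stream = open('/usr/share/dict/words', 'rb')
    # wsgi.file_wrapper is optional in PEP 333, so probe for it; under
    # Apache/mod_wsgi it can use memory mapping or sendfile() internally.
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        return wrapper(stream, 64 * 1024)
    return streamer(stream)
```

The probe with environ.get() is what makes the app portable: on servers that omit the extension, the generator path is used and behavior is unchanged.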
>
> Note here that the web2py streamer code actually uses quite a large chunk
> size of 1MB, which means that having many concurrent requests for a large
> file can see memory usage grow by a noticeable amount.
>
> Under Apache/mod_wsgi, use of a large chunk size actually appears to
> result in worse performance than a smaller chunk size. For example,
> with a 64KB chunk size, one can get about 230 requests/sec.
>
> Unfortunately the sweet spot probably varies depending on the WSGI
> hosting mechanism being used. One suggestion may be to allow the
> default chunk size to be overridden, to allow tuning of this value if
> an application returns file contents a lot.
>
> Do note that the chunk size for wsgi.file_wrapper only comes into play
> if the streamed object isn't an actual file object, and so sending
> can't be optimised, or if Windows is used. Where optimisation is
> possible, Apache/mod_wsgi uses either memory mapping or the sendfile()
> system call to speed things up.
>
> As to performance, if using daemon mode of Apache/mod_wsgi, the
> benefits of using wsgi.file_wrapper are not as marked. This is because
> of the additional socket hop due to proxying data between the Apache
> server child process and the daemon process. You still have this hop
> if using fastcgi or mod_proxy to a backend web2py running with the
> internal WSGI server, so any improvement is still good though.
>
> For daemon mode, the rate for that file is 100 requests/sec. With the
> default web2py streamer and 1MB chunk size it is 80 requests/sec. The
> web2py streamer with a 64KB chunk size comes in better at about 90
> requests/sec.
>
> Now, my test results above don't involve web2py. You have to remember
> that when using that code from inside web2py, you take the additional
> hit from the overhead of web2py, its routing and dispatch, etc.
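The chunk-size trade-off described above can be probed in isolation. Below is a hedged micro-benchmark over an in-memory 2.4MB body; it exercises only the generator overhead, so it will not reproduce the Apache request rates quoted above, but it shows how one might compare candidate chunk sizes before making the default configurable:

```python
import io
import timeit

def streamer(stream, chunk_size):
    # Same shape as web2py's streamer, minus the byte-range handling.
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data

def drain(size, chunk_size):
    # Consume a size-byte in-memory "file" with the given chunk size.
    for _ in streamer(io.BytesIO(b'\0' * size), chunk_size):
        pass

# Compare the current 1MB default against the suggested 64KB.
for chunk in (1024 * 1024, 64 * 1024):
    seconds = timeit.timeit(lambda: drain(2400 * 1024, chunk), number=20)
    print('chunk=%4dKB: %.3fs' % (chunk // 1024, seconds))
```

As Graham notes, the sweet spot depends on the hosting mechanism, so numbers from a synthetic loop like this are only a starting point for tuning on the actual deployment.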
> Thus, although performance for returning file content may be improved,
> that is only one part of the overall time taken, and may be a minimal
> amount given that database access etc. will likely still predominate.
>
> Anyway, I present all this simply so it can be evaluated whether, for
> the WSGI hosting mechanism, you want to allow wsgi.file_wrapper to be
> used if available. The feature would only get used if the WSGI hosting
> mechanism provided it. I would perhaps suggest, though, that if it is
> supported and made the default, you allow the user to disable use of
> it. This is because some WSGI hosting mechanisms may not implement
> wsgi.file_wrapper properly. The particular area of concern would be
> web2py's byte-range support. Apache/mod_wsgi will pay attention to
> Content-Length in the response and only send that amount of data from
> the file, but other WSGI hosting mechanisms may not, and will send all
> file content from the seek point to the end of the file. This could
> result in more data than intended being returned for a byte-range
> request.
>
> If overly concerned, you might make the ability to use
> wsgi.file_wrapper off by default, but allow the user to turn it on.
> This way, if a specific user finds it actually helps for their specific
> application because of heavy streaming of files, they can turn it on.
>
> Even if not interested in wsgi.file_wrapper, I would suggest you have
> another look at the 1MB chunk size you are currently using and see
> whether that is appropriate for all platforms. You might want to make
> the default globally configurable. Certainly on Mac OS X Snow Leopard
> under Apache/mod_wsgi, a block size of 64KB performs better than the
> current default of 1MB.
>
> Graham

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To post to this group, send email to web2py@googlegroups.com
To unsubscribe from this group, send email to web2py+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---
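Picking up Graham's byte-range warning: a conservative policy is to hand the stream to wsgi.file_wrapper only for full-file 200 responses, keep a bounded generator for 206 range replies, and expose a switch to disable the wrapper entirely. A sketch under those assumptions (choose_body and use_file_wrapper are illustrative names, not web2py settings):

```python
def streamer(stream, chunk_size=64 * 1024, bytes=None):
    # Bounded fallback generator (modelled on web2py's streamer): stop
    # after `bytes` bytes so a 206 response never overruns its range.
    # (`bytes` shadows the builtin to mirror web2py's own signature.)
    offset = 0
    while bytes is None or offset < bytes:
        if bytes is not None and bytes - offset < chunk_size:
            chunk_size = bytes - offset
        data = stream.read(chunk_size)
        if not data:
            break
        yield data
        offset += len(data)

def choose_body(environ, stream, status, chunk_size=64 * 1024,
                use_file_wrapper=True, bytes=None):
    # Use wsgi.file_wrapper only when the server offers it, the user
    # has not disabled it, and the response is a full file (status 200):
    # a host that ignores Content-Length would otherwise send data past
    # the requested byte range.
    wrapper = environ.get('wsgi.file_wrapper')
    if use_file_wrapper and wrapper is not None and status == 200:
        return wrapper(stream, chunk_size)
    return streamer(stream, chunk_size, bytes)
```

With a policy like this, a misbehaving wsgi.file_wrapper implementation can never corrupt a range response, and the use_file_wrapper flag gives users the opt-out (or opt-in) Graham suggests.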