The WSGI specification defines an optional extension referred to as
wsgi.file_wrapper:

  http://www.python.org/dev/peps/pep-0333/#optional-platform-specific-file-handling

The intent is that WSGI hosting mechanisms can provide a better
performing way of responding with the contents of a file from within
a WSGI application. At the moment, web2py doesn't use this feature.
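
For reference, the usage pattern PEP 333 describes looks roughly like
this (a minimal sketch; the send_file name and the fallback details
are mine, not from web2py):

def send_file(environ, start_response, path, block_size=8192):
    # Prefer the server-supplied wsgi.file_wrapper when present.
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    filelike = open(path, 'rb')
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        # The hosting mechanism may optimise this, e.g. with sendfile().
        return wrapper(filelike, block_size)
    # Portable fallback: yield fixed-size blocks until an empty read.
    return iter(lambda: filelike.read(block_size), b'')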

The patches for it are reasonably simple:

*** globals.py.dist     2009-09-04 13:47:34.000000000 +1000
--- globals.py  2009-09-04 13:44:10.000000000 +1000
***************
*** 168,174 ****
                  self.headers['Content-Length'] = os.stat(filename)[stat.ST_SIZE]
              except OSError:
                  pass
!         self.body = streamer(stream, chunk_size)
          return self.body

      def download(self, request, db, chunk_size = 10 ** 6):
--- 168,177 ----
                  self.headers['Content-Length'] = os.stat(filename)[stat.ST_SIZE]
              except OSError:
                  pass
!         if request and request.env.wsgi_file_wrapper:
!             self.body = request.env.wsgi_file_wrapper(stream, chunk_size)
!         else:
!             self.body = streamer(stream, chunk_size)
          return self.body

      def download(self, request, db, chunk_size = 10 ** 6):


*** streamer.py.dist    2009-09-04 14:34:24.000000000 +1000
--- streamer.py 2009-09-04 20:06:43.000000000 +1000
***************
*** 80,87 ****
          stream.seek(part[0])
          headers['Content-Range'] = 'bytes %i-%i/%i' % part
          headers['Content-Length'] = '%i' % bytes
!         raise HTTP(206, streamer(stream, chunk_size=chunk_size,
!                    bytes=bytes), **headers)
      else:
          try:
              stream = open(static_file, 'rb')
--- 80,91 ----
          stream.seek(part[0])
          headers['Content-Range'] = 'bytes %i-%i/%i' % part
          headers['Content-Length'] = '%i' % bytes
!         if request and request.env.wsgi_file_wrapper:
!             raise HTTP(206, request.env.wsgi_file_wrapper(stream, chunk_size),
!                       **headers)
!         else:
!             raise HTTP(206, streamer(stream, chunk_size=chunk_size,
!                        bytes=bytes), **headers)
      else:
          try:
              stream = open(static_file, 'rb')
***************
*** 91,95 ****
              else:
                  raise HTTP(404)
          headers['Content-Length'] = fsize
!         raise HTTP(200, streamer(stream, chunk_size=chunk_size),
!                   **headers)
--- 95,103 ----
              else:
                  raise HTTP(404)
          headers['Content-Length'] = fsize
!         if request and request.env.wsgi_file_wrapper:
!             raise HTTP(200, request.env.wsgi_file_wrapper(stream, chunk_size),
!                       **headers)
!         else:
!             raise HTTP(200, streamer(stream, chunk_size=chunk_size),
!                       **headers)


As an example of the performance gain one can expect on
Apache/mod_wsgi, take the streamer code from web2py and use it in a
simple WSGI application outside of web2py. I.e.,

def streamer(stream, chunk_size=10 ** 6, bytes=None):
    # Yield the stream in chunk_size blocks, optionally stopping
    # after 'bytes' bytes in total (used for byte range requests).
    offset = 0
    while bytes is None or offset < bytes:
        if bytes is not None and bytes - offset < chunk_size:
            chunk_size = bytes - offset
        data = stream.read(chunk_size)
        length = len(data)
        if not length:
            break
        else:
            yield data
        if length < chunk_size:
            # Short read means end of file was reached.
            break
        offset += length

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)

    file = open('/usr/share/dict/words', 'rb')
    return streamer(file)

With Apache/mod_wsgi in embedded mode, on a recent MacBook Pro with a
2.4MB file, one gets a rate of about 175 requests/sec with requests
issued serially.

If one instead uses wsgi.file_wrapper from Apache/mod_wsgi, i.e.,

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)

    file = open('/usr/share/dict/words', 'rb')
    return environ['wsgi.file_wrapper'](file)

then with Apache/mod_wsgi in embedded mode and the same size file,
one gets a rate of about 250 requests/sec.

Note that the web2py streamer code uses quite a large chunk size of
1MB, which means that many concurrent requests for large files can
cause memory usage to grow by a noticeable amount: 100 concurrent
downloads each buffering a 1MB chunk tie up around 100MB by
themselves.

Under Apache/mod_wsgi, a large chunk size actually appears to result
in worse performance than a smaller one. For example, with a 64KB
chunk size, one gets about 230 requests/sec.

Unfortunately, the sweet spot probably varies depending on the WSGI
hosting mechanism being used. One suggestion would be to allow the
default chunk size to be overridden, so that this value can be tuned
for applications which return file contents a lot.
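
A sketch of what that might look like (DEFAULT_CHUNK_SIZE is an
invented name, not an existing web2py setting):

DEFAULT_CHUNK_SIZE = 64 * 1024  # module-level default; tune per host

def streamer(stream, chunk_size=None, bytes=None):
    # Fall back to the configurable module default when no explicit
    # chunk size is supplied.
    if chunk_size is None:
        chunk_size = DEFAULT_CHUNK_SIZE
    offset = 0
    while bytes is None or offset < bytes:
        if bytes is not None and bytes - offset < chunk_size:
            chunk_size = bytes - offset
        data = stream.read(chunk_size)
        if not data:
            break
        yield data
        offset += len(data)

A deployment that streams a lot could then override the default once,
e.g. gluon.streamer.DEFAULT_CHUNK_SIZE = 256 * 1024 in a model file
(again, hypothetical).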

Do note that the chunk size passed to wsgi.file_wrapper only comes
into play if the streamed object isn't an actual file object (so
sending can't be optimised), or if Windows is used. Where
optimisation is possible, Apache/mod_wsgi uses either memory mapping
or the sendfile() system call to speed things up.
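
To illustrate, the generic fallback in such wrappers is along the
lines of wsgiref.util.FileWrapper from the standard library
(paraphrased here from memory):

class FileWrapper(object):
    # Unoptimised path: just iterate the file-like object in
    # blksize chunks; close() is passed through if available.
    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize

    def __iter__(self):
        return self

    def next(self):  # __next__ under Python 3
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise StopIteration

    def close(self):
        if hasattr(self.filelike, 'close'):
            self.filelike.close()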

As to performance, if using daemon mode of Apache/mod_wsgi, the
benefits of using wsgi.file_wrapper are not as marked. This is
because of the additional socket hop involved in proxying data
between the Apache server child process and the daemon process. You
still have this hop if using fastcgi or mod_proxy to a backend web2py
running with the internal WSGI server, so any improvement is still
worthwhile.

For daemon mode, the rate for that file with wsgi.file_wrapper is 100
requests/sec. With the default web2py streamer and a 1MB chunk size
it is 80 requests/sec. The web2py streamer with a 64KB chunk size
comes in better, at about 90 requests/sec.
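
To summarise the measurements above (2.4MB file, requests/sec):

                                 embedded    daemon
    web2py streamer, 1MB chunk      175        80
    web2py streamer, 64KB chunk     230        90
    wsgi.file_wrapper               250       100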

Now, my test results above don't involve web2py. You have to remember
that when using this code from inside web2py, you take the additional
hit of web2py's own overhead: its routing, dispatch, etc. Thus,
although the performance of returning file content may be improved,
that is only one part of the overall time taken, and may be a minimal
amount given that database access etc. will likely still predominate.

Anyway, I present all this simply so it can be evaluated whether you
want to allow wsgi.file_wrapper to be used when available. The
feature would only get used if the WSGI hosting mechanism provided
it. I would suggest, though, that if it is supported and made the
default, you allow the user to disable it. This is because some WSGI
hosting mechanisms may not implement wsgi.file_wrapper properly. The
particular area of concern would be web2py's byte range support.
Apache/mod_wsgi will pay attention to the Content-Length in the
response and only send that amount of data from the file, but other
WSGI hosting mechanisms may not, and will send all file content from
the seek point to the end of the file. This could result in more data
than intended being returned for a byte range request.
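
If that is a worry, one defensive option (my sketch, not part of the
patches above) is to hand the stream to wsgi.file_wrapper only for
full-content responses, and keep using the pure Python streamer for
byte range responses:

def choose_body(request, stream, chunk_size, bytes=None):
    # Use wsgi.file_wrapper only for full (200) responses; byte range
    # (206) responses keep using streamer() (as defined above) so a
    # hosting mechanism that ignores Content-Length cannot over-send.
    wrapper = request and request.env.wsgi_file_wrapper
    if wrapper and bytes is None:
        return wrapper(stream, chunk_size)
    return streamer(stream, chunk_size=chunk_size, bytes=bytes)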

If overly concerned, you might make use of wsgi.file_wrapper off by
default, but allow the user to turn it on. That way, if a specific
user finds it actually helps their application because of heavy
streaming of files, they can enable it.

Even if not interested in wsgi.file_wrapper, I would suggest you have
another look at the 1MB chunk size you are currently using and
consider whether it is appropriate for all platforms. You might want
to make the default globally configurable. Certainly on Mac OS X Snow
Leopard under Apache/mod_wsgi, a chunk size of 64KB performs better
than the current default of 1MB.

Graham