The WSGI specification defines an optional extension referred to as wsgi.file_wrapper.
http://www.python.org/dev/peps/pep-0333/#optional-platform-specific-file-handling

The intent of this is that WSGI hosting mechanisms can provide a better-performing way of responding with content from a file within a WSGI application. At the moment, web2py doesn't use this feature. The patches for it are reasonably simple:

*** globals.py.dist     2009-09-04 13:47:34.000000000 +1000
--- globals.py  2009-09-04 13:44:10.000000000 +1000
***************
*** 168,174 ****
              self.headers['Content-Length'] = os.stat(filename)[stat.ST_SIZE]
          except OSError:
              pass
!         self.body = streamer(stream, chunk_size)
          return self.body

      def download(self, request, db, chunk_size = 10 ** 6):
--- 168,177 ----
              self.headers['Content-Length'] = os.stat(filename)[stat.ST_SIZE]
          except OSError:
              pass
!         if request and request.env.wsgi_file_wrapper:
!             self.body = request.env.wsgi_file_wrapper(stream, chunk_size)
!         else:
!             self.body = streamer(stream, chunk_size)
          return self.body

      def download(self, request, db, chunk_size = 10 ** 6):

*** streamer.py.dist    2009-09-04 14:34:24.000000000 +1000
--- streamer.py 2009-09-04 20:06:43.000000000 +1000
***************
*** 80,87 ****
          stream.seek(part[0])
          headers['Content-Range'] = 'bytes %i-%i/%i' % part
          headers['Content-Length'] = '%i' % bytes
!         raise HTTP(206, streamer(stream, chunk_size=chunk_size,
!                    bytes=bytes), **headers)
      else:
          try:
              stream = open(static_file, 'rb')
--- 80,91 ----
          stream.seek(part[0])
          headers['Content-Range'] = 'bytes %i-%i/%i' % part
          headers['Content-Length'] = '%i' % bytes
!         if request and request.env.wsgi_file_wrapper:
!             raise HTTP(200, request.env.wsgi_file_wrapper(stream, chunk_size),
!                        **headers)
!         else:
!             raise HTTP(206, streamer(stream, chunk_size=chunk_size,
!                        bytes=bytes), **headers)
      else:
          try:
              stream = open(static_file, 'rb')
***************
*** 91,95 ****
      else:
          raise HTTP(404)
      headers['Content-Length'] = fsize
!     raise HTTP(200, streamer(stream, chunk_size=chunk_size),
!                **headers)
--- 95,103 ----
      else:
          raise HTTP(404)
      headers['Content-Length'] = fsize
!     if request and request.env.wsgi_file_wrapper:
!         raise HTTP(200, request.env.wsgi_file_wrapper(stream, chunk_size),
!                    **headers)
!     else:
!         raise HTTP(200, streamer(stream, chunk_size=chunk_size),
!                    **headers)

As an example of the performance gain one can expect on Apache/mod_wsgi, take the streamer code from web2py used in a simple WSGI application outside of web2py, i.e.:

def streamer(stream, chunk_size=10 ** 6, bytes=None):
    offset = 0
    while bytes == None or offset < bytes:
        if bytes != None and bytes - offset < chunk_size:
            chunk_size = bytes - offset
        data = stream.read(chunk_size)
        length = len(data)
        if not length:
            break
        else:
            yield data
        if length < chunk_size:
            break
        offset += length

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    file = open('/usr/share/dict/words', 'rb')
    return streamer(file)

With Apache/mod_wsgi in embedded mode, on a recent MacBook Pro with a 2.4MB file, one can get a rate of about 175 requests/sec serialised. If one instead uses wsgi.file_wrapper from Apache/mod_wsgi, i.e.:

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    file = open('/usr/share/dict/words', 'rb')
    return environ['wsgi.file_wrapper'](file)

then with Apache/mod_wsgi in embedded mode and the same size file, one can get a rate of 250 requests/sec.

Note here that the web2py streamer code uses quite a large chunk size of 1MB, which means that many concurrent requests for a large file can see memory usage grow by a noticeable amount. Under Apache/mod_wsgi, use of a large chunk size actually appears to result in worse performance than a smaller chunk size: with a 64KB chunk size, for example, one can get about 230 requests/sec.
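Pulling the two variants together, here is a sketch of the pattern the patches implement: prefer the host's wsgi.file_wrapper when it is present in the environ, and fall back to the generator otherwise. The 64KB block size and the in-memory stream are just for illustration; a real handler would open a file as above.

```python
import io

CHUNK_SIZE = 64 * 1024  # the smaller block size that measured better above

def streamer(stream, chunk_size=CHUNK_SIZE):
    # Generic generator fallback, as in the plain-WSGI example.
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data

def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    # An in-memory stream stands in for open('/usr/share/dict/words', 'rb').
    stream = io.BytesIO(b'word\n' * 1000)
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        return wrapper(stream, CHUNK_SIZE)
    return streamer(stream, CHUNK_SIZE)
```

Either way the application returns an iterable of byte chunks; the only difference is whether the hosting mechanism gets the chance to optimise the sending.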
Unfortunately the sweet spot probably varies depending on the WSGI hosting mechanism being used. One suggestion would be to allow the default chunk size to be overridden, so this value can be tuned if an application returns file contents a lot. Do note that the chunk size for wsgi.file_wrapper only comes into play if the streamed object isn't an actual file object and so sending can't be optimised, or if Windows is used. Where optimisation is possible, Apache/mod_wsgi uses either memory mapping or the sendfile() system call to speed things up.

As to performance, if using daemon mode of Apache/mod_wsgi, the benefits of using wsgi.file_wrapper are not as marked. This is because of the additional socket hop due to proxying data between the Apache server child process and the daemon process. You still have this hop if using fastcgi or mod_proxy to a backend web2py running with the internal WSGI server, so any improvement is still good. For daemon mode, the wsgi.file_wrapper rate for that file is 100 requests/sec. The default web2py streamer with a 1MB chunk size gives 80 requests/sec, and with a 64KB chunk size it comes in better at about 90 requests/sec.

Now, my test results above don't involve web2py. Remember that when using this code from inside of web2py, you take the additional hit from the overhead of web2py, its routing and dispatch etc. Thus, although performance for returning file content may be improved, that is only one part of the overall time taken, and may be a minimal amount given that database access etc. will likely still predominate.

Anyway, I present all this simply so it can be evaluated whether, for a given WSGI hosting mechanism, you want to allow wsgi.file_wrapper to be used if available. The feature would only get used if the WSGI hosting mechanism provided it. I would suggest, though, that if it is supported and made the default, you allow the user to disable it.
This is because some WSGI hosting mechanisms may not implement wsgi.file_wrapper properly. The particular area of concern would be web2py's byte range support. Apache/mod_wsgi will pay attention to the Content-Length of the response and only send that amount of data from the file, but other WSGI hosting mechanisms may not, and will instead send all file content from the seek point to the end of the file. This could result in more data than intended being returned for a byte range request.

If overly concerned, you might make the use of wsgi.file_wrapper off by default, but allow the user to turn it on. That way, if a specific user finds it actually helps for their specific application because of heavy streaming of files, they can enable it.

Even if not interested in wsgi.file_wrapper, I would suggest you have another look at the 1MB chunk size you are currently using and see whether that is appropriate for all platforms. You might want to make the default globally configurable. Certainly on Mac OS X Snow Leopard under Apache/mod_wsgi, a block size of 64KB performs better than the current default of 1MB.

Graham

--
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To post to this group, send email to web2py@googlegroups.com
To unsubscribe from this group, send email to web2py+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/web2py?hl=en