On 17/07/2005, at 6:16 PM, Ian Bicking wrote:

>> The pipeline itself isn't really late bound. For instance, if I was
>> to create a WSGI middleware pipeline something like this:
>>
>>     server <--> session <--> identification <--> authentication <-->
>>     <--> challenge <--> application
>>
>> ... session, identification, authentication, and challenge are
>> middleware components (you'll need to imagine their implementations).
>> And within a module that started a server, you might end up doing
>> something like:
>>
>>     def configure_pipeline(app):
>>         return SessionMiddleware(
>>             IdentificationMiddleware(
>>                 AuthenticationMiddleware(
>>                     ChallengeMiddleware(app))))
>
> This is what Paste does in configuration, like:
>
>     middleware.extend([
>         SessionMiddleware, IdentificationMiddleware,
>         AuthenticationMiddleware, ChallengeMiddleware])
>
> This kind of middleware takes a single argument, which is the
> application it will wrap. In practice, this means all the other
> parameters go into lazily-read configuration.
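The list form Ian describes and the nested configure_pipeline() call build the same stack: folding the list from its last entry outwards reproduces the nesting. A minimal sketch, assuming each entry is a single-argument factory that takes the application to wrap; build_pipeline() and tracing_middleware() are hypothetical helpers for illustration, not Paste's actual API:

```python
def build_pipeline(app, middleware):
    # middleware is an ordered list of single-argument factories. The first
    # entry must end up outermost (closest to the server), so wrap starting
    # from the innermost (last) entry and work outwards.
    for factory in reversed(middleware):
        app = factory(app)
    return app


def tracing_middleware(name):
    # Hypothetical stand-in for SessionMiddleware etc.: it records its name
    # in environ so the order in which the layers run can be observed.
    def factory(app):
        def wrapped(environ, start_response):
            environ.setdefault("trace", []).append(name)
            return app(environ, start_response)
        return wrapped
    return factory


def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


pipeline = build_pipeline(application, [
    tracing_middleware("session"),
    tracing_middleware("identification"),
    tracing_middleware("authentication"),
    tracing_middleware("challenge"),
])

environ = {}
body = pipeline(environ, lambda status, headers, exc_info=None: None)
print(environ["trace"])
# ['session', 'identification', 'authentication', 'challenge']
```

Because the first list entry ends up outermost, the list reads in the same order as the server-to-application diagram in the quoted text.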
Sorry, but you have given me a nice opening here to hijack this conversation a bit and make some comments and pose some questions about WSGI that I have been thinking about for a while.

My understanding from reading the WSGI PEP and examples like the one above is that the WSGI middleware stack concept is very much tree-like, but at any specific node within the tree, one can only traverse into one child. I.e., a parent middleware component could decide to defer to one child or another, but there is no means of trying out multiple choices until you find one that is prepared to handle the request. The only way around it seems to be to make the linear chain of nested applications longer and longer, something which to me just doesn't sit right. In some respects, the need for the configuration scheme arises in part from trying to make that less unwieldy.

To explain what I am going on about, I am going to use examples from some work I have been doing with componentised construction of request handler stacks in mod_python. I will not use the term middleware here, as someone in this discussion has already made the point that the components being talked about aren't really middleware, and in what I have been doing I have taken it to an even more fine-grained level. I believe the analogy to mod_python is a reasonable one because, at the simplest level, a mod_python request handler and a WSGI application both provide the most basic function of responding to a request; they just do so in different ways.

Normally in mod_python a handler can return an OK response, an error response or a DECLINED response. The DECLINED response is special and indicates to mod_python that any further content handlers defined by mod_python should be skipped and control passed back up to Apache, so that it can potentially serve up a matched static file. What I am doing is making it acceptable for a handler to also return None.
If None were returned by the highest level handler, it would be equivalent to DECLINED, but within the context of middleware components it has a slightly relaxed meaning. Specifically, it indicates that the handler isn't returning a response, without implying that the request as a whole is being DECLINED and control returned to Apache.

Doing this means that within the context of a tree-based middleware stack, one can introduce a list of handlers at a particular node. Each handler in the list is tried in turn to see if it wishes to handle the request, returning either an error or valid response, or None. If it returns None, the next handler in the list is tried, and so on until one responds; if none does, None is passed back to the parent middleware component. This all means I could write something like:

    handler = Handlers(
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        PythonModule(),
    )

This handler might be associated with any access to a directory as a whole. In iterating over each of the handlers, it filters out requests to files that we don't want to provide access to, with the final handler deferring to a handler within a Python module associated with the actual resource being requested. Although Apache provides means of filtering out requests, it only works properly for physical files and not for virtual resources specified by way of the path info. For example, a file "page.tmpl" (a Cheetah file) could have a "page.py" file that defines:

    handler = Handlers(
        IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
        IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),
        IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
    )

Again, more filtering, and finally a handler is triggered which knows how to execute a precompiled Cheetah template stored as a Python module.
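The "try each handler until one answers" rule can be sketched without mod_python at all. This is a minimal sketch under stated assumptions: Request, Respond and the numeric HTTP_NOT_FOUND are hypothetical stand-ins for mod_python's request object, PythonModule() and apache.HTTP_NOT_FOUND, and IfLocationMatches is assumed to apply re.search() to the path info. The real classes are not shown in this post.

```python
import re

HTTP_NOT_FOUND = 404  # stand-in for mod_python's apache.HTTP_NOT_FOUND


class Request:
    # Hypothetical minimal stand-in for mod_python's request object.
    def __init__(self, path_info):
        self.path_info = path_info


class Handlers:
    # Try each handler in turn; the first non-None result is the response.
    def __init__(self, *handlers):
        self.handlers = handlers

    def __call__(self, req):
        for handler in self.handlers:
            result = handler(req)
            if result is not None:
                return result
        return None  # nobody answered: defer to the parent


class IfLocationMatches:
    # Delegate to the wrapped handler only when the pattern matches
    # (assumed re.search semantics); otherwise decline with None.
    def __init__(self, pattern, handler):
        self.pattern = re.compile(pattern)
        self.handler = handler

    def __call__(self, req):
        if self.pattern.search(req.path_info):
            return self.handler(req)
        return None


class NotFound:
    def __call__(self, req):
        return HTTP_NOT_FOUND


class Respond:
    # Hypothetical terminal handler standing in for PythonModule().
    def __init__(self, value):
        self.value = value

    def __call__(self, req):
        return self.value


handler = Handlers(
    IfLocationMatches(r"/_", NotFound()),
    IfLocationMatches(r"\.py(/.*)?$", NotFound()),
    Respond("served"),
)

print(handler(Request("/index.html")))     # served
print(handler(Request("/_private")))       # 404
print(handler(Request("/page.py/extra")))  # 404
```

The filters never need to know about each other: each one either answers or returns None, and the ordering of the list alone decides precedence.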
All in all, this is a similar tree-like structure to WSGI, except that you have the ability to iterate through handlers at one level, with each handler able to state explicitly that it isn't providing a response and so allow the next handler to be tried. My experience so far is that this has allowed more fine-grained components to be created which provide specific filtering, without it all turning into a mess from having to nest each handler within another in one big pipeline, as it seems things must be done in WSGI.

In mod_python one already has access to a table object storing configuration options set within the Apache configuration for mod_python, plus the ability to add Python objects into the mod_python request object itself as necessary. In terms of configuration, being able to put handlers in a list where they don't actually return a response seems to me to make it easier to avoid needing a separate configuration system for most things. For example, I can have a handler "SetPythonOption" which sets an option in the options table object and always returns None, thus passing control on to the next handler. In the highest level handler, before the point where control is dispatched off to a separate Python module or special purpose handler, one can thus define the configuration as necessary:

    handler = Handlers(
        SetPythonOption("PythonDebug","1"),
        SetPythonOption("ApplicationPath","/application"),
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        PythonModule(),
    )

In other words, the code itself contains the configuration, and one doesn't have to worry about where the configuration is found or work out what is needed from it. Of course, you could still have a separate configuration object and provide a special purpose handler which merges it into the environment of the request object in some way.
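A handler such as SetPythonOption only needs to record its setting and return None. A minimal sketch, assuming a plain dict stands in for mod_python's options table; the Request, SetPythonOption and ShowOptions classes here are hypothetical reimplementations for illustration, not the code from the post:

```python
class Request:
    # Hypothetical stand-in for mod_python's request object; a plain dict
    # plays the role of the options table.
    def __init__(self):
        self.options = {}


class Handlers:
    # Try each handler in turn; the first non-None result is the response.
    def __init__(self, *handlers):
        self.handlers = handlers

    def __call__(self, req):
        for handler in self.handlers:
            result = handler(req)
            if result is not None:
                return result
        return None


class SetPythonOption:
    # Record an option, then return None so the next handler runs.
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __call__(self, req):
        req.options[self.name] = self.value
        return None


class ShowOptions:
    # Hypothetical terminal handler that reads the options set earlier.
    def __call__(self, req):
        return "PythonDebug=%s ApplicationPath=%s" % (
            req.options["PythonDebug"], req.options["ApplicationPath"])


handler = Handlers(
    SetPythonOption("PythonDebug", "1"),
    SetPythonOption("ApplicationPath", "/application"),
    ShowOptions(),
)

req = Request()
print(handler(req))  # PythonDebug=1 ApplicationPath=/application
```

Since the options are written into the request as the list is walked, any handler later in the list (or in a module it dispatches to) sees them without consulting a separate configuration file.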
For the case of a separate configuration object, and in line with how mod_python's request object is used, you could have something like:

    config = getApplicationConfig()

    handler = Handlers(
        SetRequestAttribute("config",config),
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        PythonModule(),
    )

Having done that, any later handler could access "req.config" to get the configuration object and use it as necessary. In WSGI such things would be placed into the "environ" dictionary and propagated to subsequent applications.

One last example is what a session-based login mechanism might look like, since this was one of the examples posed in the initial discussion. Here you might have a handler for a whole directory which contains:

    _userDatabase = _users.UserDatabase()

    handler = Handlers(
        IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
        IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),

        IfLocationIsADirectory(ExternalRedirect('index.html')),

        # Create session and stick it in request object.
        CreateUserSession(),

        # Login form shouldn't require user to be logged in to access it.
        IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()),

        # Serve requests against login/logout URLs and otherwise
        # don't let request proceed if user not yet authenticated.
        # Will redirect to login form if not authenticated.
        FormAuthentication(_userDatabase,"login.html"),

        SetResponseHeader('Pragma','no-cache'),
        SetResponseHeader('Cache-Control','no-cache'),
        SetResponseHeader('Expires','-1'),

        IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
    )

Again, one has done away with the need for a configuration file, as the code itself specifies what is required, along with the constraints on the order in which things should be done. This example also shows that handlers which return None, because they aren't producing an actual response, can still add to the response headers, for example the special cookies required by sessions or headers controlling caching.
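For comparison, the closest WSGI equivalents of SetRequestAttribute and SetResponseHeader are middleware that put a value into environ, or that wrap start_response to append extra headers. A minimal sketch, assuming a hypothetical environ key "myapp.config"; set_environ_key() and add_response_headers() are illustrative names, not part of WSGI or any library:

```python
def set_environ_key(key, value):
    # Analogue of SetRequestAttribute: store a value in environ so that
    # applications further down the stack can read it.
    def middleware(app):
        def wrapped(environ, start_response):
            environ[key] = value
            return app(environ, start_response)
        return wrapped
    return middleware


def add_response_headers(extra_headers):
    # Analogue of SetResponseHeader: append headers by wrapping the
    # start_response callable handed to the inner application.
    def middleware(app):
        def wrapped(environ, start_response):
            def custom_start_response(status, headers, exc_info=None):
                return start_response(
                    status, list(headers) + list(extra_headers), exc_info)
            return app(environ, custom_start_response)
        return wrapped
    return middleware


def application(environ, start_response):
    config = environ["myapp.config"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("debug=%s\n" % config["debug"]).encode("ascii")]


# Each step has to be nested around the next one.
app = set_environ_key("myapp.config", {"debug": True})(
    add_response_headers([
        ("Pragma", "no-cache"),
        ("Cache-Control", "no-cache"),
        ("Expires", "-1"),
    ])(application))


def show(status, headers, exc_info=None):
    # Minimal start_response used only to display what the stack produced.
    print(status, headers)
    return lambda data: None


print(app({}, show))
```

Note that the header-adding step still relies on the inner application eventually calling start_response with a real status; it cannot "decline" the way a None-returning mod_python handler can, which is the gap discussed below.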
In terms of late binding of which handler is executed, the "PythonModule" handler is one example, in that it selects which Python module to load only when the request is being handled. Another example, in what I am doing, of late construction of a handler instance (albeit always of the same type) is:

    import cgi
    from mod_python import apache

    class Handler:
        def __init__(self,req):
            # A new instance is created per request, holding that request.
            self.__req = req
        def __call__(self,name="value"):
            self.__req.content_type = "text/html"
            self.__req.send_http_header()
            self.__req.write("<html><body>")
            self.__req.write("<p>name=%r</p>"%cgi.escape(name))
            self.__req.write("</body></html>")
            return apache.OK

    handler = IfExtensionEquals("html",HandlerInstance(Handler))

First off, the "HandlerInstance" object is only triggered if the request against this specific file-based resource was made by way of a ".html" extension. When it is triggered, only at that point is an instance of "Handler" created, with the request object supplied to the constructor.

To round this off, the special "Handlers" handler contains only the following code. It is pretty simple, but in my mind it makes construction of the component hierarchy a bit easier when multiple things need to be done in turn and nesting isn't strictly required.

    class Handlers:
        def __init__(self,*handlers):
            self.__handlers = handlers
        def __call__(self,req):
            for handler in self.__handlers:
                # _execute() is defined elsewhere (not shown here).
                result = _execute(req,handler,lazy=True)
                # None means "no response from me"; try the next handler.
                if result is not None:
                    return result
            # No handler responded; the parent decides what happens next.
            return None

I would be very interested to see how people think this relates to what is possible with WSGI. Could one implement a similar sort of class to "Handlers" in WSGI, to sequence through WSGI applications until one generates a complete response? What makes me think the answer is "no" is, first, that I recall the PEP saying that the "start_response" callable can only be called once (except when reporting an error via exc_info), which precludes applications in a list adding to the response headers without returning a valid status.
Secondly, if the "start_response" callable hasn't been called by the time the parent starts trying to construct the response content from the result of calling the application, an error is raised. But then, I lack proper knowledge of WSGI, so I could be wrong.

If my thinking is correct, it could only be done by changing the WSGI specification to support the concept of trying applications in sequence, by allowing None as the status when "start_response" is called, meaning the same as when I return None from a handler. I.e., the application may have set headers, but otherwise the parent should, where possible, move on to a subsequent application and try it, and so on.

Anyway, people may feel that this is totally contrary to what WSGI is all about and not relevant, and that is fine; I am at least finding it an interesting idea to play with in respect of mod_python.

BTW, WSGI itself could just become a pluggable component within this mod_python middleware equivalent. :-)

    handler = Handlers(
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        WSGIApplicationModule(),
    )

Feedback most welcome. I have been trying for a little while to work out how what I am doing might be transferred to WSGI, but if people think it is a stupid idea then I'll no longer waste my time thinking about it and will just stick with mod_python.

Graham

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig