Re: [Web-SIG] Web application packaging
I've been doodling around with things, but honestly I've deployed zero Python apps in the last year, so lacking a use case of any kind I find myself rather unfocused, even though I feel a degree of confidence about the approach. Anyway, my indirect doodling is here: https://github.com/ianb/apppkg/

I'm interested in a cross-language approach, which means a lot of process isolation, and that's where things are vague right now – it would probably be a bit easier if that didn't mean process isolation with Python on both sides, because that's where it gets vague.

On Thu, Jun 7, 2012 at 3:08 AM, Alex Morega a...@grep.ro wrote:

Hello! There was a discussion here, about a year ago, about ways to deploy WSGI applications to servers. What is the status? What tools are out there, being currently developed, other than Buildout, Fabric and Silver Lining? Cheers, -- Alex

___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/ianb%40colorstudy.com
Re: [Web-SIG] Move www.wsgi.org to Read The Docs.
I believe Stephan Diehl owns wsgi.org.

On Thu, Aug 18, 2011 at 4:14 PM, Graham Dumpleton graham.dumple...@gmail.com wrote:

Who owns and manages the www.wsgi.org wiki? The amount of spam the wiki gets now is becoming ridiculous. If we care about the wiki, it is time to take the content in it and dump it in GitHub as a project which can then be loaded up to Read The Docs, with www.wsgi.org directing to that.

In the meantime, can anyone else help clean up the spam? I am usually the only one who does it, but this time there is too much and it becomes a waste of my time. I only have so many phone meetings where I can secretly be cleaning up the spam at the same time. So, many hands make light work. :-)

Overall I reckon moving to GitHub and Read The Docs may also encourage greater participation as far as putting some useful content in it. Personally I find wikis a pain for that sort of content and so can't be bothered to work on the actual content. If it was on GitHub and Read The Docs I am more likely myself to help build out the content with actual decent useful content, moving some of the stuff I have blogged about or put elsewhere there instead.

Graham
Re: [Web-SIG] A Python Web Application Package and Format
On Wed, Apr 27, 2011 at 5:21 PM, Daniel Holth dho...@gmail.com wrote:

I stumbled across https://apphosted.com as more web application package and format 'prior art'. It appears to be an App Engine competitor. According to their API documentation, their deployment format is an archive containing a single directory with your WSGI program and a metro.config. They put the database configuration in a settings.py written into the application's root with defined DB_URI, etc.

There's something that bothers me about using settings.py, though I guess it's not that different from a YAML file or whatever, though with a cleverness danger. Conveniently you could do sys.modules['settings'] = new.module('settings') and avoid ever making a real file. Using the name settings *specifically* is likely to cause name clashes with existing Django applications.

Ian
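The sys.modules trick mentioned above can be sketched as follows. The `new` module only exists in Python 2; `types.ModuleType` is the Python 3 equivalent. The helper name `install_settings` and the `DB_URI` value are made up for illustration and are not part of any tool discussed in this thread.

```python
import sys
import types


def install_settings(values, name="settings"):
    """Build a module object in memory and register it under `name`."""
    module = types.ModuleType(name)
    for key, value in values.items():
        setattr(module, key, value)
    # Anything that later runs `import <name>` gets this object back.
    sys.modules[name] = module
    return module


# Hypothetical configuration a deployment tool might hand to the app.
install_settings({"DB_URI": "sqlite:///example.db"})

import settings  # resolved from sys.modules; no settings.py on disk

print(settings.DB_URI)
```

Because the lookup goes through sys.modules, the name clash mentioned above still applies: whichever code registers `settings` last wins, so a less generic module name may be the safer choice.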
Re: [Web-SIG] A Python Web Application Package and Format
On Fri, Apr 15, 2011 at 2:05 PM, Alice Bevan–McGregor al...@gothcandy.comwrote: I want to keep this distinct from anything long-running, which is a much more complex deal. The primary application is only potentially long-running. (You could, in theory, deploy an app as CGI, but that way lies madness.) However, the reference syntax mentioned (excepting URL) works well for identifying this. Right -- just one long running things (but no promises how long). I think given the three options, and for general simplicity, the script can be successful or have an error (for Python code: exception or no; for __main__: zero exit code or no; for a URL: 2xx code or no), and can return some text (which may only be informational, not structured?) For the simple cases (script / callable), it's pretty easy to trap STDOUT and STDERR, deliver INFO log messages to STDOUT, everything else to STDERR, then display that to the administrator in some form. Same for HTTP, except that it can include full HTML formatting information. For Silver Lining I set Accept: text/plain, to at least suggest that plain text was preferred, since typically HTML isn't easily displayed. But of course a tool could change that, probably usefully? But that only applies to HTTP. Anyway, seems easy enough. An application configuration could refer to scripts under different names, to be invoked at different stages. A la the already mentioned post-install, pre-upgrade, post-upgrade, pre-removal, and cron-like. Any others? test-environment, test-alive, test-functional are all possible test-alive could be used by, e.g., Nagios to monitor (it might actually have structured output?) There could be an optional self-test script, where the application could do a last self-check -- import whatever it wanted, check db settings, etc. Of course we'd want to know what it needed *before* the self-check to try to provide it, but double-checking is of course good too. Unit and functional tests are the most obvious. 
In which case we'll need to be able to provide a localhost-only 'mounted' location for the application even though it hasn't been installed yet. For local function HTTP tests you might want that, but if you are doing non-HTTP functional tests (e.g., just WGSI) or unit tests then the environment should always be sufficient without actually serving anything up. You'd probably want a test set of local services (as opposed to a development set of services). I think this will all be another kind of tooling around development. One advantage to a separate script instead of just one script-on-install is that you can more easily indicate *why* the installation failed. For instance, script-on-install might fail because it can't create the database tables it needs, which is a different kind of error than a library not being installed, or being fundamentally incompatible with the container it is in. In some sense maybe that's because we aren't proposing a rich error system -- but realistically a lot of these errors will be TypeError, ImportError, etc., and trying to normalize those errors to some richer meaning is unlikely to be done effectively (especially since error cases are hard to test, since they are the things you weren't expecting). Humans are potentially better at reading tracebacks than machines are, so my previous logging idea (script output stored and displayed to the administrator in a readable form) combined with a modicum of reasonable exception handling within the script should lead to fairly clear errors. Deployers aren't very good at reading developer tracebacks, so it is kind of nice if you at least have a sense of the stage. One advantage to multiple testing stages is that you might roll back before, e.g., having to deal with database migrations. But easy enough to skip for now. I'd like to see maybe an | operator, and a distinction between required and optional services. E.g.: No need for some new operator, YAML already supports lists. 
    services:
      - [mysql, postgresql, dburl]

Or:

    services:
      required:
        - files
      optional:
        - [mysql, postgresql]

And then there's a lot more you could do... which one do you prefer, for instance.

The order of services within one of these lists would indicate preference, thus MySQL is preferred over PostgreSQL in the second example, above.

Sure.

Tricky things:

- You need something funny like multiple databases. This is very service-specific anyway, and there might sometimes need to be a way to configure the service. It's also a fairly obscure need.

I'm not convinced that connecting to a legacy database /and/ current database is that obscure. It's also not as hard as Django makes it look (with a 1M SLoC change to add support)… WebCore added support in three lines.

Well, then you are getting into specific configurations fitting into legacy
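To make the required/optional and ordered-preference idea from the service lists above concrete, here is a minimal sketch of how a deployment tool might resolve them. The function name `choose_services` and the shape of its return value are assumptions for illustration; the input mirrors the second YAML layout after parsing.

```python
def choose_services(declared, available):
    """Resolve declared service needs against what a host offers.

    `declared` is a dict with "required" and "optional" lists. Each entry
    is either a service name or a list of alternatives, most preferred first.
    Returns (chosen, missing): the services picked, and the required
    entries for which no alternative was available.
    """
    chosen, missing = [], []
    for kind in ("required", "optional"):
        for entry in declared.get(kind, []):
            alternatives = [entry] if isinstance(entry, str) else list(entry)
            # First alternative the host can provide wins.
            match = next((name for name in alternatives if name in available), None)
            if match is not None:
                chosen.append(match)
            elif kind == "required":
                missing.append(alternatives)
    return chosen, missing


declared = {"required": ["files"], "optional": [["mysql", "postgresql"]]}
print(choose_services(declared, {"files", "mysql", "postgresql"}))
print(choose_services(declared, {"postgresql"}))
```

A non-empty `missing` list is the point at which a tool could reject the application before installing anything.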
Re: [Web-SIG] A Python Web Application Package and Format
On Thu, Apr 14, 2011 at 2:53 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:

On 14 April 2011 16:57, Alice Bevan–McGregor al...@gothcandy.com wrote:

3. Define how to get the WSGI app. This is WSGI specific, but (1) is *not* WSGI specific (it's only Python specific, and would apply well to other platforms)

I could imagine there would be multiple application types: :: WSGI application. Define a package dot-notation entry point to a WSGI application factory.

Why can't it be a path to a WSGI script file? This actually works more universally as it works for servers which map URLs to file based resources as well. Also allows alternate extensions than .py and also allows basename of file name to be arbitrarily named, both of which help with those same servers which map URLs to file based resources. It also allows same name WSGI script file to exist in multiple locations managed by same server without having to create an overarching package structure with __init__.py files everywhere.

The main way to load applications in Silver Lining is basically like a wsgi script; or more specifically a file that is exec'd and it looks specifically for a variable application. Silver Lining also supports Paste Deploy .ini files, but in practice this doesn't seem that important (after all you can run paste.deploy.loadapp in the script). In this case the mapping of filenames and use of extensions doesn't matter, as applications would not be compelled to use any particular extension, and traversing into the application wouldn't make sense.

Another thing that is common with .wsgi files (and similarly for App Engine script handlers) is that developers do all sorts of initialization (like changing sys.path etc). This makes it hard to access the application except through that entry point, thus requiring all access to be in the form of URL fetching (again like App Engine).
So on one hand I like the .wsgi file technique; on the other hand I don't ;) Most of what we're talking about is, in Silver Lining, implemented in silversupport.appconfig. Particular pieces: Loading the application: https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-310 Set up sys.path: https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-298 Set up services: https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-223 There's going to have to be a bit of indirection with services, as an application is asking in effect for an interface, and each tool may implement that interface differently (maybe a package could provide sort of an abstract base class for these, but the specific implementation is going to be very deployment-tool-specific). Also generally more is setup before the .wsgi-like script is executed in Silver Lining than in mod_wsgi. Well, here's the actual mod_wsgi-.wsgi script that Silver Lining uses: https://bitbucket.org/ianb/silverlining/src/8597f52305be/silverlining/mgr-scripts/master-runner.py But it's a bit confusing because it translates a bunch of variables set by the rather obtuse Apache config to figure out what application to run and how. But sys.path is fixed up, services are activated (mostly meaning they set their environmental variables), stderr/stdout is fixed up (since there's some sense of logging in the system, I felt there was no reason to bar use of those streams), and then some tool-specific stuff is done (e.g., fixing up the request URL given the Varnish setup). These are the examples of the kind of detailed specification of parts of the environment that I guess we need to have -- it's really how the entire process is setup that we need to specify, not just the WSGI request portion (which at least we don't have to specify much since that's done). 
Ian
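The "file that is exec'd and looked up for a variable named application" approach described above can be sketched roughly like this. It is not Silver Lining's actual loader (that lives in the appconfig.py linked above); `load_wsgi_script` and the demo script are hypothetical.

```python
import os
import tempfile


def load_wsgi_script(path, variable="application"):
    """Execute the file at `path` and return the WSGI callable it defines."""
    namespace = {"__file__": path, "__name__": "__wsgi_script__"}
    with open(path, "rb") as handle:
        code = compile(handle.read(), path, "exec")
    exec(code, namespace)
    if variable not in namespace:
        raise LookupError("%s defines no %r variable" % (path, variable))
    return namespace[variable]


# Demo script; the file extension is irrelevant because nothing imports it.
SCRIPT = '''
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
'''

with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "myapp.wsgi")
    with open(script_path, "w") as handle:
        handle.write(SCRIPT)
    app = load_wsgi_script(script_path)

statuses = []
body = app({}, lambda status, headers: statuses.append(status))
print(statuses, body)
```

This also illustrates the drawback raised above: any sys.path changes or other initialization inside the script only happen when the script is executed, so code that wants the application without going through this entry point misses them.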
Re: [Web-SIG] A Python Web Application Package and Format
I think there's a general concept we should have, which I'll call a script -- but basically it's a script to run (__main__-style), a callable to call (module:name), or a URL to fetch internally. I want to keep this distinct from anything long-running, which is a much more complex deal. I think given the three options, and for general simplicity, the script can be successful or have an error (for Python code: exception or no; for __main__: zero exit code or no; for a URL: 2xx code or no), and can return some text (which may only be informational, not structured?) An application configuration could refer to scripts under different names, to be invoked at different stages. On Thu, Apr 14, 2011 at 1:57 AM, Alice Bevan–McGregor al...@gothcandy.comwrote: On 2011-04-13 18:16:36 -0700, Ian Bicking said: While initially reluctant to use zip files, after further discussion and thought they seem fine to me, so long as any tool that takes a zip file can also take a directory. The reverse might not be true -- for instance, I'd like a way to install or update a library for (and inside) an application, but I doubt I would make pip rewrite zip files to do this ;) But it could certainly work on directories. Supporting both isn't a big deal except that you can't do symlinks in a zip file. I'm not talking about using zip files as per eggs, where the code is maintained within the zip file during execution. It is merely a packaging format with the software itself extracted from the zip during installation / upgrade. A transitory container format. (Folders in the end.) Symlinks are an OS-specific feature, so those are out as a core requirement. ;) I don't think we're talking about something like a buildout recipe. Well, Eric kind of brought something like that up... but otherwise I think the consensus is in that direction. Ambiguous statements FTW, but I think I know what you meant. 
;) So specifically if you need something like lxml the application specifies that somehow, but doesn't specify *how* that library is acquired. There is some disagreement on whether this is generally true, or only true for libraries that are not portable. +1 I think something along the lines of autoconf (those lovely ./configure scripts you run when building GNU-style software from source) with published base 'checkers' (predicates as I referred to them previously) would be great. A clear way for an application to declare a dependency, have the application server check those dependencies, then notify the administrator installing the package. There could be an optional self-test script, where the application could do a last self-check -- import whatever it wanted, check db settings, etc. Of course we'd want to know what it needed *before* the self-check to try to provide it, but double-checking is of course good too. One advantage to a separate script instead of just one script-on-install is that you can more easily indicate *why* the installation failed. For instance, script-on-install might fail because it can't create the database tables it needs, which is a different kind of error than a library not being installed, or being fundamentally incompatible with the container it is in. In some sense maybe that's because we aren't proposing a rich error system -- but realistically a lot of these errors will be TypeError, ImportError, etc., and trying to normalize those errors to some richer meaning is unlikely to be done effectively (especially since error cases are hard to test, since they are the things you weren't expecting). I've seen several Python libraries that include the C library code that they expose; while not so terribly efficient (i.e. you can't install the C library once, then share it amongst venvs), it is effective for small packages. 
Generally compiling seems fairly reliable these days, but it does typically require more system-level packages be installed (e.g., python-dev). Actually invoking these installations in an automated and reliable way seems hard to me. I find debs/rpms to work well for these cases. There is some challenge when you need something that isn't packaged, but in many ways the work you need to do is always going to be the same work you'd need to do to package that library or the new version of that library. So I'm inclined to ask people to lean on the existing OS-level tooling for dealing with these libraries. Larger (i.e. global or application-local) would require the intervention of a systems administrator. Something like a database takes this a bit further. We haven't really discussed it, but I think this is where it gets interesting. Silver Lining has one model for this. The general rule in Silver Lining is that you can't have anything with persistence without asking for it as a service, including an area to write files (except temporary files?) +1 Databases are slightly more difficult; an application could ask
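As an illustration of the "script" concept from the start of this message (a __main__-style script, a module:name callable, or a URL, each reporting success or failure plus some text), here is a minimal sketch. The name `run_hook` and the `kind` strings are assumptions; the Accept header follows the Silver Lining behaviour mentioned elsewhere in this thread.

```python
import importlib
import subprocess
import sys
import urllib.error
import urllib.request


def run_hook(kind, target):
    """Run one hook and return (succeeded, text)."""
    if kind == "callable":
        # "module:name" form; success means no exception was raised.
        module_name, _, attr = target.partition(":")
        try:
            result = getattr(importlib.import_module(module_name), attr)()
        except Exception as exc:
            return False, "%s: %s" % (type(exc).__name__, exc)
        return True, ("" if result is None else str(result))
    if kind == "script":
        # __main__-style script; success means a zero exit code.
        proc = subprocess.run(
            [sys.executable, target], capture_output=True, text=True
        )
        return proc.returncode == 0, proc.stdout + proc.stderr
    if kind == "url":
        # Internal URL; success means a 2xx response.
        request = urllib.request.Request(target, headers={"Accept": "text/plain"})
        try:
            with urllib.request.urlopen(request) as response:
                ok = 200 <= response.status < 300
                return ok, response.read().decode("utf-8", "replace")
        except urllib.error.URLError as exc:  # includes HTTPError (4xx/5xx)
            return False, str(exc)
    raise ValueError("unknown hook kind: %r" % (kind,))


print(run_hook("callable", "os:getcwd"))
```

An application configuration could then map stage names such as post-install or pre-upgrade to (kind, target) pairs, and the tool would show the returned text to the administrator.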
Re: [Web-SIG] A Python Web Application Package and Format
While we are focusing on points of contention, there may be more points of consensus, but we aren't talking about those. So, some initial thoughts: While initially reluctant to use zip files, after further discussion and thought they seem fine to me, so long as any tool that takes a zip file can also take a directory. The reverse might not be true -- for instance, I'd like a way to install or update a library for (and *inside*) an application, but I doubt I would make pip rewrite zip files to do this ;) But it could certainly work on directories. Supporting both isn't a big deal except that you can't do symlinks in a zip file. I don't think we're talking about something like a buildout recipe. Well, Eric kind of brought something like that up... but otherwise I think the consensus is in that direction. So specifically if you need something like lxml the application specifies that somehow, but doesn't specify *how* that library is acquired. There is some disagreement on whether this is generally true, or only true for libraries that are not portable. Something like a database takes this a bit further. We haven't really discussed it, but I think this is where it gets interesting. Silver Lining has one model for this. The general rule in Silver Lining is that you can't have anything with persistence without asking for it as a service, including an area to write files (except temporary files?) I assume everyone agrees that an application can't write to its own files (but of course it could execfile something in another location). I suspect there's some disagreement about how the Python environment gets setup, specifically sys.path and any other application-specific customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in silvercustomize.py, and find it helpful). Describing the scope of this, it seems kind of boring. 
In, for example, App Engine you do all your setup in your runner -- I find this deeply annoying because it makes the runner the only entry point, and thus makes testing, scripts, etc. hard. We would start with just WSGI. Other things could follow, but I don't see any reason to worry about that now. Maybe we should just punt on aggregate applications now too. I don't feel like there's anything we would do that would prevent other kinds of runtime models (besides the starting point, container-controlled WSGI), and the places to add support for new things are obvious enough (e.g., something like Silver Lining's platform setting). I would define a server with accompanying daemon processes as an aggregate. An important distinction to make, I believe, is application concerns and deployment concerns. For instance, what you do with logging is a deployment concern. Generating logging messages is of course an application concern. In practice these are often conflated, especially in the case of bespoke applications where the only person deploying the application is the person (or team) developing the application. It shouldn't be *annoying* for these users, though. Maybe it makes sense for people to be able to include tool-specific default settings in an application -- things that could be overridden, but especially for the case when the application is not widely reused it could be useful. (An example where Silver Lining gets is all backwards is I created a [production] section in app.ini when the very concept of production is not meaningful in that context -- but these kind of named profiles would make sense for actual application deployment tools.) An example of a setting currently in Silver Lining/app.ini that should become a tool-specific default setting would be default_location (the default place to upload your app to when you do silver update). There's actually a kind of layered way of thinking of this: 1. 
The first, maybe most important part, is how you get a proper Python environment. That includes sys.path of course, with all the accompanying libraries, but it also includes environment description. In Silver Lining there's two stages -- first, set some environmental variables (both general ones like $SILVER_CANONICAL_HOST and service-specific ones like $CONFIG_MYSQL_DBNAME), then get sys.path proper, then import silvercustomize by which an environment can do any more customization it wants (e.g., set $DJANGO_SETTINGS_MODULE) 2. Define some basic generic metadata. app_name being the most obvious one. 3. Define how to get the WSGI app. This is WSGI specific, but (1) is *not* WSGI specific (it's only Python specific, and would apply well to other platforms) 4. Define some *web specific* metadata, like static files to serve. This isn't necessarily WSGI or even Python specific (not that we should bend backwards to be agnostic -- but in practice I think we'd have to bend backwards to make it Python-specific). 5. Define some lifecycle metadata, like update_fetch. These are generally commands to invoke. IMHO these can be ad hoc, but exist in the scope of (1)
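Layer (1) in the list above, as Silver Lining is described to do it (environment variables, then sys.path, then an optional customization module), might look roughly like the sketch below. The function name `prepare_environment` and the lib/python directory are assumptions for illustration, not Silver Lining's real code.

```python
import importlib
import os
import sys


def prepare_environment(app_dir, env_vars, customize_module="silvercustomize"):
    """Apply the three setup stages in order, before app code is imported."""
    # Stage 1: environment variables (general and service-specific).
    os.environ.update(env_vars)

    # Stage 2: make the application and its bundled libraries importable.
    lib_dir = os.path.join(app_dir, "lib", "python")  # assumed convention
    for path in (lib_dir, app_dir):
        if path not in sys.path:
            sys.path.insert(0, path)

    # Stage 3: optional hook where the app can customize further,
    # for example by setting DJANGO_SETTINGS_MODULE.
    importlib.invalidate_caches()
    try:
        return importlib.import_module(customize_module)
    except ModuleNotFoundError as exc:
        if exc.name != customize_module:
            raise  # the hook exists but one of its own imports failed
        return None
```

Since nothing here is WSGI specific, the same function could run before tests or management scripts, which is the point made above about not tying setup to a single runner.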
Re: [Web-SIG] A Python Web Application Package and Format
On Sun, Apr 10, 2011 at 10:29 PM, Alice Bevan–McGregor al...@gothcandy.com wrote:

Howdy! On 2011-04-10 19:06:52 -0700, Ian Bicking said:

There's a significant danger that you'll be creating a configuration management tool at that point, not simply a web application description.

Unless you have the tooling to manage the applications, there's no point having a standard for them. Part of that tooling will be some form of configuration management allowing you to determine the requirements and configuration of an application /prior/ to installation. Better to have an application rejected up-front (Hey, this needs my social insurance number? Hells no!) than after it's already been extracted and potentially littered the landscape with its children.

I... think we are misunderstanding each other or something. A nice tool that could use this format, for instance, would be a tool that takes an app and creates a puppet recipe to set up a server to host the application. A different tool (maybe better, maybe not?) would be a puppet plugin (if that's the terminology) that uses this format to tell puppet about all the requirements an application has, perhaps translating some notions to puppet-native concepts, or adding high-level recipes that set up an appropriate container (which can be as simple as a properly configured Nginx or Apache server).

What I mean when I say there's a danger of becoming a configuration management tool, is that if you include hooks for the application to configure its environment you are probably stepping on the toes of whatever other tool you might use. And once you start down that path things tend to cascade. The escape valve in Silver Lining for these sorts of things is services, which can kind of implement anything, and presumably ad hoc services could be allowed for. Generic services are useful, but not useful enough.

You create a build process as part of the deployment (and development and everything else), which I think is a bad idea.

Please elaborate.
There is no requirement for you to use the application packaging format and associated tools (such as an application server) during development. In fact, like 2to3, that type of process would only slow things down to the point of uselessness. That's not what I'm suggesting at all. If you include something in the packaging format that indicates the libraries to be installed, then you are encouraging and perhaps requiring that the server install libraries during a deployment. Realistically this can't be entirely avoided, but I think it is a pretty workable separation to declare only those dependencies that can't reasonably be included directly in the application itself (e.g., lxml, MySQLdb, git, and so on). In Silver Lining those dependencies were expressed as Debian package names, installed via dpkg, but for a more general system it would need to be somewhat more abstract. But several configuration management tools have managed that abstraction already, so it seems feasible to handle this declaratively. My model does not use setup.py as the basis for the process (you could build a tool that uses setup.py, but it would be more a development methodology than a part of the packaging). I know. And the end result is you may have to massage .pth files yourself. If a tool requires you to, at any point during normal operation, hand modify internal files… that tool has failed at its job. One does not go mucking about in your Git repo's .git/ folder, as an example. .pth files aren't exactly an internal file -- they are documented feature of Python. And .git/config is also a human-readable/editable file! But I did note that the setup in Silver Lining was a bit too primitive. Not *quite* as primitive as App Engine, but close. I think it would be better to have a convention like adding lib/python/ to the path automatically. 
If you want, for example, src/myapp to also be added to the path then I don't think there's anything wrong with using a .pth file to do that; that's what they were created to do!

How do you build a release and upload it to PyPI? Upload docs to packages.python.org? setup.py commands. It's a convenient hook with access to metadata in a convenient way that would make an excellent "let's make a release!" type of command.

Also lots of libraries don't work when zipped, and an application is typically an aggregate of many libraries, so zipping everything just adds a step that probably has to be undone later.

Of course it has to be un-done later. I had thought I had made that quite clear in the gist. (Core Operation, point 1, possibly others.)

If a deploy process uses zip files that's fine, but adding zipping to deployment processes that don't care for zip files is needless overhead. A directory of files is the most general case. It's also something a developer can manipulate, so you don't get a mismatch between developers of applications
Re: [Web-SIG] A Python Web Application Package and Format
On Mon, Apr 11, 2011 at 2:56 AM, Ionel Maries Cristian ionel...@gmail.com wrote:

Hello, I have a few comments:

- That file layout basically forces you to keep your development environment close to the production environment. This is especially visible if you're relying on Python C extensions. Since you don't want to have the same environment constraints as appengine it should be more flexible in this regard and offer a way to generate the project dependencies somewhere else than the developer's machine.

Yes; in this case in Silver Lining I have allowed non-portable libraries to be declared as dependencies, and then the deployment tool ensures they are installed.

- There's no builtin support for logging configuration.

This would be useful, yes; though I think the format itself would mostly want to declare how it logs and then deployment tools could try to configure that. E.g., it would be useful to have a list of logging names that an app uses. The actual configuration is deployment-specific, so shouldn't be inside the application format itself.

- The update_fetch feels like a hack as it's not extensible to do lifecycle (hooks for shutdown, start, etc). Also, it shouldn't be an application URL because you'd want to run a hook before starting it or after stopping it. I guess you could accomplish that with a WSGI wrapper but there should be a clear separation between the app and hooks that manage the app.

In Silver Lining you can also do scripts; I started with URLs because it was simpler on the implementation side, but scripts have generally been easier to develop, so at least the default could be revisited. At least in the case of mod_wsgi there isn't a very good definition of shutdown and start. There's the runner itself, that imports the WSGI application -- this is always run on start, but it's the start of the worker process, not necessarily the server process (IMHO starting the server process is an internal implementation detail we should not expose).
Silver Lining also tries to import a silvercustomize module, which is kind of a universal initialization (also imported for tests, etc). atexit can be used to run stuff on process shutdown. I don't really see a compelling benefit to another process shutdown technique. It seems perhaps reasonable to have something that is run when the actual application instance is shut down, but I've never personally needed that in practice. Of course other configuration settings could be added for different states if they were reasonably universal states and there was a real need for those.

- I'm not entirely clear on why you avoid a build process (war-like) prior to deployment. It works fine for appengine - but you don't have its constraints.

In my own experience with App Engine I found it to be a useful constraint -- it was not particularly hard to get around (at least if you understand the relevant tools) and while App Engine has annoying constraints this wasn't one of them. Of course I couldn't use lxml at all on App Engine, and I agree we shouldn't accept that constraint, but for the majority of libraries that are portable this isn't a constraint.

Ian
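The split suggested above for logging (the application declares the logger names it uses, the deployment tool decides where they go) could be sketched with the standard logging.config.dictConfig. The logger names and the `build_logging_config` helper are hypothetical; no such metadata key exists in any of the tools discussed.

```python
import logging
import logging.config

# Hypothetical metadata shipped with the application: the loggers it uses.
DECLARED_LOGGERS = ["myapp", "myapp.db"]


def build_logging_config(logger_names, level="INFO"):
    """Deployment side: route every declared logger to one handler."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(name)s %(levelname)s %(message)s"}
        },
        "handlers": {
            "deploy": {"class": "logging.StreamHandler", "formatter": "plain"}
        },
        "loggers": {
            name: {"handlers": ["deploy"], "level": level, "propagate": False}
            for name in logger_names
        },
    }


logging.config.dictConfig(build_logging_config(DECLARED_LOGGERS))
logging.getLogger("myapp.db").info("connected")
```

A different deployment could swap the StreamHandler for a file or syslog handler without the application changing anything, which is the separation of concerns argued for above.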
Re: [Web-SIG] A Python Web Application Package and Format
(I'm confused; I just noticed there's a web-sig@python.org and python-web-...@googlegroups.com?) On Mon, Apr 11, 2011 at 2:01 PM, Daniel Holth dho...@gmail.com wrote: We have more than 3 implementations of this idea, the Python Web Application Package and Format or WAPAF, including Java's WAR files, Google App Engine, silverlining. Let's review the WAR file, approximately: (static files, .jsp) WEB-INF/web.xml WEB-INF/classes/org/example/myapplication.class WEB-INF/lib/some-library.jar WEB-INF/lib/145-other-libraries.jar Build the .war file, copy to server, done (ideally). Your program should require a standard Java installation plus whatever's in the .war file. The .war file is a .zip that follows certain conventions. In practice you might develop in and deploy exploded .war files which are exactly the same thing but unzipped. Since it's Java there is no classes/SQLAlchemy/src/sqlalchemy/__init__.py; the path for the code always starts at classes/, not at some arbitrary set of subdirectories under classes/ Yes, this is all very reminiscent of my thoughts about this application format, and I'm assuming web.xml is the kind of configuration file I expect, etc. I'd rather there be a convention like classes/ anyway (obviously with a different name ;) installation. Better to have an application rejected up-front (Hey, this needs my social insurance number? Hells no!) then after it's already been extracted and potentially littered the landscape with its children. Part of the potential win here is that the application need not litter anything. Like GAE, the server might keep all the previous versions you've uploaded and let you pick which one you want today. You shouldn't have to think about the state the server. Yes; and for instance Silver Lining can have multiple versions installed alongside each other, which makes it easier to do a quick update -- you can upload everything, make sure everything is okay, and only then actually make that new version active. 
If the build process is well defined you can do the same thing, but it's harder to be sure that it will work as expected. And if the build process is kind of free-form then you might end up in a place where you have to take down the old version of an app as you update the new version. Data migrations are a bit more tricky, but with the services concept they are possible, and can even be efficient if you use some deep Linux magic (but if you are okay with a bit of inefficiency, or only applying this to small databases, doing a fairly atomic application update is possible). One of the items in Silver Lining's TODO is having a formal concept of putting an application into read-only mode, which could be helpful for these updates as well. My model does not use setup.py as the basis for the process (you could build a tool that uses setup.py, but it would be more a development methodology than a part of the packaging). I know. And the end result is you may have to massage .pth files yourself. If a tool requires you to, at any point during normal operation, hand modify internal files… that tool has failed at its job. One does not go mucking about in your Git repo's .git/ folder, as an example. If I read the silverlining documentation correctly the .pth is created manually in the example only because there was no 'setup.py' to 'pip install -e'. As an alternative the spec could only add particular directories to PYTHONPATH. This might be a distutils2 thing. PYTHONPATH shouldn't apply here, as it informs the Python executable, and probably the executable will start before invoking the application (at least with mod_wsgi it does, and there's a lot of other use cases where it could). You could have a setting in app.ini (or whatever equivalent config file) with the paths to add, but I personally find that kind of messy feeling compared to existing conventions like .pth files. Ultimately they are equivalent -- a file with a path name that is added to sys.path. 
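To make the "a file with a path name that is added to sys.path" point concrete, here is a small sketch using the standard library's `site.addsitedir`, which processes every `*.pth` file in a directory. The `vendor` and `site` directory names and the `demo_pth` function are made up for the example; the point is only that a `.pth` file takes effect inside an already-running interpreter, which PYTHONPATH cannot do.

```python
import os
import site
import sys
import tempfile


def demo_pth():
    """Show that a .pth file is just a list of directories to add to sys.path."""
    root = tempfile.mkdtemp()
    vendor = os.path.join(root, "vendor")      # hypothetical library directory
    site_dir = os.path.join(root, "site")      # hypothetical directory holding .pth files
    os.mkdir(vendor)
    os.mkdir(site_dir)
    with open(os.path.join(site_dir, "app.pth"), "w") as f:
        f.write(vendor + "\n")                 # one path per line
    site.addsitedir(site_dir)                  # processes every *.pth file in site_dir
    return os.path.abspath(vendor) in sys.path
```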
How do you build a release and upload it to PyPI? Upload docs to packages.python.org? setup.py commands. It's a convenient hook with access to metadata in a convenient way that would make an excellent "let's make a release!" type of command. setup.py should go away. The distutils2 talk from PyCon 2011 explains. http://blip.tv/file/4880990 That's kind of a red herring -- even if setup.py goes away it would be replaced with something (pysetup I think?) which is conceptually equivalent. Ian
Re: [Web-SIG] A Python Web Application Package and Format
On Sun, Apr 10, 2011 at 6:40 PM, Alice Bevan–McGregor al...@gothcandy.com wrote: On 2011-04-10 16:25:21 -0700, James Mills said: +1 too. I would however like to see this idea developed in a generic and useable way. i.e.: No zope/twisted deps or making it fit around Django :) Ideally it should be useable by the most basic (plain old WSGI). The following are the collected ideas of myself and a few other users in the WebCore chat room: https://gist.github.com/911991 Being generic (i.e. using WSGI under-the-hood) and allowing generic port assignments for other (non-web) networked applications is a design goal. There's a significant danger that you'll be creating a configuration management tool at that point, not simply a web application description. The escape valve in Silver Lining for these sorts of things is services, which can kind of implement anything, and presumably ad hoc services could be allowed for. The aversion to packaged zips is not entirely understandable to us; in this case, a packaged copy of the application is produced via a setup.py command, though in theory one could develop with that model and just zip everything up in the end by hand. You create a build process as part of the deployment (and development and everything else), which I think is a bad idea. My model does not use setup.py as the basis for the process (you could build a tool that uses setup.py, but it would be more a development methodology than a part of the packaging). Also lots of libraries don't work when zipped, and an application is typically an aggregate of many libraries, so zipping everything just adds a step that probably has to be undone later. If a deploy process uses zip files that's fine, but adding zipping to deployment processes that don't care for zip files is needless overhead. A directory of files is the most general case.
It's also something a developer can manipulate, so you don't get a mismatch between developers of applications and people deploying applications -- they can use the exact same system and format. Silver Lining seems to require too much in the way of hacking (modifying .pth files, etc) to be reasonable. The pattern that it implements is fairly simple, and in several models you have to lay things out somewhat manually. I think some more convention and tool support (e.g., in pip) would be helpful. Though there are quite a few details, the result is more reliable, stable, and easier to audit than anything based on a build process (which any use of dependencies would require -- there are *no* dependencies in a Silver Lining package, only the files that are *part* of the package). Some notes from your link: - There seems to be both the description of a format, and a program based on that format, but it's not entirely clear where the boundary is. I think it's useful to think in terms of a format and a reference implementation of particular tools that use that format (development management tools, like installing into the format; deployment tools; testing tools; local serving tools; etc). - In Silver Lining I felt no need at all for shared libraries. Some disk space can be saved with clever management (hard links), but only when it's entirely clear that it's just an optimization. Adding a concept like server-packages adds a lot of operational complexity and room for bugs without any real advantages. - I avoided exposing the concept of daemonization because it's not really an application concern; or at least it certainly is not appropriate for a WSGI application. There are other applications that might need this, mostly because they have no standard protocol equivalent to WSGI, but a generic container is almost certain to be of higher quality and better situated to its environment than a generic daemon. 
(PID files, ugh) At least supervisord I think has a better representation of how to express daemon configuration, but still I'm not a big fan of exposing this until it really feels necessary. - All dependencies are always version-sensitive; I think it's delusional that people think otherwise. Build the tooling to manage that process (e.g., finding and testing newer versions), not the deployment. - I try to avoid error conditions in the deployment, which is a big part of not having any build process involved, as build processes are a source of constant errors -- you can do a stage deployment, then five minutes later do a production deployment, and if you have a build process there is a significant chance that the two won't match. Ian
[Web-SIG] A Python Web Application Package and Format
Hi all. I wrote a blog post. I would be interested in reactions from this crowd. http://blog.ianbicking.org/2011/03/31/python-webapp-package/ Copied to allow responses: At PyCon there was an open space about deployment, and the idea of drop-in applications (Java-WAR-style: http://en.wikipedia.org/wiki/WAR_file_format_%28Sun%29 ). I generally get pessimistic about 80% solutions, and dropping in a WAR file feels like an 80% solution to me. I’ve used the Hudson/Jenkins installer (which I think is *specifically* a project that got WARs on people’s minds), and in a lot of ways that installer is nice, but it’s also kind of wonky, it makes configuration unclear, it’s not always clear when it installs or configures itself through the web, and when you have to do this at the system level, nor is it clear where it puts files and data, etc. So a great initial experience doesn’t feel like a great ongoing experience to me -- and *it doesn’t have to be that way*. If those were *necessary* compromises, sure, but they aren’t. And because we *don’t have* WAR files, if we’re proposing to make something new, then we have every opportunity to make things better. So the question then is what we’re trying to make. To me: we want applications that are easy to install, that are self-describing, self-configuring (or at least guide you through configuration), reliable with respect to their environment (not dependent on system tweaking), upgradable, and respectful of persistence (the data that outlives the application install). A lot of this can be done by the container (to use Java parlance; or environment) -- if you just have the app packaged in a nice way, the container (server environment, hosting service, etc) can handle all the system-specific things to make the application actually work. At which point I am of course reminded of my Silver Lining (http://cloudsilverlining.org/) project, which defines something very much like this.
Silver Lining isn’t *just* an application format, and things aren’t fully extracted along these lines, but it’s pretty close and it addresses a lot of important issues in the lifecycle of an application. To be clear: Silver Lining is an application packaging format, a server configuration library, a cloud server management tool, a persistence management tool, and a tool to manage the application with respect to all these services over time. It is a bunch of things, maybe too many things, so it is not unreasonable to pick out a smaller subset to focus on. Maybe an easy place to start (and good for Silver Lining itself) would be to separate at least the application format (and tools to manage applications in that state, e.g., installing new libraries) from the tools that make use of such applications (deploy, etc). Some opinions I have on this format, exemplified in Silver Lining: - It’s not zipped or a single file, unlike WARs. Uploading zip files is not a great API. Geez. I know there’s this desire to just drop in a file; but there’s no getting around the fact that dropping a file becomes a *deployment protocol* *and* *it’s an incredibly impoverished protocol*. The format is also not subtly git-based (ala Heroku) because git push is not a good deployment protocol. - But of course there isn’t really any deployment protocol inferred by a format anyway, so maybe I’m getting ahead of myself ;) I’m saying a tool that deploys should take as an argument a directory, not a single file. (If the tool then zips it up and uploads it, fine!) - Configuration comes from the outside. That is, an application requests services, and the *container* tells the application where those services are. For Silver Lining I’ve used environmental variables. I think this one point is really important — the container *tells* the application. 
As a counter-example, an application that comes with a Puppet deployment recipe is essentially *telling* the server how to arrange itself to suit the application. This will never be reliable or simple! - The application indicates what services it wants; for instance, it may want to have access to a MySQL database. The container then provides this to the application. In practice this means installing the actual packages, but also creating a database and setting up permissions appropriately. The alternative is never having *any* dependencies, meaning you have to use SQLite databases or ad hoc structures, etc. But in fact installing databases really isn’t that hard these days. - *All* persistence has to use a service of some kind. If you want to be able to write to files, you need to use a file service. This means the container is fully aware of everything the application is leaving behind. All the various paths an application should use are given in different environmental variables (many of which don’t need to be invented anew, e.g., $TMPDIR). - It uses vendor libraries exclusively for Python
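As a rough illustration of "configuration comes from the outside", the sketch below shows an application reading the locations of its services from environment variables that the container is assumed to have set. The variable names (`APP_MYSQL_HOST`, `APP_MYSQL_DBNAME`, `APP_FILES_DIR`) and the `load_services` helper are hypothetical and are not Silver Lining's actual names; only `TMPDIR` is an existing convention.

```python
import os


def load_services(environ=os.environ):
    """Read service locations supplied by the container.

    The variable names are hypothetical; a real container defines its own.
    """
    return {
        "db_host": environ.get("APP_MYSQL_HOST", "localhost"),
        "db_name": environ["APP_MYSQL_DBNAME"],   # required: fail loudly if missing
        "files_dir": environ["APP_FILES_DIR"],    # writable directory from a file service
        "tmp_dir": environ.get("TMPDIR", "/tmp"),
    }
```

The application never decides where the database or writable directory lives; it only asks, and a missing required variable surfaces as an immediate `KeyError` at startup.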
Re: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
It's implied by WSGI itself that the path be unquoted; there's no fix short of changing the specification. On Thu, Mar 17, 2011 at 1:10 PM, Florian Friesdorf f...@chaoflow.net wrote: I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not urllib.unquote the path [1] before setting it in the wsgi environment [2]. The only pre-processing performed on the path between [1] and [2] is concerned with slashes '/'. By urllib.unquoting it is not possible to have urllib.quoted slashes within one path segment. At least pyramid without routing fully relies on ``environ['PATH_INFO']`` [3]; by commenting out [1] I succeeded in having slashes in path segments; they are handled by pyramid in [4]f. However, webob.request.BaseRequest would need to be adjusted wherever PATH_INFO from the environment is used (e.g. [5]). Reasoning: The path stored in environ['PATH_INFO'] is still a path, therefore it must not be urllib.unquoted; the unquoting must happen after the path is split up into segments ([4]).

[1] https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-180
[2] https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-217
[3] https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L594
[4] https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L495
[5] https://bitbucket.org/ianb/webob/src/c0bb5309cfca/webob/request.py#cl-265

-- Florian Friesdorf f...@chaoflow.net GPG FPR: 7A13 5EEE 1421 9FC2 108D BAAF 38F8 99A3 0C45 F083 Jabber/XMPP: f...@chaoflow.net IRC: chaoflow on freenode,ircnet,blafasel,OFTC
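The difference between the two orderings in this thread can be seen in a few lines. This sketch uses Python 3's `urllib.parse.unquote` (the thread's `urllib.unquote` is the Python 2 spelling), and the example path is made up.

```python
from urllib.parse import unquote

raw_path = "/item/some%2Fvalue/view"

# What a WSGI/CGI server puts in PATH_INFO: unquoted first, so the
# escaped slash becomes indistinguishable from a segment separator.
path_info = unquote(raw_path)
segments_after = path_info.split("/")
# ['', 'item', 'some', 'value', 'view']

# What the original poster wanted: split on "/" first, unquote each segment.
segments_before = [unquote(s) for s in raw_path.split("/")]
# ['', 'item', 'some/value', 'view']
```

An application only ever sees `path_info`, which is why the first result is all it can recover under the current specification.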
Re: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
I'll just add that *if* you can design your URL space (you didn't just inherit one), and you want to distinguish path segments from values that contain '/', you can use URLs like: /item/{some/value}/view And then use the matching {}'s to figure out that some/value is one path segment. This makes it possible, for instance, to use GData (where XML namespaces can show up in the URL, and they contain /'s, but they need to be treated as a single value). It's not perfect, but it does work. On Thu, Mar 17, 2011 at 4:02 PM, And Clover and...@doxdesk.com wrote: On Thu, 2011-03-17 at 19:10 +0100, Florian Friesdorf wrote: I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not urllib.unquote the path before setting it in the wsgi environment I'm afraid it must. This is something the WSGI specification inherits from CGI. Yes, it was a terrible decision to have SCRIPT_NAME and PATH_INFO automatically unescaped, as it loses the distinction between ‘%2F’ and ‘/’, and has resulted in endless problems with non-ASCII characters that could otherwise have been handled perfectly well as %-sequences. But that decision was taken a couple of decades ago and there's not really much we can do about it now. CGI may be an anachronism, but it is still widely used and its assumptions are still felt through Apache, IIS and WSGI. By urllib.unquoting it is not possible to have urllib.quoted slashes within one path segment. Correct. And neither Apache nor IIS allows %2F to be used within a path segment either, so really if you want to write a portable web app you simply have to avoid them (along with %00 and %5C). It is not currently practical to include any arbitrary byte sequence in a URL path segment, even though by the URL specification you should be able to. It's annoying, it's inelegant, it's limiting. But none of our attempts to extend or replace it for non-CGI-based servers (see past list discussion on path-info-raw or standardising REQUEST_URI) have come to any acceptable conclusion.
We are stuck with it for the foreseeable. -- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com gtalk:chat?jid=bobi...@gmail.com
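The `{}` convention suggested at the top of this message (URLs like /item/{some/value}/view) could be parsed along these lines. `split_segments` is a hypothetical helper written for this illustration, not code from Paste, WebOb, or any GData library, and it leaves the braces in the returned segment.

```python
def split_segments(path):
    """Split a path on "/" except inside matching {...} pairs."""
    segments, current, depth = [], [], 0
    for ch in path:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth:
            depth -= 1
        if ch == "/" and depth == 0:
            segments.append("".join(current))   # separator outside any braces
            current = []
        else:
            current.append(ch)                  # ordinary character, or "/" inside braces
    segments.append("".join(current))
    return segments


# split_segments("/item/{some/value}/view")
# gives ['', 'item', '{some/value}', 'view']
```

Since this works on the already-unquoted PATH_INFO, it sidesteps the %2F problem entirely, at the cost of reserving the brace characters in the URL space.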
Re: [Web-SIG] PEP 444 != WSGI 2.0
Until the PEP is approved, it's just a suggestion. So for it to really be WSGI 2 it will have to go through at least some approval process; which is kind of ad hoc, but not so ad hoc as just to implicitly happen. For WSGI 2 to happen, someone has to write something up and propose it. Alice has agreed to do that, working from PEP 444 which several other people have participated in. Calling it WSGI 2 instead of Web 3 was brought up on this list, and the general consensus seemed to be that it made sense -- some people felt a little funny about it, but ultimately it seemed to be something everyone was okay with (with some people like myself feeling strongly it should be WSGI 2). I'm not sure why you are so stressed out about this? If you think it's really an issue, perhaps 2 could be replaced with 2alpha until such time as it is approved? On Sat, Jan 1, 2011 at 8:02 PM, Graham Dumpleton graham.dumple...@gmail.com wrote: Can we please clear up a matter. GothAlice (don't know off hand their real name), keeps going around and claiming: "After some discussion on the Web-SIG mailing list, PEP 444 is now officially WSGI 2, and PEP is WSGI 1.1" In this instance on web.py forum on Google Groups. I have pointed out a couple of times to them that there is no way that PEP 444 has been blessed as being the official WSGI 2.0 but they are not listening and are still repeating this claim. They can't also get right that PEP clearly says it is still WSGI 1.0 and not WSGI 1.1. If the people here whose opinion matters are quite happy for GothAlice to hijack the WSGI 2.0 moniker for PEP 444 I will shut up. But if that happens, I will voice my objections by simply not having anything to do with WSGI 2.0 any more.
Graham
-- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.
On Sun, Dec 12, 2010 at 9:59 PM, Alice Bevan–McGregor al...@gothcandy.com wrote: Howdy! There's one issue I've seen repeated a lot in working with WSGI1 and that is the use of middleware to process incoming data, but not outgoing, and vice-versa; middleware which filters the output in some way, but cares not about the input. Wrapping middleware around an application is simple and effective, but costly in terms of stack allocation overhead; it also makes debugging a bit more of a nightmare as the stack trace can be quite deep. My updated draft PEP 444[1] includes a section describing Filters, both ingress (input filtering) and egress (output filtering). The API is trivially simple, optional (as filters can be easily adapted as middleware if the host server doesn't support filters) and easy to implement in a server. (The Marrow HTTP/1.1 server implements them as two for loops.)

It's not clear to me how this can be composed or abstracted. @webob.dec.wsgify does kind of handle this with its request/response pattern; in a simplified form it's like:

    def wsgify(func):
        def replacement(environ):
            req = Request(environ)
            resp = func(req)
            return resp(environ)
        return replacement

This allows you to do an output filter like:

    @wsgify
    def output_filter(req):
        resp = some_app(req.environ)
        fiddle_with_resp(resp)
        return resp

(Most output filters also need the request.) And an input filter like:

    @wsgify
    def input_filter(req):
        fiddle_with_req(req)
        return some_app

But while it handles the input filter case, it doesn't try to generalize this or move application composition into the server. An application is an application and servers are imagined but not actually concrete. If you handle filters at the server level you have to have some way of registering these filters, and it's unclear what order they should be applied. At import? Does the server have to poke around in the app it is running? How can it traverse down if you have dispatching apps (like paste.urlmap or Routes)?
You can still implement this locally of course, as a class that takes an app and input and output filters. -- Ian Bicking | http://blog.ianbicking.org
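The closing suggestion, a class that takes an app plus input and output filters, might look roughly like the sketch below. It targets the WSGI 1 calling convention rather than the draft PEP 444 one, the filter signatures are assumptions made up for this example, and it buffers the whole response body, which a real implementation would probably want to avoid.

```python
class FilteredApp:
    """Wrap a WSGI app with lists of ingress and egress filters.

    Assumed (hypothetical) signatures:
      ingress(environ) -> None, may modify environ in place
      egress(environ, status, headers, body) -> (status, headers, body)
    """

    def __init__(self, app, ingress=(), egress=()):
        self.app = app
        self.ingress = list(ingress)
        self.egress = list(egress)

    def __call__(self, environ, start_response):
        for f in self.ingress:
            f(environ)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the egress filters have run.
            captured["status"] = status
            captured["headers"] = list(headers)
            return _no_write

        app_iter = self.app(environ, capture)
        try:
            body = b"".join(app_iter)     # buffers the whole response
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()

        status, headers = captured["status"], captured["headers"]
        for f in self.egress:
            status, headers, body = f(environ, status, headers, body)
        start_response(status, headers)
        return [body]


def _no_write(data):
    raise NotImplementedError("the legacy write() callable is not supported in this sketch")
```

The result is still a single WSGI application object, so it can be handed around, mounted under a dispatcher, or wrapped again; an egress filter that changes the body length is responsible for fixing Content-Length itself.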
Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.
On Tue, Dec 14, 2010 at 12:54 PM, Alice Bevan–McGregor al...@gothcandy.com wrote: An application is an application and servers are imagined but not actually concrete. Could you elaborate? (Define concrete in this context.) WSGI applications never directly touch the server. They are called by the server, but have no reference to the server. Servers in turn take an app and parameters specific to their serveryness (which may or may not even involve HTTP), but it's good we've gotten them out of the realm of application composition (early on WSGI servers frequently handled mounting apps at locations in the path, but that's been replaced with dispatching middleware). An application wrapped with middleware is also a single object you can hand around; we don't have an object that represents all of application, list of pre-filters, list of post-filters. If you handle filters at the server level you have to have some way of registering these filters, and it's unclear what order they should be applied. At import? Does the server have to poke around in the app it is running? How can it traverse down if you have dispatching apps (like paste.urlmap or Routes)? Filters are unaffected by, and unaware of, dispatch. They are defined at the same time your application middleware stack is constructed, and passed (in the current implementation) to the HTTPServer protocol as a list at the same time as your wrapped application stack. You can still implement this locally of course, as a class that takes an app and input and output filters. If you -do- need region specific filtering, you can ostensibly wrap multiple final applications in filter management middleware, as you say. That's a fairly advanced use-case regardless of filtering. I would love to see examples of what people might implement as filters (i.e. middleware that does ONE of ingress or egress processing, not both).
From CherryPy I see things like:

* BaseURLFilter (ingress Apache base path adjustments)
* DecodingFilter (ingress request parameter decoding)
* EncodingFilter (egress response header and body encoding)
* GzipFilter (already mentioned)
* LogDebugInfoFilter (egress insertion of page generation time into HTML stream)
* TidyFilter (egress piping of response body to Tidy)
* VirtualHostFilter (similar to BaseURLFilter)

None of these (with the possible exception of LogDebugInfoFilter) I could imagine needing to be path-specific.

GzipFilter is wonky at best (it interacts oddly with range requests and etags). Prefix handling is useful (e.g., paste.deploy.config.PrefixMiddleware), and usually global and unconfigured. Debugging and logging stuff often needs per-path configuration, which can mean multiple instances applied after dispatch. Encoding and Decoding don't apply to WSGI. Tidy is intrusive and I think questionable on a global level. I don't think the use cases are there. Tightly bound pre-filters and post-filters are particularly problematic. This all seems like a lot of work to avoid a few stack frames in a traceback.

-- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)
On Thu, Sep 23, 2010 at 11:06 AM, P.J. Eby p...@telecommunity.com wrote: At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote: On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby mailto:p...@telecommunity.com p...@telecommunity.com wrote: The Python 3 specific changes are to use: * ``bytes`` for I/O streams in both directions * ``str`` for environ keys and values * ``bytes`` for arguments to start_response() and write() This is the only thing that seems odd to me -- it seems like the response should be symmetric with the request, and the request in this case uses str for headers (status being header-like), and bytes for the body. So, I've given some thought to your suggestion, and, while it's true that most of the output headers are far less prone to ending up with unintended unicode content, there are at least two output headers that can include some sort of application content (and can therefore have random failures): Location and Set-Cookie. If these headers accidentally contain non-Latin1 characters, the error isn't detectable until the header reaches the origin server doing the transmission encoding, and it'll likely be a dynamic (and therefore hard-to-debug) error. I don't see any reason why Location shouldn't be ASCII. Any header could have any character put in it, of course, there's just no valid case where Location shouldn't be a URL, and URLs are ASCII. Cookie can contain weirdness, yes. I would expect any library that abstracts cookies to handle this (it's certainly doable)... otherwise, this seems like one among many ways a person can do the wrong thing. This can also be detected with the validator, which doesn't avoid runtime errors, but bytes allow runtime errors too -- they will just happen somewhere else (e.g., when a value is converted to bytes in an application or library). If servers print the invalid value on error (instead of just some generic error) I don't think it would be that hard to track down problems. 
This requires some explicit effort on the part of the server (most servers handle app_iter==None ungracefully, which is a similar problem). However, if the output is always bytes (and this can be relatively-statically verified), then any error can't occur except *inside* the application, where the app's developer can find it more easily. So I guess the question boils down to: would we rather make sure that coding errors happen *inside* applications, or would we rather make porting WSGI apps trivial (or nearly so)? But I think that it's possible here to have one's cake and eat it too: if we require bytes for all outputs, but provide a pair of decorators in wsgiref.util like the following:

    def encode_body(codec='utf8'):
        """Allow a WSGI app to output its response body as strings w/specified encoding"""
        def decorate(app):
            def encode(response):
                try:
                    for data in response:
                        yield data.encode(codec)
                finally:
                    if hasattr(response, 'close'):
                        response.close()
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    _write = start_response(status, response_headers, exc_info)
                    def write(data):
                        return _write(data.encode(codec))
                    return write
                return encode(app(environ, start))
            return decorated_app
        return decorate

    def encode_headers(codec='latin1'):
        """Allow a WSGI app to output its headers as strings, w/specified encoding"""
        def decorate(app):
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    status = status.encode(codec)
                    response_headers = [
                        (k.encode(codec), v.encode(codec)) for k,v in response_headers
                    ]
                    return start_response(status, response_headers, exc_info)
                return app(environ, start)
            return decorated_app
        return decorate

So, this seems like a win-win to me: relatively-static verification, errors stay in the app (or at least in the decorator), and the API is clean-and-easy.
Indeed, it seems likely that at least some apps that don't read wsgi.input themselves could be ported *just* by adding the appropriate decorator(s). And, if your app is using unicode on 2.x, you can even use the same decorators there, for the benefit of 2to3. (Assuming I release an updated standalone wsgiref version with the decorators, of course.) This doesn't seem that different than the validator, except that the decorator uses a different interface internally and externally (the internal interface using text, the external one bytes). -- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)
On Thu, Sep 23, 2010 at 11:17 AM, Ian Bicking i...@colorstudy.com wrote: If these headers accidentally contain non-Latin1 characters, the error isn't detectable until the header reaches the origin server doing the transmission encoding, and it'll likely be a dynamic (and therefore hard-to-debug) error. I don't see any reason why Location shouldn't be ASCII. Any header could have any character put in it, of course, there's just no valid case where Location shouldn't be a URL, and URLs are ASCII. Cookie can contain weirdness, yes. I would expect any library that abstracts cookies to handle this (it's certainly doable)... otherwise, this seems like one among many ways a person can do the wrong thing. Minor correction, Set-Cookie, not Cookie. Good practice is to stick to ASCII even there (all other techniques have a high risk of mojibake), so we're really considering legacy integration. Note that a similar problem is using [('Content-length', len(body))] -- which also results in a sometimes confusing error message well away from the application itself. Generally without validation any data errors occur away from the application. A type error is not any different than an encoding error. Using bytes removes a possible encoding error, but IMHO has a greater chance of type errors (as bytes are not as natural as text in most cases). Validation can check all aspects, including encoding (simply by doing a test encoding). Consider this hello world:

    def app(environ, start_response):
        body = b'Hello World'
        start_response(b'200 OK', [(b'Content-Length', str(len(body)).encode('ascii'))])
        return [body]

str(len(body)).encode('ascii')?!? Yuck. Also no 2to3 fixup can help there. bytes(len(body)) does something weird.

-- Ian Bicking | http://blog.ianbicking.org
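For readers wondering what the "something weird" is: in Python 3, calling `bytes()` on an integer does not format the number, it allocates that many zero bytes. The short sketch below shows this next to the explicit spelling from the example. The last line uses bytes %-formatting, which was only added in Python 3.5 (PEP 461), well after this thread, so it was not an option at the time.

```python
body = b'Hello World'

zero_filled = bytes(len(body))            # bytes(int) builds that many NUL bytes
as_text = str(len(body)).encode('ascii')  # the explicit spelling from the example
formatted = b'%d' % len(body)             # bytes %-formatting, Python 3.5+ (PEP 461)

# zero_filled is b'\x00' * 11, while as_text and formatted are both b'11'
```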
Re: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)
On Thu, Sep 23, 2010 at 3:23 PM, P.J. Eby p...@telecommunity.com wrote: At 11:17 AM 9/23/2010 -0500, Ian Bicking wrote: I don't see any reason why Location shouldn't be ASCII. Any header could have any character put in it, of course, there's just no valid case where Location shouldn't be a URL, and URLs are ASCII. Cookie can contain weirdness, yes. I would expect any library that abstracts cookies to handle this (it's certainly doable)... otherwise, this seems like one among many ways a person can do the wrong thing. This can also be detected with the validator, which doesn't avoid runtime errors, but bytes allow runtime errors too -- they will just happen somewhere else (e.g., when a value is converted to bytes in an application or library). Right: somewhere much closer to the *actual* error, where the developer can know the problem is, I have garbage data or have not selected an appropriate codec, rather than this WSGI stuff is giving me errors some place. If servers print the invalid value on error (instead of just some generic error) I don't think it would be that hard to track down problems. This requires some explicit effort on the part of the server (most servers handle app_iter==None ungracefully, which is a similar problem). The difference is that if a server rejects non-bytes, you'll know *right away* that your app isn't compliant, instead of having to wait until some non-latin1 data shows up. No, you've only pushed the encoding elsewhere, and the error elsewhere. Somewhere someone is probably doing text_value.encode('ascii') (or latin1 or whatever), and if they haven't tested with non-ascii or non-latin1 input then they might encounter an error. It will be in their code, not in the WSGI server, but the error will be present in all the same situations. I don't think it will be much harder to fix if it occurs in the WSGI server, so long as the error message is at least a little bit helpful. 
AFAICT, there are only two advantages to using text for output headers: 1. Text is easier to work with, and 2. It's symmetric with using text for input headers. Both of which can still be had, by using the @encode_headers decorator. Sure, anything can be fixed in a library. But @encode_headers is just another library. And it also can't magically appear with 2to3, instead it requires yet more patches and weird workarounds. Also, what you are proposing hasn't been considered for PEP 444, though other combinations of bytes and text have (all symmetric). So it doesn't seem to have any clean way to translate into the next version of the specification. I'm a little bit on the fence on this one, because 1) it does seem a little pointless (if harmless) to shuffle headers around in bytes form, and 2) Location and Set-Cookie are very likely the only headers where any kind of damage could ever happen. Set-Cookie only, Location is clean. The entirety of hand-wringing over bytes is all just about freakin' cookies. Or the theory of cookies, I don't know that anyone has yet encountered any concrete and vexing problems. But, since it *can* happen, and because it is also really easy to fix the API issue with a decorator, I'm still leaning in favor of output is bytes over headers are text, bodies are bytes, unless somebody can come up with either some actually-bad consequence of using bytes, or some extra-good consequence of using text (that isn't addressed by just using the decorator). (Note, by the way, that WSGI design has always leaned in the direction of any convenience that can be handled by a library should be, if it keeps the spec simpler and more verifiable. So, this seems like a good use of that principle.) It only fixes the one case of non-Latin1 characters, there are still many other values you can put into a header (a newline or control character for instance), and innumerable header-specific issues. 
It seems to be adding complexity for one of the least problematic cases. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
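The point argued above, that a server-side encoding error is easy to track down if the message names the offending value, can be sketched roughly as follows. This is only an illustration: `encode_header_value` is a hypothetical helper, not part of any WSGI server, and the CR/LF check stands in for the "many other values" that can also be wrong in a header.

```python
def encode_header_value(name, value):
    """Encode a text header value to bytes, failing with a helpful message.

    Hypothetical helper for illustration; assumes headers are Latin-1 text.
    """
    # A newline inside a header value is invalid whatever the encoding.
    if "\r" in value or "\n" in value:
        raise ValueError(
            "Header %s contains a line break: %r" % (name, value))
    try:
        return value.encode("iso-8859-1")
    except UnicodeEncodeError as exc:
        # Name the header and show the value so the error is easy to trace.
        raise ValueError(
            "Header %s cannot be encoded as Latin-1: %r" % (name, value)
        ) from exc
```

The same failure would otherwise surface in the application's own `.encode()` call; the only difference is where the traceback points.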
Re: [Web-SIG] [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
On Tue, Sep 21, 2010 at 12:47 PM, Chris McDonough chr...@plope.com wrote: On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote: While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions. If a WSGI-1-compatible protocol seems more sensible to folks, I'm personally happy to defer discussion on PEP 444 or any other backwards-incompatible proposal. I think both make sense, making WSGI 1 sensible for Python 3 (as well as other small errata like the size hint) doesn't detract from PEP 444 at all, IMHO. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby p...@telecommunity.com wrote: The Python 3 specific changes are to use: * ``bytes`` for I/O streams in both directions * ``str`` for environ keys and values * ``bytes`` for arguments to start_response() and write() This is the only thing that seems odd to me -- it seems like the response should be symmetric with the request, and the request in this case uses str for headers (status being header-like), and bytes for the body. Otherwise this seems good to me, the only other major errata I can think of are all listed in the links you included. * text stream for wsgi.errors In other words, strings in, bytes out for headers, bytes for bodies. In general, only changes that don't break Python 2 WSGI implementations are allowed. The changes should also not break mod_wsgi on Python 3, but may make some Python 3 wsgi applications non-compliant, despite continuing to function on mod_wsgi. This is because mod_wsgi allows applications to output string headers and bodies, but I am ruling that option out because it forces every piece of middleware to have to be tested with arbitrary combinations of strings and bytes in order to test compliance. If you want your application to output strings rather than bytes, you can always use a decorator to do that. (And a sample one could be provided in wsgiref.) I agree allowing both is not ideal. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
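The decorator mentioned above (an application that writes text while the server receives bytes) might look something like the sketch below. It assumes the "bytes out" rule from the quoted proposal; the name `encode_headers` comes from the thread, but the signature and the Latin-1 default are assumptions made here for illustration.

```python
import functools


def encode_headers(app, encoding="latin-1"):
    """Hypothetical decorator: the app passes str status and headers,
    the wrapped start_response() receives bytes."""

    def to_bytes(value):
        # Values that are already bytes pass through untouched.
        return value.encode(encoding) if isinstance(value, str) else value

    @functools.wraps(app)
    def wrapper(environ, start_response):
        def text_start_response(status, headers, exc_info=None):
            byte_headers = [(to_bytes(k), to_bytes(v)) for k, v in headers]
            if exc_info is None:
                return start_response(to_bytes(status), byte_headers)
            return start_response(to_bytes(status), byte_headers, exc_info)

        return app(environ, text_start_response)

    return wrapper


@encode_headers
def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]
```

A header value outside Latin-1 would raise `UnicodeEncodeError` inside the decorator, which is the "error in a library rather than in the server" trade-off discussed in the sibling thread.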
Re: [Web-SIG] PEP 444 (aka Web3)
On Sun, Sep 19, 2010 at 11:32 AM, Chris McDonough chr...@plope.com wrote: I propose to write in the PEP that a middleware should provide an app attribute to get the wrapped application or middleware. It seems to be the most common name used out there. We can't really mandate this because middleware is not required to be an instance. It can be a function. We could suggest it, and suggest the attribute name. Composites, lazy loading middleware, or a bunch of other situations can break it... but it's nice for introspection tools to at least be able to attempt to run down the chain. Middleware is almost always a closure if it's a function, I believe, so you could still do:

    def caps(app):
        def replacement_app(environ):
            status, headers, body = app(environ)
            body = [''.join(body).upper()]
            return status, headers, body
        replacement_app.app = app
        return replacement_app

-- Ian Bicking | http://blog.ianbicking.org
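For the introspection side, a tool could "run down the chain" roughly as sketched here. `unwrap` is a hypothetical helper, and the sketch assumes every layer follows the suggested convention of exposing the wrapped callable as `.app`; composites or lazy loaders would simply end the walk early.

```python
def unwrap(app, max_depth=50):
    """Yield app, then each wrapped application reachable through `.app`."""
    seen = set()
    while app is not None and id(app) not in seen and len(seen) < max_depth:
        seen.add(id(app))  # guards against accidental cycles
        yield app
        app = getattr(app, "app", None)


def hello(environ):
    # Return order follows the caps() example in the message above.
    return "200 OK", [("Content-Type", "text/plain")], ["hello"]


def caps(app):
    def replacement_app(environ):
        status, headers, body = app(environ)
        return status, headers, ["".join(body).upper()]
    replacement_app.app = app
    return replacement_app


chain = list(unwrap(caps(caps(hello))))
```

Here `chain` holds the two `replacement_app` closures followed by `hello` itself.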
Re: [Web-SIG] PEP 444 (aka Web3)
On Sat, Sep 18, 2010 at 5:03 AM, Marcel Hellkamp m...@gsites.de wrote: With WSGI it was possible to yield empty strings as long as the application is waiting for data and call start_response once the headers are final. Not perfect, but at least non-blocking. Web3 removes this possibility. The headers must be returned before the body iterable yielded its first element, empty or not. Removing any support for this type of asynchronism would render web3 useless for all but completely synchronous and trivial applications. Even frameworks would have no way to work around this anymore. I'm aware of what a lot of people have done with WSGI, but I'm not aware of anyone doing an async proxy of any sort, or implementing anything in a way where this empty string policy served any function. It's not implausible that it *could* be used, but years of practice have shown it is not used. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444 (aka Web3)
On Fri, Sep 17, 2010 at 9:43 AM, And Clover and...@doxdesk.com wrote: On 09/17/2010 02:03 PM, Armin Ronacher wrote: In case we change the spec as Ian mentioned above, I am all for a wsgi.guessed_encoding = True flag or something like that. Yes, I'd like to see that. I believe going with *only* a raw-or-reconstructed path_info, rather than having both path_info and PATH_INFO, is probably best, for the middleware-duplication reasons PJE mentioned. A more in-depth possibility might be:

wsgi.path_accuracy =
0: script_name/path_info have been crudely reconstructed from SCRIPT_NAME/PATH_INFO from an unknown source. Beware! If there is to be backwards compatibility with WSGI1, this would be seen as the 'default value' given a missing path_accuracy.
1: script_name/path_info have been reconstructed, but it is known that path_info is accurate, other than %2F and non-ASCII issues. That is, it's known that the path doesn't come from IIS's broken PATH_INFO, or the IIS error has been detected and compensated for.
2: script_name/path_info have been reconstructed using known-good encodings for the env. The only way in which they may differ from the original request path is that a slash might originally have been a %2F. (This is good enough for the vast majority of applications.)
3: script_name/path_info come directly from the request path without any intervening mangling.

path_accuracy is certainly a better name than encoding; nothing here actually relates to encoding (except insofar as attempts to encode or reencode values corrupt the path). Personally I wouldn't want to split it up this much, I'd rather a simple flag to indicate something was guessed, vs. an accurate request. The only real value I see in it is to help people debug problems. Maybe. I'm not sure it's that realistic to imagine this will be noticed by people deploying software and encountering problems. A helpful application could use it to warn the deployer of potential problems.
It seems that it would be possible to create a WSGI application and client library that together can detect and help resolve these issues. E.g., the application always returns the values of script_name, path_info, and query_string, and the client fires off a bunch of different requests to see how it gets interpreted. It could suggest corrections until everything passes. I would really like to see concerns over bad gateways not be used to keep valuable information out of the spec. We want people to use well-configured gateways that accurately represent requests. There are limits, e.g., in environments where information is lost. The only really problematic example is losing the distinction between %2f and /, and I think it's reasonable to suggest that applications should avoid making that distinction in the path if they want to be easily deployed in different environments. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444 (aka Web3)
On Fri, Sep 17, 2010 at 1:02 PM, Ian Bicking i...@colorstudy.com wrote: I would really like to see concerns over bad gateways not be used to keep valuable information out of the spec. We want people to use well-configured gateways that accurately represent requests. There are limits, e.g., in environments where information is lost. The only really problematic example is losing the distinction between %2f and /, and I think it's reasonable to suggest that applications should avoid making that distinction in the path if they want to be easily deployed in different environments. Just to expand -- the reason %2f is special is because / has special meaning in URL paths, or at least is treated as such. ? has special meaning too, but that's already handled by splitting off QUERY_STRING. Technically ; is supposed to mean something, but no one ever cared, so it doesn't really. In theory you could make any character special, and in doing so want an escape mechanism to determine the difference between, e.g., , and %2c... but no one does that, so no problem. All the other potential problems are problems of gateway corruption. E.g., where the bytes were decoded with Latin1 and then encoded with sys.getfilesystemencoding(), or some other mismatched combination. I don't believe we should expose gateway corruption to the spec. I *do* believe that we can build tools inside WSGI to help debug and fix those problems, and I don't think any of these changes makes those tools particularly harder to implement. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444 (aka Web3)
On Fri, Sep 17, 2010 at 1:37 PM, chris.d...@gmail.com wrote: On Fri, 17 Sep 2010, Ionel Maries Cristian wrote: I feel this spec puts too much burden on applications - having to process all those byte strings and even having to add Content-Length even for naive buffered-body apps. The Content-Length requirement is a big killer for me. I'm usually generating content in apps, rather deep in a stack of middleware-like pieces that may or may not be looking at or modifying that content. I don't want to a) have to unwind my generators at each level b) reset the content-length here there and everywhere. It could be I'm doing it completely wrong, but it works rather nicely. I'm unclear what exactly you guys are reacting to. This? - The server must not inject an additional Content-Length header by guessing the length from the response iterable. This must be set by the application itself in all situations. I'm also not sure what motivated this particular change, but I don't have any opinion one way or the other. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444 (aka Web3)
On Fri, Sep 17, 2010 at 2:06 PM, Armin Ronacher armin.ronac...@active-4.com wrote: Hi, On 9/17/10 7:43 PM, Ian Bicking wrote: I'm also not sure what motivated this particular change, but I don't have any opinion one way or the other. Motivation is that WSGI wants servers to do something like this:

    if len(iterable) == 1 and content_length_header_missing:
        headers.append(('Content-Length', str(len(iterable[0]))))

However not everybody was doing that and some applications were setting a content length header or not. If a content length header was not set some middlewares that changed content worked properly even though they did not check the header. The idea is that with web3 every tool in the chain is supposed to look for that header and update it appropriately. Even the piglatin middleware from the PEP 333 did not check the content length if I remember correctly.

OK, so maybe it should just be clarified:

* Middleware and servers should not modify or add Content-Length, Date, or other headers unless they have reason to do so, and they must ensure that the response is valid (e.g., there should never be two Content-Length headers).

It still seems reasonable that *if* there is no Content-Length, and the server can guess easily enough (mostly when it is returned an actual list/tuple that we know can be introspected fast and without side effects), then it's perfectly reasonable to set it -- but certainly the server doesn't own that header (or any other, except maybe some connection-related headers?).

-- Ian Bicking | http://blog.ianbicking.org
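The "guess only when it is cheap and safe" rule suggested above could be sketched as below. `maybe_add_content_length` is a hypothetical helper, headers are assumed to be a list of (name, value) text pairs, and real servers would also have to consider cases such as HEAD requests or bodiless status codes, which this sketch ignores.

```python
def maybe_add_content_length(headers, body):
    """Return headers, adding Content-Length only when it can be computed
    without consuming the body and none is present yet."""
    has_length = any(name.lower() == "content-length" for name, _ in headers)
    # Only introspect real lists/tuples; iterating a generator here would
    # consume it or trigger side effects.
    if has_length or not isinstance(body, (list, tuple)):
        return headers
    total = sum(len(chunk) for chunk in body)
    return headers + [("Content-Length", str(total))]
```

Because an existing header is never touched, the response cannot end up with two Content-Length headers.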
Re: [Web-SIG] PEP 444 (aka Web3)
Well, reiterating some things I've said before:

* This is clearly just WSGI slightly reworked, why the new name?
* Why byte values in the environ? No one has offered any real reason they are better than native strings. I keep asking people to offer a reason, *and no one ever does*. It's just hyperbole and distraction. Frankly I'm feeling annoyed. So far my experience makes me believe using native strings will make it easier to port and support libraries across 2 and 3.
* It makes sense to me that the error stream should accept both bytes and unicode, and should do a best effort to handle either. Getting encoding errors or type errors when logging an error is very distracting.
* Instead of focusing on Response(*response_tuple), I'd rather just rely on something like Response.from_wsgi(response_tuple). Body first feels very unnatural.
* Regarding long response headers, I think we should ignore the HTTP spec. You can put 4k in a Set-Cookie header, such headers aren't easily or safely folded... I think the line length constraint in the HTTP spec isn't a constraint we need to pay attention to.

-- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] PEP 444 (aka Web3)
On Thu, Sep 16, 2010 at 12:35 PM, Guido van Rossum gu...@python.org wrote: On Thu, Sep 16, 2010 at 10:01 AM, Ian Bicking i...@colorstudy.com wrote: Well, reiterating some things I've said before: * This is clearly just WSGI slightly reworked, why the new name? * Why byte values in the environ? No one has offered any real reason they are better than native strings. I keep asking people to offer a reason, *and no one ever does*. It's just hyperbole and distraction. Frankly I'm feeling annoyed. So far my experience makes me believe using native strings will make it easier to port and support libraries across 2 and 3. Hm. IIUC the proposal is to implicitly assume Latin1 when decoding the bytes to Unicode. I worry that this will just perpetuate mojibake and other atrocities committed in Python 2. I was reading http://python.org/dev/peps/pep-0444/ -- is there another revision under discussion? This seems to explicitly say all environ values will be bytes. There have been other str-oriented proposals, including mod_wsgi's implementation. There is consensus that request and response bodies should be bytes. So really we're talking about whether headers and status are bytes or native strings. Most HTTP headers can only contain sensible characters in ASCII, and while anyone can submit anything in a header I'm not aware of it being a problem that, e.g., someone submits a Cache-Control header with non-ASCII values. There are a small number of headers that can reasonably contain Latin1 characters. Latin1 is specified in HTTP, and in a few instances RFC2047 encoding is allowed, though I don't believe anyone proposes that servers should try to handle RFC2047 (I believe CherryPy does/did do this, but I believe Robert Brewer who is in charge of that project supports removing that). There are headers that can reasonably contain RFC2047, but this can be decoded at the application level. 
The Cookie header does frequently contain incorrect encodings, but to handle this you have to decode the header as bytes or latin1 (all the meaningful characters are the same in both cases) and then decode/transcode values after parsing. Latin1 imposes only a small speedbump for a header that already has a bunch of speedbumps. The other case when Latin1 is not appropriate is the URL-decoded path, WSGI 1's SCRIPT_NAME and PATH_INFO. This proposal removes those. The URL-encoded values are ASCII-safe, or at least could be safely normalized to be safe in the server level. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
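The claim that decoding as bytes or as Latin1 amounts to the same thing rests on Latin-1 mapping every byte value to exactly one code point. A small sketch of that round-trip property (the two function names are made up for illustration):

```python
def header_bytes_to_text(raw):
    # Latin-1 assigns a code point to every byte 0-255, so this never fails.
    return raw.decode("iso-8859-1")


def header_text_to_bytes(text):
    # Reverses the decode above exactly, recovering the original bytes.
    return text.encode("iso-8859-1")


all_bytes = bytes(range(256))
round_tripped = header_text_to_bytes(header_bytes_to_text(all_bytes))
```

Since `round_tripped` equals `all_bytes`, an application can always get back to the raw bytes and apply whatever codec it actually needs.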
Re: [Web-SIG] PEP 444 (aka Web3)
On Thu, Sep 16, 2010 at 4:58 PM, Armin Ronacher armin.ronac...@active-4.com wrote: - Bytes values in the environment: HTTP transmits bytes, that's a fact we can't change. When we go with native strings we will go with unicode on 3.x This has the following implications: - getting the right path info requires a decode + an encode unless you are assuming latin1. Not if you are working with the URL-encoded paths. - same as above for the script name and cookie header Cookie is weird. If that one header could be bytes, that'd be great... but special-casing Cookie/Set-Cookie is too hard/weird. Plus handling Cookie/Set-Cookie as Latin1 is just one more line of code (well, two, one for each header). When going with unicode strings on 3.x for environ values, we would have to do the same for outgoing values which makes middlewares a lot harder to write: All response headers handle encoded URLs (e.g., Location), so SCRIPT_NAME/PATH_INFO issues don't come into play. Set-Cookie could be an issue, though only really when someone wants to replicate an external system's weird cookies -- except for legacy issues it's best for application developers to stick to ASCII cookies (URL-encoding cookie values is a popular way of doing this). I don't know of any other header (or the status) that would reasonably cause a problem. And I'm not glossing over corner cases -- I'm generally very aware and concerned with legacy issues, and interacting with legacy systems. There just aren't any here except for the resolvable issues I've listed. - web3.errors I think Ian raised concern that it's specified to support unicode only. I don't think we should change that to accepting either bytes or unicode is a good idea on Python 3 where there is no stream in the language or standard library that accepts both at the same time. An implementation for 2.x could support both, but I don't know if there is a usecase for that. 
In general though I have to say that very few people use wsgi.errors currently, so I don't think this is a real issue anyways. It's more of an issue under Python 2, it could probably be ignored with Python 3. Under Python 2 when you have some error condition it's really frustrating to encounter some unicode error with the logging of that error (often covering up the original error). -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444 (aka Web3)
On Thu, Sep 16, 2010 at 9:59 PM, Armin Ronacher armin.ronac...@active-4.com wrote: On 9/17/10 3:43 AM, Ian Bicking wrote: Not if you are working with the URL-encoded paths. SCRIPT_NAME / PATH_INFO will always stay unencoded and the current spec requires the web3.script_name thing to only be provided if the server can safely provide that. So at least for the fallback, we are dealing with (properly latin1 decoded) non-URL encoded things. Can be changed of course. Yes, if we get rid of SCRIPT_NAME/PATH_INFO then the problem goes away. For servers without access to the unencoded value, reencoding those values doesn't actually lose any information over what we have now, and avoids any encoding issues. Servers with REQUEST_URI can at least attempt to reconstruct the encoded values. Cookie is weird. If that one header could be bytes, that'd be great... but special-casing Cookie/Set-Cookie is too hard/weird. Special casing one header is indeed weird. Cookie is also the one header that can't be safely folded. It's just a messed up header, and requires hacky workarounds. I don't know of any other header (or the status) that would reasonably cause a problem. And I'm not glossing over corner cases -- I'm generally very aware and concerned with legacy issues, and interacting with legacy systems. There just aren't any here except for the resolvable issues I've listed. Technically speaking it would affect etags too, but I doubt anyone is using non-ASCII quoted strings there. A very funny header is btw the Warning header which actually can have any encoding: The warn-text SHOULD be in a natural language and character set that is most likely to be intelligible to the human user receiving the response. This decision MAY be based on any available knowledge, such as the location of the cache or user, the Accept-Language field in a request, the Content-Language field in a response, etc. The default language is English and the default character set is ISO-8859-1. 
If a character set other than ISO-8859-1 is used, it MUST be encoded in the warn-text using the method described in RFC 2047 [14]. Doubt anyone is using that header though. The Title header (in Atompub) also suggests 2047, but that's essentially an ASCII conversion like URL quoting. It looks something like =?iso-8859-1?q?p=F6stal?= -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
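Decoding such an RFC 2047 value at the application level, as suggested earlier in the thread, can be done with the standard library's `email.header.decode_header`. The wrapper below is a sketch; `decode_rfc2047` is a hypothetical name, and an unknown charset label in the input would raise `LookupError`.

```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode RFC 2047 encoded-words in a header value to text."""
    parts = []
    for chunk, charset in decode_header(value):
        # Encoded words come back as bytes plus a charset name;
        # a value with no encoded words comes back as plain str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


title = decode_rfc2047("=?iso-8859-1?q?p=F6stal?=")
```

With the example from the message above, `title` is the six-character string "pöstal".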
Re: [Web-SIG] WSGI for Python 3
On Sat, Jul 17, 2010 at 12:38 AM, Graham Dumpleton graham.dumple...@gmail.com wrote: On Friday, July 16, 2010, And Clover and...@doxdesk.com wrote: On 07/14/2010 06:43 AM, Ian Bicking wrote: There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO, and HTTP_COOKIE. (And of those, PATH_INFO is the only one that really matters, in that no-one really uses non-ASCII script filenames, FWIW, I had to go to a lot of trouble to allow non ASCII in final SCRIPT_NAME in mod_wsgi. Specifically using AddHandler directive in Apache means a file system path can make up part of SCRIPT_NAME. I had someone who was specifically using Russian in a WSGI script file name and because with AddHandler that becomes part of SCRIPT_NAME you had to cater for it. Anyway this was more of a Windows issue in having to use special file system functions to deal with the fact that on Windows filesystem paths aren't UTF-8 but something else. What this does highlight though is that although one can talk about passing raw script name through to application, that isn't necessarily right as it isn't the application that dictates what encoding may be used but the web server which is performing the mapping of that part of the original URL path to a potential filesystem resource, or alternatively where file based configuration for mount point, the encoding of the web server configuration file.

This is an Apache-specific issue. It definitely doesn't apply to paste.httpserver, and I doubt it applies to CherryPy or wsgiref. I don't really know how Nginx or other servers work.

-- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby p...@telecommunity.com wrote: At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote: And this doesn't help with Python 3: either we have byte values of SCRIPT_NAME and PATH_INFO in Python 3, or we have text values. I think bytes will be more awkward to port to than text, and inconsistent with other WSGI values. OTOH, it has the tremendous advantage of pushing the encoding question onto the app (or framework) developer... who's really the only one who can make the right decision for their particular application. And personally, I'd rather have clear boundaries between text and bytes, such that porting (even if tedious or awkward) is *consistent*, and clear as to when you're finished, not, oh, did I check to make sure I converted SCRIPT_NAME and PATH_INFO... not just in my app code, but in all the library code I call *from* my app? IOW, the bytes/string discussion on Python-dev has kind of led me to realize that we might just as well make the *entire* stack bytes (incoming and outgoing headers *and* streams), and rewrite that bit in PEP 333 about using str on Python 3000 to say we go with bytes on Python 3+ for everything that's a str in today's WSGI. This was my first intuition too, until I started thinking in more detail about the particular values involved. Some obviously are textish, like environ['SERVER_NAME']. Not a very useful value, but definitely text. Basically all the internal strings are textish, so we're left with: wsgi.url_scheme SCRIPT_NAME/PATH_INFO QUERY_STRING HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers) response status response headers (name and value) And there's a few things like REMOTE_USER that are kind of in the middle. Everyone is in agreement that bodies should be bytes. One initial problem is that the Python 3 stdlib handles bytes poorly, so for instance there's no good way to reconstruct the URL using the stdlib. 
That explains certain tensions, but I think we should ignore that, and in fact that's what Python-Dev seemed to say pretty clearly. Now, the other keys:

* wsgi.url_scheme: clearly ASCII
* SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old legacy encoding.
* raw request path: should be ASCII (non-ASCII should be URL-encoded). URL encoding happens at the byte layer, so a server could reasonably URL encode any non-ASCII characters without imposing any encoding.
* QUERY_STRING: should be ASCII, same as raw request path
* headers: Most are ASCII. Latin1 is a reasonable fallback and suggested by the specification. The spec also implies you have to use the RFC2047 inline encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and supporting it would probably be a bad idea for security reasons. The Atompub spec (reasonably modern) specifically says Title headers should be encoded with RFC2047 (if they are not ISO-8859-1): http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 -- decoding this kind of encoding at the application layer seems reasonable to me.
* cookie header: this specific header can easily have multiple encodings, as the browser encodes data then treats it as opaque bytes, so a cookie can be set via UTF-8 one place, Latin1 another, and those coexist in one header. That is, there is no real encoding and this should be treated as bytes. (Latin1 is an approximation of bytes... a spotty way to treat bytes, but entirely workable.)
* response status: I believe the spec says this must be Latin1/ISO-8859-1. In practice it is almost always ASCII, and since it is not user-visible it's not something that really needs localization.
* response headers: the spec implies Latin1, in practice the Set-Cookie header is bytes (since interoperation with wonky legacy systems is not uncommon). I'm not sure of any other exceptions?

So... to me it seems pretty reasonable for HTTP specifically that text can work.
And it feels weird that, say, environ['SERVER_NAME'] be text and environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR'] should be in that mode. And it would also be weird if environ['SERVER_NAME'] was bytes. In the past when we've gotten down to specifics, the only holdup has been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those. -- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough chr...@plope.com wrote: On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote: In the past when we've gotten down to specifics, the only holdup has been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those. I think I favor PJE's suggestion: let WSGI deal only in bytes. I'd prefer that WSGI 2 was defined in terms of a bytes with benefits type (Python 2's ``str`` with an optional encoding attribute as a hint for cast to unicode str) instead of Python 3-style bytes. But if I had to make the Hobson's choice between Python 3 style bytes and Python 3 style str, I'd choose bytes. If I then needed to write middleware or applications, I'd use WebOb or an equivalent library to enable a policy which converted those bytes to strings on my behalf. Making it easy to write raw middleware or applications without using such a library doesn't seem as compelling a goal as being able to easily write one which allowed me direct control at the raw level. What are the concrete problems you envision with text request headers, text (URL-quoted) path, and text response status and headers? -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 5:06 PM, Ian Bicking i...@colorstudy.com wrote: On Fri, Jul 16, 2010 at 4:47 PM, Tres Seaver tsea...@palladion.com wrote: Basically all the internal strings are textish, so we're left with: What do you mean by internal? Anything in the headers or the CGI environment is intrinsically bytes-ish to me. Do you mean that you want application programmers to have them transparently decoded? If so, we can make that the responsibility of the non-middleware framework / application. By internal I mean all the CGI variables that aren't representing HTTP, like SERVER_NAME. Actually I was thinking SERVER_SOFTWARE, though SERVER_NAME is somewhat similar as it doesn't come from HTTP, it comes from server configuration. -- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 8:46 PM, Ian Bicking i...@colorstudy.com wrote: So... before jumping to conclusions, what's the hard part with using text? Oh, the one thing that will be silly is cookies, but they are totally nuts already. They can be parsed equally well as bytes or latin1, and best only transcoded after parsing. Doing cookie_value.decode(app_encoding) or cookie_value.encode('ISO-8859-1').decode(app_encoding) isn't terribly different. And cookies aren't fair because they are just stupid; like the standard library I don't think we should design anything around their idiosyncrasies. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
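A minimal sketch of the "parse first, transcode after" idea for cookies, assuming the server hands the application header values decoded as ISO-8859-1. The helper name transcode_header and the sample cookie are made up for illustration; they are not part of any proposal in this thread.

```python
def transcode_header(value, app_encoding="utf-8"):
    """Recover the wire bytes of a latin1-decoded header value,
    then decode them with the encoding the application expects."""
    raw = value.encode("iso-8859-1")  # latin1 round-trips every byte
    return raw.decode(app_encoding, "replace")


# Simulate what a text-based server would put in the environ:
# UTF-8 bytes on the wire, decoded as latin1.
wire_bytes = "name=café".encode("utf-8")
environ_value = wire_bytes.decode("iso-8859-1")

# Parse on the latin1 text first; '=' and ';' are ASCII, so parsing is
# unaffected by the real encoding of the value.
name, _, raw_value = environ_value.partition("=")
cookie_value = transcode_header(raw_value)
```

Only the parsed value is transcoded, which is why it makes little difference whether the parser sees bytes or latin1 text.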
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton graham.dumple...@gmail.com wrote: Nah, not nearly that hard: path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8') I don't see the problem? If you want to distinguish %2f from /, then you'll do it slightly differently, like: path_parts = [ urllib.parse.unquote_to_bytes(p).decode('UTF-8') for p in environ['wsgi.raw_path_info'].split('/')] This second recipe is impossible to do currently with WSGI. So... before jumping to conclusions, what's the hard part with using Sorry, it is not that simple. The thing that everyone is ignoring is that SCRIPT_NAME and PATH_INFO are also normalized by the web server normally. That is, .. instances are removed. By passing the raw URL through to the application, you are now forcing every application to have to deal with that as well with the possibility of directory traversal attacks when people get it wrong and the URL is mapping somehow to file system resources. It is a huge can of worms which at the moment the web server deals with. Well... at least to me raw only means not URL decoded, so it doesn't necessarily mean you can't clean up the request path. I guess an attacker could encode . to make things harder. Nevertheless, WSGI servers don't currently guarantee this cleaning. I added it to paste.httpserver, but I don't know one way or the other about any other servers. A quick test shows wsgiref does not clean paths. So apps shouldn't rely on a clean path. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
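The second recipe above, written out as a runnable sketch. It assumes a raw (still URL-quoted) path such as the hypothetical environ['wsgi.raw_path_info'] discussed in this thread; the function names are invented for illustration. Splitting on '/' before unquoting keeps an encoded %2F inside its segment, and checking for dot segments after unquoting catches an attacker who encodes '.' as %2e.

```python
from urllib.parse import unquote_to_bytes


def split_raw_path(raw_path):
    """Split on literal '/' first, then URL-decode each segment."""
    return [unquote_to_bytes(part).decode("utf-8")
            for part in raw_path.split("/")]


def has_dot_segments(parts):
    """True if any decoded segment is '.' or '..' (possible traversal)."""
    return any(part in (".", "..") for part in parts)


parts = split_raw_path("/a%2Fb/c/%2e%2e/d")
suspicious = has_dot_segments(parts)
```

An application that maps segments to the file system would reject or normalise the request when suspicious is true, rather than relying on the server to have cleaned the path.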
Re: [Web-SIG] WSGI for Python 3
On Wed, Jul 14, 2010 at 12:19 AM, Graham Dumpleton graham.dumple...@gmail.com wrote: * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them exclusively with encoded versions (that represent the original request URI). We use Latin1 encoding, but it should be ASCII anyway, like most of the headers. BTW, it should be highlighted whether this change is relevant to Python 3 but like some of the other things you relegated as out of scope, purely a wish list item. Certainly; most headers or metadata is pretty much constrained to ASCII, and any use of non-ASCII is... at least peculiar, and presumably application-specific. For instance, there's no reason you'd have anything but ASCII in Cache-Control. The one place encoded information happens regularly in headers (that I know of) is Cookie. The request URI path is generally ASCII, but SCRIPT_NAME and PATH_INFO *aren't* the request URI path, they are URL decoded versions of the request URI path. And they are usually encoded in UTF8... but UTF8 is a lossy encoding, so decoding them is problematic (though we could define that they must be decoded with surrogateescape). And while they are usually UTF8, they are sometimes no valid encoding at all, because anyone can assemble any set of characters they want and web browsers will accept it. By avoiding URL-unquoting of these values, we can also stick to Latin1 and get something reasonable. It's not very attractive to me that we take something that is probably *not* Latin1, and may reasonably not be ASCII, and decode it as Latin1. -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
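A small illustration of the decoding problem described above, assuming Python 3. The sample bytes are invented: a path that mixes a Latin1-encoded byte with a UTF-8 sequence, which is the kind of thing a client can send. Strict UTF-8 decoding fails, while both Latin1 and UTF-8 with surrogateescape give text that can be turned back into the original bytes.

```python
raw = b"/caf\xe9/\xe2\x82\xac"  # \xe9 is Latin1 'é'; the last three bytes are UTF-8 '€'

# Strict UTF-8 decoding rejects the stray \xe9 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Latin1 maps every byte to a code point, so it always round-trips.
as_latin1 = raw.decode("latin-1")
latin1_roundtrip = as_latin1.encode("latin-1") == raw

# surrogateescape decodes what is valid UTF-8 and smuggles the rest
# through as lone surrogates; encoding with the same handler restores it.
as_escaped = raw.decode("utf-8", "surrogateescape")
escaped_roundtrip = as_escaped.encode("utf-8", "surrogateescape") == raw
```

The Latin1 text is reversible but not meaningful (the '€' shows up as three characters), which is the unattractive part mentioned above.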
Re: [Web-SIG] http://wiki.python.org/moin/WebFrameworks
Personally, if the author/maintainer of any library claims it is maintained/up-to-date, I say trust them. Most people are pretty honest about the status of their projects. But it does require a positive response to really make this claim. On Sat, Nov 28, 2009 at 12:03 PM, Aaron Watters arw1...@yahoo.com wrote: On Thu, Nov 26, 2009 at 1:02 PM, Chris McDonough chr...@plope.com wrote: http://wiki.python.org/moin/WebFrameworks seems to be the place where folks are registering their respective web frameworks. I'd like to move some of the frameworks which are currently in the various categories which haven't been active in a few years. In particular, I'd like to move any framework which hasn't had a release since the beginning of 2008 (arbitrary) into the Discontinued / Inactive framework category. I'd be willing to do the work to make sure I wasn't moving one that actually *did* have releases past that but just hadn't updated the page. Any dissent? - C Why not call them apparently stable versus under active development? Is the cgi module discontinued? I'm a little sensitive on this topic because people tell me that Gadfly is inactive or discontinued but it still does what it does as documented very well. Frequent releases may actually be a sign of bugginess and bad design. If you suspect a project is really dead, maybe you could try to contact the authors and ask about what they think. -- Aaron Watters === BTW, I think Release early, release often is nonsense because it means you are probably releasing something buggy and unstable which will just alienate your users, who will never come back to see the better version. 
-- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec
On Fri, Nov 27, 2009 at 12:20 PM, P.J. Eby p...@telecommunity.com wrote: 1. The 'readline()' function of 'wsgi.input' may optionally take a size hint. Already de-facto required. Leaving it out helps no-one. KEEP. Fair enough, since it's a MAY. On the other hand, because it's a MAY, it actually *helps* no-one, from a spec compatibility POV. (That is, you have to test whether it's available, so it's no different than it not being in the spec to begin with.) So, putting it in doesn't *hurt*, but neither does it *help*... so I lean towards leaving it to 2.x, where it can actually help. I think it was meant to be a must. The *caller* MAY pass in a size hint, the implementor MUST implement this optional argument. This is the de-facto requirement. 2. The 'wsgi.input' must provide an empty string as end of input stream marker. I don't think this will be a problem. What would WSGI middleware do to break this requirement? It could be reading the original input stream, and replacing it with another one. Not very common I would guess, but it's still possible for a piece of perfectly valid 1.0 middleware to fail this requirement for 1.1, leading to the condition where you really can't tell if you're running valid 1.1 or not. Middleware sometimes does this, but any time it does this it always replaces the input stream with something truly file-like, e.g., StringIO or a temp file. Nothing but servers really hands sockets around, and sockets are the only objects I'm aware of that don't act quite like a file. It was only put in in the first place so that CGI adapters could pass through their input stream (which may not ever provide an EOF) without having to wrap it. I agree that was a mistake, and should be corrected. I agree... but only in 2.x. 3. The size argument to 'read()' function of 'wsgi.input' would be optional and if not supplied the function would return all available request content. 
Thus would make 'wsgi.input' more file like as the WSGI specification suggests it is, but isn't really per original definition. This one could be a problem with middleware, and that feature shouldn't ever be used, in any case: reading into memory an arbitrary amount of data from a client is not a good thing to encourage. OMIT. Agreed -- even in 2.x it's questionable if not harmful. Well, we need a way to handle content of unknown length, but if the file terminates with '' then this isn't that important. 4. The 'wsgi.file_wrapper' supplied by the WSGI adapter must honour the Content-Length response header and must only return from the file that amount of content. This would guarantee that using wsgi.file_wrapper to return part of a file for byte range requests would work. Given item #6, I suppose this is actually just a matter of efficiency, in case the file wrapper is sent to a middleware rather than directly to the wsgi gateway? If it goes directly to the gateway, that can of course stop reading by itself. ?undecided? I don't really see how this one helps anything in 1.x, and so lean towards leaving it out. I don't really understand this either, unless it was handling range responses as well. Content-Length alone isn't very interesting in this case. 5. Any WSGI application or middleware should not return more data than specified by the Content-Length response header if defined. As long as this is meant as SHOULD, that's fine. It's not actually a requirement, but rather a suggestion of best practices. KEEP. 6. The WSGI adapter must not pass on to the server any data above what the Content-Length response header defines if supplied. This is already required by HTTP. If the WSGI gateway doesn't make this happen somehow, it's generating invalid HTTP and that's a bug. Okay to clarify in the spec to ensure people don't miss the requirement when implementing. KEEP. Good points - I agree with these two, and they can be considered 1.0 clarifications as well. 
After the first four (which I see no reason to include) I was probably a little over-inclined to throw these two out (especially since I was reading the should above as a must, per your proposal). In this context, maybe 4 is just an extension of these? Put 4 after 6 and maybe it'll seem more obvious...? -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
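Items 5 and 6 above can be sketched as a generator that a gateway (or a cautious middleware) wraps around the application's iterable. This is only an illustration under stated assumptions: the name limit_to_content_length is made up, and the sketch ignores close() handling on the wrapped iterable.

```python
def limit_to_content_length(app_iter, content_length):
    """Yield chunks from app_iter, but never more than content_length bytes."""
    remaining = content_length
    for chunk in app_iter:
        if remaining <= 0:
            break
        if len(chunk) > remaining:
            chunk = chunk[:remaining]  # truncate the chunk that crosses the limit
        remaining -= len(chunk)
        yield chunk


# An application that returns more than its declared Content-Length of 8.
body = b"".join(limit_to_content_length([b"hello ", b"world", b"!!!"], 8))
```

Seen this way, item 4 is the same rule applied to wsgi.file_wrapper: whatever object produces the body, nothing past Content-Length reaches the client.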
Re: [Web-SIG] Future of WSGI
On Wed, Nov 25, 2009 at 2:03 PM, Tres Seaver tsea...@palladion.com wrote: Aaron Watters wrote: --- On Wed, 11/25/09, Chris Dent chris.d...@gmail.com wrote: From: Chris Dent chris.d...@gmail.com I can (barely) relate to some of the complaints that start_response is a pain in the ass, but environ, to me, is not broken. I agree. It maps nicely onto the underlying protocol and WSGI is supposed to be low level, right? The biggest problem with start_response is that after you evaluate
    iterable = application(env, start_response)
sometimes start_response has been called and sometimes it hasn't, and this can break middlewares when they haven't been tested both ways (repoze.who for example seems to assume it has been called). Since version 1.0.13 (2009-04-24), repoze.who's middleware is very careful to dance around the fact that an application is not required to have called 'start_response' on return, but *must* call it before returning the first chunk from its iterator. That bit of flexibility in PEP 333 is likely there to support *some* use case, but it makes 'start_response' a *big* pain to work with in middleware which needs to do egress processing of headers.
Just in terms of history, I think I'm to blame on this one, as I argued quite vigorously for start_response. The reason being that at the time frameworks that had a concept of streaming usually did it by writing to the response. While the names were different depending on the framework, this was the common way to do streaming:
    import mimetypes

    def file_app(req):
        filename = ...
        # guess_type() takes the file name (or URL), not just the extension
        req.response.setHeader('Content-Type', mimetypes.guess_type(filename)[0])
        # I believe most did not stream by default...
        req.response.stream()
        fp = open(filename, 'rb')
        while True:
            chunk = fp.read(4096)
            if not chunk:
                break
            req.response.write(chunk)
To support that style of streaming start_response was added. I think PJE also had some notion of Comet-style interactions, and maybe something related to async, leading to the specific restrictions on how written content should be handled. I still don't entirely understand the use case underlying that. But anyway, that's some of the motivation. start_response is still useful for retrofitting support for frameworks from time to time, but all the modern frameworks work differently these days, making start_response seem less necessary. -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
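To make the "called or not yet called" problem concrete, here is a hedged sketch of middleware that works in both cases by wrapping start_response instead of assuming it has run by the time the application returns. The names add_header_middleware, lazy_app and fake_start_response are invented for the example.

```python
def add_header_middleware(app, name, value):
    """Add one response header, whenever the app gets around to
    calling start_response (before or during iteration)."""
    def middleware(environ, start_response):
        def wrapped_start_response(status, headers, exc_info=None):
            headers = list(headers) + [(name, value)]
            return start_response(status, headers, exc_info)
        return app(environ, wrapped_start_response)
    return middleware


def lazy_app(environ, start_response):
    # A generator: nothing here runs until the server starts iterating,
    # so start_response has NOT been called when lazy_app() returns.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"hello"


captured = {}

def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


wrapped = add_header_middleware(lazy_app, "X-Demo", "1")
result = wrapped({}, fake_start_response)
called_before_iteration = bool(captured)  # still empty at this point
body = b"".join(result)                   # iteration triggers start_response
```

Middleware that instead inspects the headers right after calling the application would see nothing for lazy_app, which is the failure mode described above.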
Re: [Web-SIG] Future of WSGI
On Tue, Nov 24, 2009 at 3:28 PM, Malthe Borch mbo...@gmail.com wrote: The proposal that seemed to work best was to keep the environ as str (i.e., unicode in Python 3), and eliminate the problematic SCRIPT_NAME and PATH_INFO, replacing them with url-encoded values. Also I think everyone is okay with removing start_response. All text would be decoded as latin1 on Python 3 (which allows for transcoding; also most text is not unicode). The request and response body would remain bytes. I assume with all text you mean all header text, e.g. all header values. All the things that are specified to be str, would stay str in Python 3. This includes all keys, headers, and stuff like wsgi.url_scheme. Can we talk briefly then about wsgi.*? I think we should eliminate them and in their place put a real request object, something very basic that has only what's absolutely necessary to communicate the essential data from the low-level HTTP request. There is no way that the environment can express an HTTP request. This was a mistake in my view and we should rectify it either in 1.1 or 2.0. I'm not aware of any problems with representing the request with a dictionary. Can you give examples? -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Future of WSGI
On Tue, Nov 24, 2009 at 3:40 PM, Malthe Borch mbo...@gmail.com wrote: I'm not aware of any problems with representing the request with a dictionary. Can you give examples? The body stream is not part of the HTTP environment. It's an abuse and it has the very negative effect of luring developers into further abuse. You mean specifically environ['wsgi.input'] ? While the file-like interface is difficult, other possible interfaces aren't so great either. As to putting the request body in the environment, I don't know what the problem is? Or are you just concerned that people put arbitrary things in the environ? There's far too many important use cases that are satisfied by the extensible nature of the environ to give it up just because some people believe it is overused. -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Future of WSGI
On Tue, Nov 24, 2009 at 4:16 PM, Sylvain Hellegouarch s...@defuze.org wrote: I'm not aware of any problems with representing the request with a dictionary. Can you give examples? Though it shouldn't be considered as a problem, the fact that probably no existing framework actually use the raw dictionary (there is, in almost all cases, a wrapping into a friendlier object), one might wonder why keeping such a low level interface rather than directly provide a higher level interface is a good idea. After all creating those dictionaries for no good reason aside from sending them to the next layer which will map them into a WebOb, a yaro, a cherrypy request, or zope request, etc. seems slightly pointless (I'm not versed into Python internals, but doesn't it have also a cost of creating rather useless objects repeatedly like that?) I know WSGI tries hard not to force into one implementation but still... Well, that's hardly a trivial requirement, nor a trivial accomplishment. Also the dictionary is a complete and inspectable representation of the environment, divorced from any possible trickery on the part of frameworks. It's a common gateway between servers and frameworks, and can be used as a gateway between middleware and applications. And it's really fairly common for middleware to use the raw dictionary without any object involved. -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
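As an illustration of middleware that uses the raw dictionary with no request object involved, here is a small prefix dispatcher. It is a sketch, not code from any framework; mount, echo_app and not_found are hypothetical names.

```python
def mount(prefix, app, fallback):
    """Dispatch requests under `prefix` to `app`, moving the prefix
    from PATH_INFO to SCRIPT_NAME; everything else goes to `fallback`."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == prefix or path.startswith(prefix + "/"):
            environ = dict(environ)  # copy, so the caller's environ is untouched
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):]
            return app(environ, start_response)
        return fallback(environ, start_response)
    return middleware


def echo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    text = environ["SCRIPT_NAME"] + "|" + environ["PATH_INFO"]
    return [text.encode("ascii")]


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


site = mount("/blog", echo_app, not_found)
ignore = lambda status, headers, exc_info=None: None
hit = b"".join(site({"PATH_INFO": "/blog/2009/11"}, ignore))
miss = b"".join(site({"PATH_INFO": "/blogger"}, ignore))
```

Everything the dispatcher needs is two dictionary keys, and whatever framework sits behind it can still build its own request object from the modified environ.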
[Web-SIG] WebOb API
Hi all. So, it's about time that WebOb came to 1.0. For 1.0 I'd like to settle the API as much as possible. But I'd also like to move further to getting WebOb used for more frameworks. I don't expect that to happen before 1.0, but if there are API changes that will make that easier later, then maybe we can get those in. While I haven't tracked ongoing changes to frameworks, I did put together the differences I am aware of in APIs here: http://pythonpaste.org/webob/differences.html Some of them are fairly trivial, and could be managed through subclassing (e.g., req.raw_post_data vs. req.body -- semantically identical, just different names). Are there API changes that would help people consider WebOb for other frameworks? The main ones I can think of is req.FILES, separating out file uploads from other POST fields. Also then there's the issue of what kind of object represents files. The finer details of individual objects are also important, things like the API of req.GET/req.POST (which are views on ordered dictionaries, and are represented somewhat differently in different frameworks). Also I'm planning on introducing a BaseRequest (and *maybe* BaseResponse) class, that removes some functionality. Specifically for Repoze they'd like to remove __getattr__ and __setattr__ (which has some performance implications), and maybe other things are possible (though removing writers is infeasible, IMHO, as read and write access are not easily separated, and it would require too much code duplication). (Incidentally WebOb is now on bitbucket: http://bitbucket.org/ianb/webob/) -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Converting REQUEST_URI to wsgi.script_name/wsgi.path_info
Thanks for the test case; fixed in tip now. If anything goes wrong what should happen is a return value of (quote(script_name), quote(path_info)) -- there's no combination of request_uri/script_name/path_info that should cause an exception (except bugs). As you say, there's no promise that those values are in any way related, and when that is the case it is appropriate to fix it up at the WSGI stage (not necessarily in the WSGI adapter itself). On Mon, Sep 28, 2009 at 2:34 AM, Graham Dumpleton graham.dumple...@gmail.com wrote: 2009/9/28 Ian Bicking i...@colorstudy.com: I tried implementing some code to convert REQUEST_URI (the raw request URL) and CGI-style SCRIPT_NAME/PATH_INFO into a raw script_name/path_info. http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri.py (python 2) http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri3.py (python 3) Admittedly the tests are not very complete, I just wasn't feeling creative about test cases. In terms of performance this avoids being entirely brute force, but feels kind of complex. I'm betting there's an entirely different approach which is faster. But whatever. Got an error: mod_wsgi (pid=4301): Exception occurred processing WSGI script '/Users/grahamd/Testing/tests/wsgi20.wsgi'. Traceback (most recent call last): File /Users/grahamd/Testing/tests/wsgi20.wsgi, line 80, in application environ['PATH_INFO']) File /Users/grahamd/Testing/tests/wsgi20.wsgi, line 64, in request_uri_to_path remove_segments = remove_segments - 1 - qscript_name_parts[-1].lower().count('%2f') IndexError: list index out of range This was an extreme corner case where Apache mod_rewrite was being used to do stuff: RewriteEngine On RewriteCond %{REQUEST_FILENAME} !-f RewriteRule ^(.*)$ /wsgi20.wsgi/$1 [QSA,PT,L] and Apache was configured to allow encoded slashes. 
The input would have been:
    REQUEST_URI: '/a%2fb/c/d'
    SCRIPT_NAME: '/wsgi20.wsgi'
    PATH_INFO: '/a/b/c/d'
That style of rewrite rule is quite often used with Apache, although allowing encoded slashes isn't. That SCRIPT_NAME needs to be adjusted is a known consideration with this rewrite rule. Usually you would use a wrapper around the WSGI application which does:
    import posixpath

    def _application(environ, start_response):
        # The original application.
        ...

    def application(environ, start_response):
        # Wrapper to set SCRIPT_NAME to actual mount point.
        environ['SCRIPT_NAME'] = posixpath.dirname(environ['SCRIPT_NAME'])
        if environ['SCRIPT_NAME'] == '/':
            environ['SCRIPT_NAME'] = ''
        return _application(environ, start_response)
If that algorithm is used in the WSGI adapter, however, you would never get the opportunity to do that, as it would already have failed before the wrapper got called. Graham -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
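The fallback behaviour described at the top of this message can be sketched as follows. This is deliberately much simpler than the linked request_uri.py and is not that algorithm: it only handles the case where the quoted SCRIPT_NAME is a literal prefix of the request path, and otherwise falls back to (quote(script_name), quote(path_info)) instead of raising. The function name is made up.

```python
from urllib.parse import quote, unquote


def raw_script_and_path(request_uri, script_name, path_info):
    """Return (raw_script_name, raw_path_info), never raising on mismatch."""
    raw_path = request_uri.split("?", 1)[0]  # drop the query string
    quoted_script = quote(script_name)
    rest = raw_path[len(quoted_script):]
    if raw_path.startswith(quoted_script) and unquote(rest) == path_info:
        # REQUEST_URI agrees with the CGI variables: keep the raw form.
        return quoted_script, rest
    # The values are unrelated (e.g. after a rewrite): re-quote the CGI ones.
    return quote(script_name), quote(path_info)


normal = raw_script_and_path("/app/x%2Fy?q=1", "/app", "/x/y")
rewritten = raw_script_and_path("/a%2fb/c/d", "/wsgi20.wsgi", "/a/b/c/d")
```

With the mod_rewrite input from this report the function takes the fallback branch, so a wrapper that later fixes up SCRIPT_NAME still gets a chance to run.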
[Web-SIG] Converting REQUEST_URI to wsgi.script_name/wsgi.path_info
I tried implementing some code to convert REQUEST_URI (the raw request URL) and CGI-style SCRIPT_NAME/PATH_INFO into a raw script_name/path_info. http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri.py (python 2) http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri3.py (python 3) Admittedly the tests are not very complete, I just wasn't feeling creative about test cases. In terms of performance this avoids being entirely brute force, but feels kind of complex. I'm betting there's an entirely different approach which is faster. But whatever. -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
It's not a specific proposal, but here are my opinions on what a proposal should be:
On Tue, Sep 22, 2009 at 1:06 AM, Mark Nottingham m...@mnot.net wrote: OK, that's quite exhaustive. For the benefit of those of us jumping in, could you summarise your proposal in something like the following manner:
1. How the request method is made available to WSGI applications
Graham talked about it as bytes/unicode/native, where native is unicode on Python 3 and str on Python 2. For instance, I think there's general consensus (though not really specifically discussed) that environ keys should be native. I think method should be native.
2. How the request-uri is made available to WSGI applications -- in particular, whether any decoding of punycode and/or %-escapes happens
Hah, didn't even think about de-punycoding HTTP_HOST. That'd be a blast. I think:
* scheme as native
* HTTP_HOST as native (no decoding of punycode)
* path as native (no URL decoding) - big break with WSGI 1 and CGI, but what the hell. I could easily waffle on this.
* query string as native - *should* be ASCII-safe currently.
Wow, that was easy! Request headers, which you didn't split out... those I'm not sure. I'd *like* them to be native. But damn, I'm just not sure quite how. surrogateescape? Latin1? Latin1 as a kind of poor man's surrogateescape isn't so bad. And the headers *should* be ASCII for sane requests, so it's not a horrible compromise. I guess libraries could lazily transcode, just like they currently lazily decode. But it'd be a bit obnoxious at the library level. Transcoding middleware would be easier, but it adds the question of how to record that the transcoding has taken place.
3. How request headers are made available to WSGI apps
Request handlers? I don't understand your terminology.
4. How the request body is made available to WSGI apps
Ugh. wsgi.input could remain. I think at least it should become a file-like interface (i.e., giving an empty string when the content is exhausted) and I might even ask that it implement .tell() (.seek() would be nice of course, but optional). If there was some other idea, I think there's room for improvement on wsgi.input and the file interface. wsgi.input should definitely work with bytes only. I believe this is consensus.
5. Likewise for how apps should expose the response status message, headers and body to WSGI implementations.
I believe there is consensus that the response body should remain an iterator that yields bytes. In one way, it'd be nice if we'd just say that status/headers should be ASCII, because that's the reasonable choice. But for proxying or representing HTTP as it is, it's not always the case. And I'm committed to keeping WSGI fully capable of representing arbitrary requests and responses so long as they aren't entirely diabolical. But, an ASCII status is not unreasonable, especially since there's zero semantic meaning to the reason. Which makes native strings perfectly fine. So, headers... Well, Latin1 is easy enough. In theory, or at least particular theories, headers can be Latin1. And you can represent arbitrary bytes that way. So if you want to send crazy stuff to the browser, you can do it that way. And if you want to stick to plain ASCII then that's easy enough as well. So... native? str or unicode? I'm not sure specifically for this one. -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
On Tue, Sep 22, 2009 at 3:16 AM, Armin Ronacher armin.ronac...@active-4.com wrote: Hi, Ian Bicking wrote: Request headers, which you didn't split out... those I'm not sure. I'd *like* them to be native. But damn, I'm just not sure quite how. surrogateescape? Latin1? Latin1 as a kind of poor man's surrogateescape isn't so bad. And the headers *should* be ASCII for sane requests, so it's not a horrible compromise. Except for cookie headers. Thanks to advertising and all the other systems putting headers on your page you can't even properly control that one. Yes, but it'd be relatively easy to handle this, especially since the raw header isn't very useful. So you just do environ['HTTP_COOKIE'].encode('latin1').decode('utf8', 'replace') before parsing. Another thing to consider: in Python 3.1, the HTTP server internally decodes to latin1 and there is no simple way to change that, unless you replace the implementation. Ugh. wsgi.input could remain. I think at least it should become a file-like interface (i.e., giving an empty string when the content is exhausted) and I might even ask that it implement .tell() (.seek() would be nice of course, but optional). If there was some other idea, I think there's room for improvement on wsgi.input and the file interface. -1 on seek and tell. This could be impossible to implement and what we really want to do is to not have the data in memory but on disk or wherever you put big-ass uploads. Also it will be hard to test for an available seek or not, because even if it's a noop, the method could be there. Tell doesn't have particular overhead except to keep track of how many bytes have been read. That would allow libraries to at least detect contention for wsgi.input. I wish seek were detectable, though I agree it shouldn't be required at all.
-- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
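A sketch of why tell() is cheap: a wrapper that only counts the bytes it has handed out, with no buffering and no seek(). The class name CountingInput is hypothetical, and io.BytesIO stands in for a server's real input stream.

```python
import io


class CountingInput:
    """Wrap a byte stream and track how many bytes have been read."""

    def __init__(self, stream):
        self._stream = stream
        self._pos = 0

    def read(self, size=-1):
        data = self._stream.read(size)
        self._pos += len(data)
        return data

    def readline(self, size=-1):
        data = self._stream.readline(size)
        self._pos += len(data)
        return data

    def tell(self):
        return self._pos


wsgi_input = CountingInput(io.BytesIO(b"line one\nrest"))
first = wsgi_input.read(4)
after_first = wsgi_input.tell()
line = wsgi_input.readline()
rest = wsgi_input.read()
end = wsgi_input.tell()
eof = wsgi_input.read()  # file-like EOF marker: an empty bytes object
```

A library that finds tell() already past zero knows something else has consumed part of the body, which is the contention check mentioned above.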
[Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO
OK, I mentioned this in the last thread, but... I can't keep up with all this discussion, and I bet you can't either. So, here's a rough proposal for WSGI and unicode:
I propose we switch primarily to native strings: str on both Python 2 and 3. Specifically:
    environ keys: native
    environ CGI values: native
    wsgi.* (that is text): native
    response status: native
    response headers: native
wsgi.input remains byte-oriented, as does the response app_iter.
I then propose that we eliminate SCRIPT_NAME and PATH_INFO. Instead we have:
    wsgi.script_name
    wsgi.path_info
(I'm not entirely set on these names.) These both form the original path. It is not URL decoded, so it should be ASCII. (I believe non-ASCII could be rejected by the server, with Bad Request? A server could also choose to treat it as UTF8 or Latin1 and encode unsafe characters to make it ASCII.) Thus to re-form the URL, you do:
    environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST'] + environ['wsgi.script_name'] + environ['wsgi.path_info'] + '?' + environ['QUERY_STRING']
All incoming headers will be treated as Latin1. If an application suspects another encoding, it is up to the application to transcode the header into another encoding. The transcoded value should not be put into the environ. In most cases headers should be ASCII, and Latin1 is simply a fallback that allows all bytes to be represented in both Python 2 and 3.
Similarly all outgoing headers will be Latin1. Thus if you (against good sense) decide to put UTF8 into a cookie, you can do:
    headers.append(('Set-Cookie', unicode_text.encode('UTF8').decode('latin1')))
The server will then encode the text as latin1, sending the UTF8 bytes. This is lame, but non-ASCII in headers is lame. It would be preferable to do:
    headers.append(('Set-Cookie', urllib.quote(unicode_text.encode('UTF8'))))
This sends different text, but is highly preferable. If you wanted to parse a cookie that was set as UTF8, you'd do:
    parse_cookie(environ['HTTP_COOKIE'].encode('latin1').decode('utf8'))
Again, it would be better to do:
    parse_cookie(urllib.unquote(environ['HTTP_COOKIE']).decode('utf8'))
Other variables like environ['wsgi.url_scheme'], environ['CONTENT_TYPE'], etc, will be native strings. A Python 3 hello world app will then look like:
    def hello_world(environ):
        return ('200 OK', [('Content-type', 'text/html; charset=utf8')], ['Hello World!'.encode('utf8')])
start_response and changes to wsgi.input are incidental to what I'm proposing here (except that wsgi.input will be bytes); we can decide about them separately.
Outstanding issues: Well, the biggie: is it right to use native strings for the environ values, and response status/headers? Specifically, tricks like the latin1 transcoding won't work in Python 2, but will in Python 3. Is this weird? Or just something you have to think about when using the two Python versions? What happens if you give unicode text in the response headers that cannot be encoded as Latin1? Should some things specifically be ASCII? E.g., status. Should some things be unicode on Python 2? Is there a common case here that would be inefficient? -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
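The URL reconstruction and the outgoing-header trick from this proposal, as a Python 3 sketch. wsgi.script_name and wsgi.path_info are the proposed (not standardised) keys; the helper names are invented, and unlike the one-line expression in the proposal this version only appends '?' when there is a query string.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from the proposed raw (still-quoted) keys."""
    url = environ["wsgi.url_scheme"] + "://" + environ["HTTP_HOST"]
    url += environ["wsgi.script_name"] + environ["wsgi.path_info"]
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url


def utf8_header_value(text):
    """Produce a native-string header value whose latin1 encoding
    (done by the server) is the UTF-8 encoding of `text`."""
    return text.encode("utf-8").decode("latin-1")


environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "example.com",
    "wsgi.script_name": "/app",
    "wsgi.path_info": "/a%2Fb",
    "QUERY_STRING": "x=1",
}
url = reconstruct_url(environ)
header_value = utf8_header_value("é")
bytes_on_wire = header_value.encode("latin-1")  # what the server would send
```

Because the path keys are never URL-decoded, the %2F survives the round trip, which is the main point of replacing SCRIPT_NAME and PATH_INFO.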
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
On Sun, Sep 20, 2009 at 8:06 AM, Armin Ronacher armin.ronac...@active-4.com wrote:

> Thanks to Graham Dumpleton and Robert Brewer there is some serious progress on WSGI currently. I proposed a roadmap with some PEP changes now that need some input.
>
> Summary:
>
>   WSGI 1.0 stays the same as PEP 0333 currently is
>   WSGI 1.1 becomes what Ian and I added to PEP 0333
>   WSGI 2.0 becomes a unicode powered version of WSGI 1.1
>   WSGI 3.0 becomes WSGI 2.0 just without start_response
>
> WSGI 1.0 and 1.1 are byte based and nearly impossible to use on Python 3 because of changes in the standard library that no longer work with a byte-only approach.

1.1 I think of as an errata on 1.0, so... simple enough.

I was skeptical about a unicode version of WSGI, but I think I'm okay with it now. For people who use UTF-8-only it should be fairly simple and easy; for people who want to deal with other encodings, backward compatible URLs, or other weirdness I think surrogateescape can resolve the small handful of problems. Maybe an option to use latin1 (at the server level) would do the same for Python 2, as a deployment option for people who are dealing with these tricky issues. Which is kind of lame, but it means everything is still *possible*, and the use cases are somewhat obscure. Especially because QUERY_STRING and wsgi.input remain bytes. (Well, I guess the other case would be someone reading a cookie set by an application they do not control, and set in a crazy way... but anyway, there's a handful of use cases where things get tricky, but we can kind of punt, or try to implement the necessary transcoding routines before the spec is final.)

I'm very much opposed to a second raw version of the request, as I do not like redundancy.

With respect to 3.0/start_response, I'd rather we just do both at once, so there's not so many versions of WSGI to worry about. Also it doesn't feel like a very difficult change to make.
The only other major issue is wsgi.input, which is quite an awkward interface to the request body. But I think resolving that is harder than start_response, in particular because there's no clear solution. Maybe at least switching to a file interface would be better.

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
On Mon, Sep 21, 2009 at 6:16 PM, Graham Dumpleton graham.dumple...@gmail.com wrote:

>> Of course you can directly use `environ['some_key']` if you know you'll get the 'right' encoding all the time. But when the encoding changes, you'll have to fix all your middlewares. Am I missing something?
>
> For one, we aren't talking about arbitrary keys needing this treatment. We are only talking about SCRIPT_NAME and PATH_INFO.

OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO, and introduce two equivalent variables that hold the NOT url-decoded values. So if you request /fran%e7cois then environ['PATH_INFO_RAW'] is '/fran%e7cois'.

This will be quite disruptive, as these are variables that are frequently accessed directly (libraries that expose them as attributes can just turn them into properties that do URL decoding, using UTF8). But it's an easy fix at least. I would actually want to specify that if we added this key, we should disallow the old keys -- terrible confusion could ensue from both in the environ.

This also fixes the problem with not being able to distinguish %2F from /, which isn't a big problem but is annoying, and is hiding meaningful information. (I believe the relevant spec does distinguish between these two values -- i.e., ideally decoding should happen on path segments, each segment separated by a real /.)

If we do that, then the only really tricky thing left is HTTP_COOKIE, and since the Cookie header is a mess then HTTP_COOKIE will be a mess and we just have to figure out a hacky way to deal with that. Maybe surrogateescape, but probably just Latin1 would be fine (and easy to do in Python 2).

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
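To make the %2F point concrete, here is a minimal Python 3 sketch of decoding an undecoded path one segment at a time. The function name split_raw_path is hypothetical, and the input is assumed to be the kind of value a key like PATH_INFO_RAW would hold.

```python
from urllib.parse import unquote


def split_raw_path(raw_path, encoding='utf-8'):
    """Split an undecoded path on literal '/' and decode each segment.

    Because splitting happens before unquoting, an encoded %2F stays
    inside its segment instead of becoming a path separator.
    """
    return [unquote(segment, encoding=encoding, errors='strict')
            for segment in raw_path.split('/')]


segments = split_raw_path('/files/a%2Fb/caf%C3%A9')
legacy = split_raw_path('/fran%e7cois', encoding='latin-1')
```

With errors='strict', a path such as /fran%e7cois raises UnicodeDecodeError under UTF-8, so the application decides explicitly whether to retry with another encoding.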
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
On Tue, Sep 22, 2009 at 12:21 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:

> That may be fine for pure Python web servers where you control the split of REQUEST_URI into SCRIPT_NAME and PATH_INFO in the first place but don't have that luxury in Apache or via FASTCGI/SCGI/CGI etc as that is done by the web server. Also, as pointed out in my blog, because of rewrites in web server, it may be difficult to try and map SCRIPT_NAME and PATH_INFO back into REQUEST_URI provided to try and reclaim original characters. There is also the problem that often FASTCGI totally stuffs up SCRIPT_NAME/PATH_INFO split anyway and manual overrides needed to tweak them.

When things get messed up I recommend people use a middleware (paste.deploy.config.PrefixMiddleware, though I don't really care what they use) to fix up the request to be correct. Pulling it from REQUEST_URI would be fine. Also, at worst, you can do:

  environ['SCRIPT_NAME_RAW'] = urllib.quote(environ.pop('SCRIPT_NAME'))

It sucks, but if that's all the information you have, then that's all the information you have. Or try to get the information from REQUEST_URI the hard way, once at the gateway level.

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
On Tue, Sep 22, 2009 at 12:38 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:

> If doing something like you suggest, would prefer them as 'wsgi.' prefixed variables and not put in all upper case namespace to be confused with CGI variables etc.

I just had to make up a name, but I agree with your suggestion for wsgi.X (we already have wsgi.url_scheme, after all).

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] Unicode in Python 3
I can't read all this thread carefully, too much stuff. I will note however that people are STILL ignoring surrogateescape (http://www.python.org/dev/peps/pep-0383/). This is like the third or fourth time I've brought it up. It was added to Python 3.1 for some of the exact issues we are encountering.

Particularly, imagine someone requests /foo%efbar (which is not valid UTF-8).

    >>> SCRIPT_NAME = b'/foo\xefbar'  # after url unquoting (urllib.request.unquote doesn't work for this currently)
    >>> s = SCRIPT_NAME.decode('utf8', 'surrogateescape')
    >>> s
    '/foo\udcefbar'
    >>> s.encode('utf8', 'surrogateescape')
    b'/foo\xefbar'

So we can have unicode values that can be safely and correctly transcoded to other encodings (or handled in their raw form). The constraints on surrogateescape are:

* You have to use 'surrogateescape' during decoding and encoding (I think for decoding it should be part of the spec)
* You have to know the encoding; doing s.encode('latin1', 'surrogateescape') wouldn't necessarily preserve the correct bytes (it does for this example, but wouldn't if there was a mix of valid UTF-8 and invalid bytes)

And there's a bit of an annoyance to the fact that SCRIPT_NAME/PATH_INFO should always be treated as UTF-8 (which might sometimes be wrong, but for any modern app/browser will be right), but maybe other parts (HTTP_COOKIE?) are in native encoding.

Well, besides HTTP_COOKIE, I don't know what else would be in a different encoding. Atompub adds Slug, but it's a URL/IRI, so it should be ASCII. I have seen proposals for a Title header (e.g., when PUTting an image and giving it a title), and that could be unicode. But in all those cases it'll be a modern app and modern clients, and in those cases people just use UTF-8.

Frankly I'm open to UTF-8-everywhere. People mentioned Jack and Rack, and to what degree that works, it probably works because everyone uses UTF-8.
With surrogateescape we allow transcoding when needed (e.g., if you wanted to handle redirects from old/weird non-UTF-8 URLs) but keep things reasonably simple otherwise.

Ian
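The two constraints listed in this message can be checked with a short Python 3 sketch. The transcode helper is a made-up name for illustration; only the built-in 'surrogateescape' error handler from PEP 383 is assumed.

```python
def transcode(text, decoded_as, actual_encoding):
    """Re-interpret surrogate-escaped text that was decoded with the wrong codec.

    The original bytes are recovered with the SAME codec and error handler
    that produced the text, then decoded again with the intended codec.
    """
    raw = text.encode(decoded_as, 'surrogateescape')
    return raw.decode(actual_encoding, 'surrogateescape')


# Bytes after URL-unquoting /foo%efbar; not valid UTF-8.
script_name = b'/foo\xefbar'.decode('utf-8', 'surrogateescape')
as_latin1 = transcode(script_name, 'utf-8', 'latin-1')

# A mix of valid UTF-8 (e-acute as two bytes) and one stray byte.
mixed_bytes = b'\xc3\xa9\xef'
mixed_text = mixed_bytes.decode('utf-8', 'surrogateescape')
right_codec = mixed_text.encode('utf-8', 'surrogateescape')
wrong_codec = mixed_text.encode('latin-1', 'surrogateescape')
```

Encoding with the wrong codec silently yields different bytes for the mixed case, which is why the server would need to tell the application which encoding it used.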
Re: [Web-SIG] String Types in WSGI [Graham's WSGI for py3]
On Fri, Sep 18, 2009 at 2:56 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:

> As others have pointed out, the likes of rack and jack, not sure about the new Perl variant, don't seem to have an issue with using unicode.

I looked up Jack and Rack: http://jackjs.org/jsgi-spec.html and http://rack.rubyforge.org/doc/files/SPEC.html

They don't have an issue with unicode because they don't mention it and don't specify anything at all. Basically they punt on the issue.

In the specific case, most things in Javascript have to be unicode. The response body iterator must have items that respond to toByteString, which includes String and Binary. I'm assuming Strings always use UTF8 in Javascript, as JSON acts that way. jsgi.input is only specified as an input stream, which is very unspecified. Especially since jsgi.errors is an output stream, though presumably one should be binary and the other text.

Ruby's unicode is kind of funny (as I understand it), in a way that might help them. Strings are stored as binary with an attached encoding. So there's no unicode, only binary strings with encodings; so you can change the encoding, or transcoding happens implicitly when you combine strings from different encodings. So basically there's no mention of unicode because they've dodged that whole bullet. But it also seems to be unspecified what encoding might be attached to strings, if any at all.

Another example, neither spec even indicates if SCRIPT_NAME/PATH_INFO are url-decoded (or that they aren't decoded).

So, in summary: I don't see anything we can learn from these specs, and there's no reason we should feel like we've somehow been leapfrogged, instead these other specifications are underspecified. I also think on Web-SIG we are approaching this with more robust and general applications in mind than for Jack and Rack -- for instance, I would like WSGI to be a reasonable basis for an HTTP proxy, where you can't enforce UTF8-everywhere.
If all we wanted for WSGI was to be a layer for serving monolithic applications then these issues wouldn't be so important.

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] Sketching a WSGI 2-to-1 adapter with greenlets
On Fri, Sep 18, 2009 at 5:07 PM, P.J. Eby p...@telecommunity.com wrote:

> On his blog, Graham mentioned some skepticism about skipping WSGI 1.1 and going straight to 2.0, due to concern that people using write() would need to make major code changes to go to WSGI 2.0.

I'm not entirely clear why this is such a big deal. Here's how I'd implement a WSGI 2 wrapper around a WSGI 1 app:

    import itertools

    def wsgi1to2(app):
        def new_app(environ):
            written = []         # chunks passed to the write() callable
            status_headers = []  # filled in by start_response
            def start_response(status, headers, exc_info=None):
                if exc_info is not None:
                    # Python 2 three-argument raise
                    raise exc_info[0], exc_info[1], exc_info[2]
                status_headers[:] = [status, headers]
                return written.append
            app_iter = app(environ, start_response)
            if not status_headers:
                # start_response not called yet; pull the first chunk
                app_iter = iter(app_iter)
                written.append(app_iter.next())
                assert status_headers
            if written:
                app_iter = itertools.chain(written, app_iter)
            return status_headers[0], status_headers[1], app_iter
        return new_app

What's wrong with this simpler approach to the conversion?

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
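For readers on Python 3, the same adapter idea might look roughly like the sketch below. It is a port made for illustration, not code from the thread: the three-argument raise becomes with_traceback() and .next() becomes next(). The demo app legacy_app is hypothetical.

```python
import itertools


def wsgi1to2(app):
    """Wrap a WSGI 1 app as a callable returning (status, headers, body_iter)."""
    def new_app(environ):
        written = []         # chunks passed to the legacy write() callable
        status_headers = []  # filled in by start_response

        def start_response(status, headers, exc_info=None):
            if exc_info is not None:
                raise exc_info[1].with_traceback(exc_info[2])
            status_headers[:] = [status, headers]
            return written.append

        app_iter = app(environ, start_response)
        if not status_headers:
            # Generator apps call start_response lazily; pull one chunk.
            app_iter = iter(app_iter)
            written.append(next(app_iter))
            assert status_headers
        if written:
            app_iter = itertools.chain(written, app_iter)
        return status_headers[0], status_headers[1], app_iter
    return new_app


def legacy_app(environ, start_response):
    """A WSGI 1 app that uses write() and also returns a body."""
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'head ')
    return [b'body']


status, headers, body = wsgi1to2(legacy_app)({})
```

As in the original sketch, anything passed to write() is buffered until the app returns, so this does not preserve incremental flushing.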
Re: [Web-SIG] Sketching a WSGI 2-to-1 adapter with greenlets
On Fri, Sep 18, 2009 at 7:09 PM, Armin Ronacher armin.ronac...@active-4.com wrote:

> Ian Bicking wrote:
>> What's wrong with this simpler approach to the conversion?
>
> It buffers, you can no longer do this:
>
>   request.write('processing data')
>   request.flush()
>   ...
>   request.write('data processed')
>   request.flush()
>
> But that's not too common and people should rather rewrite their applications to use generators for these cases.

Yes -- I don't think many (any?) people use this particular technique, though many people use the start_response writer simply because it was there and it seemed like a good idea. I even used it a few times because it was easier to code for some circumstances (e.g., paste.cgiapp) but not because I expected it would immediately be pushed to the client. (appengine's webapp framework uses it a lot, not entirely sure why; not for streaming though -- maybe because it pushes the bytes out of the Python interpreter and into the parent process faster.)

So, I'm just saying we need to handle the start_response writer, because people have used it, but I'm not aware of people using it for its intended purpose.

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
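The generator rewrite that Armin suggests for the write()/flush() pattern could look something like this minimal WSGI 1 sketch (Python 3). The app name and the message strings are illustrative only.

```python
def streaming_app(environ, start_response):
    """Stream progress messages by yielding chunks instead of calling write()."""
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])

    def body():
        yield b'processing data\n'
        # ... long-running work would happen here ...
        yield b'data processed\n'

    return body()
```

Each yielded chunk is handed to the server as it is produced, so a server that does not buffer can push it to the client before the next chunk exists.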
Re: [Web-SIG] WSGI and async Servers
On Thu, Sep 17, 2009 at 11:40 AM, Armin Ronacher armin.ronac...@active-4.com wrote:

>> Why would it be good to encourage async applications on top of WSGI?
>
> Because people would otherwise come up with their own implementations that are incompatible to each other. Maybe that should not go into WSGI but an AWSGI or whatever, but I'm pretty sure we should at least consider it and ask people that use asynchronous applications/servers what the issues with WSGI are.

I think AWSGI would be most appropriate. There's too much going on, and trying to keep WSGI sane while allowing async is just too hard. If we fork, then people can get something that really works well, they can try it out with real applications, and then maybe we can look at something we know works and see if AWSGI/WSGI differences can be resolved to bring it back into one spec.

And indeed it's quite possible at the library level that AWSGI could be supported by other libraries; I'm guessing for instance that WebOb would just require a few checks around the request body, and probably the response would work relatively fine (but for many patterns a normal response object would not be sufficient in an async context -- but that's fine too).

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] WSGI 1 Changes [ianb's and my changes]
On Thu, Sep 17, 2009 at 10:57 AM, Armin Ronacher armin.ronac...@active-4.com wrote:

> I just want to point out that these are in no way final and are further intended to only clarify some of the wrong wordings for Python 2, give us a real readline() function on the input stream and get rid of useless old cruft such as Python 2.2 support and Jython compatibility which no longer appears to be a problem.

To reiterate: people have complained that we've discussed non-controversial changes to WSGI, but the spec hasn't been updated. This was in large part, I think, because no one took the step going from discussion to actual proposed PEP changes. So these are some proposed changes. They are meant to be conservative, more like errata than a real revision, and to reflect current WSGI practice. If someone thinks one of the changes goes too far, then we can discuss -- I think we'll just be more constructive if we stick to concrete changes to the PEP so we can easily implement what we all agree on.

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] WSGI 2
On Tue, Aug 11, 2009 at 11:19 PM, Robert Brewer fuman...@aminus.org wrote:

>> 5. When running under Python 3, servers MUST provide CGI HTTP and server variables as strings. Where such values are sourced from a byte string, be that a Python byte string or C string, they should be converted as 'UTF-8'. If a specific web server infrastructure is able to support different encodings, then the WSGI adapter MAY provide a way for a user of the WSGI adapter to customise on a global basis, or on a per value basis what encoding is used, but this is entirely optional. Note that there is no requirement to deal with RFC 2047.
>
> We're passing unicode for almost everything. REQUEST_METHOD and wsgi.url_scheme are parsed from the Request-Line, and must be ascii-decodable. So are SERVER_PROTOCOL and our custom ACTUAL_SERVER_PROTOCOL entries.
>
> The original bytes of the Request-URI are stored in REQUEST_URI. However, PATH_INFO and QUERY_STRING are parsed from it, and decoded via a configurable charset, defaulting to UTF-8. If the path cannot be decoded with that charset, ISO-8859-1 is tried. Whichever is successful is stored at environ['REQUEST_URI_ENCODING'] so middleware and apps can transcode if needed. Our origin server always sets SCRIPT_NAME to '', but if we populated it, we would make it decoded by the same charset.

My understanding is that PATH_INFO *should* be UTF-8 regardless of what encoding a page might be in. At least that's what I got when testing Firefox. It might not be valid UTF-8 if it was manually constructed, but then there's little reason to think it is valid anything; only the bytes or REQUEST_URI are likely to be an accurate representation. (Frankly I wish PATH_INFO was not url-decoded, which would remove this issue entirely -- REQUEST_URI, or any url-encoded value, should really be ASCII, and I don't know of reasonable cases where this wouldn't be true.)
I suppose ISO-8859-1 is a reasonable fallback in this case, as it can be used to kind of reconstruct the original request path (the surrogateescape or whatever it is called would serve the same purpose, but is only available in Python 3).

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
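The decode-with-fallback behaviour Robert describes might be sketched like this in Python 3. The function name decode_path is an assumption for illustration; REQUEST_URI_ENCODING is the environ key named in his message, and this is not the actual server code.

```python
def decode_path(raw_bytes, charset='utf-8'):
    """Return (text, encoding_used), trying charset first, then ISO-8859-1.

    ISO-8859-1 maps every byte to a code point, so the fallback cannot fail
    and the original bytes can always be recovered by re-encoding.
    """
    try:
        return raw_bytes.decode(charset), charset
    except UnicodeDecodeError:
        return raw_bytes.decode('iso-8859-1'), 'iso-8859-1'


environ = {}
environ['PATH_INFO'], environ['REQUEST_URI_ENCODING'] = decode_path(b'/fran\xe7ois')
```

An application that knows better can then transcode with environ['PATH_INFO'].encode(environ['REQUEST_URI_ENCODING']) and decode the result however it likes.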
Re: [Web-SIG] PEP 333 and gzipping of responses
On Mon, Aug 10, 2009 at 9:11 PM, James Bennett ubernost...@gmail.com wrote:

> Earlier today I posted an article on my blog following up on some discussions of WSGI; one criticism presented was of language in PEP 333 regarding gzipping of responses by WSGI applications. Ian posted a comment which stated that the criticism was not correct, but I'm at a loss to figure out what *is* correct, so I'll bring up the question here.
>
> In a parenthetical at the end of the section entitled "Handling the Content-Length Header", PEP 333 states:
>
>   Note: applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as hop-by-hop operations, these encodings are the province of the actual web server/gateway. See Other HTTP Features below, for more details.
>
> In the section "Other HTTP Features", PEP 333 states, in part:
>
>   However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls hop-by-hop headers do not apply to WSGI internal communications. WSGI applications must not generate any hop-by-hop headers [4], attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming hop-by-hop headers in the environ dictionary.
>
> My criticism of this is that this is at best ambiguous, and quite possibly openly misleading to readers of the PEP. The ambiguity here is that gzip is a valid value for the Transfer-Encoding header in HTTP (RFC 2616, Sections 3.6 and 14.41), but is also a valid value for the Content-Encoding header (RFC 2616, Sections 3.5 and 14.11).

I just don't get the confusion. Transfer-Encoding is not allowed in WSGI (a hop-by-hop header, like several other Transfer-* headers). Content-Encoding is allowed, because everything not specifically mentioned is allowed. Clearly Content-Encoding and Transfer-Encoding are different strings.
And, as you mention, the normal thing that people currently do is use Content-Encoding anyway, so since people aren't using Transfer-Encoding, why is this controversial?

There are some weird implications to using Content-Encoding, specifically ETags and range requests, but eh... those exist in mod_deflate and just about everywhere, and are mostly outside the scope of WSGI.

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
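To illustrate the distinction, here is a small Python 3 sketch of gzipping at the application or middleware level with Content-Encoding, leaving Transfer-Encoding to the server. The helper name gzip_response is hypothetical, the whole body is buffered for simplicity, and a real deployment would first check the request's Accept-Encoding header.

```python
import gzip


def gzip_response(status, headers, body_chunks):
    """Compress a fully buffered response and label it with Content-Encoding."""
    body = gzip.compress(b''.join(body_chunks))
    # Drop headers that describe the uncompressed body, then add the new ones.
    kept = [(name, value) for name, value in headers
            if name.lower() not in ('content-length', 'content-encoding')]
    kept += [('Content-Encoding', 'gzip'),
             ('Content-Length', str(len(body))),
             ('Vary', 'Accept-Encoding')]
    return status, kept, [body]


status, headers, chunks = gzip_response(
    '200 OK',
    [('Content-Type', 'text/plain'), ('Content-Length', '2000')],
    [b'hello ' * 333, b'!!'])
```

Nothing here touches Transfer-Encoding; chunking remains the server's job. The ETag and range-request caveats mentioned above still apply, since the entity being served has changed.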
[Web-SIG] WSGI 2
So... what about WSGI 2? Let's not completely drop the ball on this.

I *think* we were largely in agreement; debate got distracted by some async stuff, but I don't think we particularly have to deal with that for WSGI 2. I think we do more than enough if we figure out:

* WSGI in Python 3, i.e., with unicode
* some basic errata kind of stuff, like readline signature
* change the callable signature to remove start_response

Would this be a new PEP or a revision? I think it should be a new PEP, as WSGI 1 remains valid and the same as it always was, and PEP 333 describes that.

Is there anyone willing to make the revisions?

-- 
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
Re: [Web-SIG] Python 3.0 and WSGI 1.0.
On Tue, May 5, 2009 at 10:14 PM, Graham Dumpleton graham.dumple...@gmail.com wrote:

> 2009/5/6 Ian Bicking i...@colorstudy.com:
>> Philip Jenvey brought this to my attention: http://www.python.org/dev/peps/pep-0383/
>>
>> It's a UTF8 encoding and decoding scheme that encodes illegal bytes in such a way that you can decode to get the original bytes object, and thus transcode to another encoding. It's intended for cases exactly like WSGI.
>
> Care to explain then how that would in practice be used while I try and reread it a few times to try and understand it myself? :-)

I don't particularly know, except I think you'd do things like:

  environ['PATH_INFO'] = urllib.unquote(http_byte_path).decode('utf8', 'python-escape')

Then if the encoding was wrong, you could transcode like:

  environ['PATH_INFO'] = environ['PATH_INFO'].encode('utf8', 'python-escape').decode('latin1', 'python-escape')

Note that you need to know the encoding that was used (utf8 in this case) and that python-escape was used. It has been suggested that the server should put the encoding it used into the environment. When transcoding this should also be updated.

It's not clear what python-escape is going to do, I don't think that's been determined. Probably it'll put \x00 or something in the unicode string to mark raw bytes.

-- 
Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] FW: Closing #63: RFC2047 encoded words
On Wed, Apr 8, 2009 at 1:14 PM, James Y Knight f...@fuhm.net wrote:

> If you want to start a discussion about having a standard parsed-header object in WSGI, that's another thing,

Off topic to this discussion, but that's what WebOb is. It also largely handles the encoding issues, abstracts away the awkwardness of the WSGI call signature, and also does header parsing.

-- 
Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] Reverse Proxy HTTPS
A last note: paste.deploy.config.PrefixMiddleware does some fixup for cases like this, including looking at X-Forwarded-Scheme and X-Forwarded-Proto for the protocol (both names, because there's nothing approaching consensus on what to name these headers).

2009/4/6 Randy Syring ra...@rcs-comp.com

> Graham,
>
> Excellent, thank you! That confirms for me the concept is correct, now all I have to do is work on an IIS implementation. FUN!
>
> --
> Randy Syring
> RCS Computers Web Solutions
> 502-644-4776
> http://www.rcs-comp.com
>
> Whether, then, you eat or drink or whatever you do, do all to the glory of God. 1 Cor 10:31
>
> Graham Dumpleton wrote:
>
>> Using nginx as front end to Apache/mod_wsgi as an example:
>>
>> On nginx side you would use:
>>
>>   proxy_set_header X-Url-Scheme $scheme;
>>
>> and on Apache/mod_wsgi side, with Django 1.0 as an example, in WSGI script file we would have:
>>
>>   import os, sys
>>   sys.path.append('/usr/local/django')
>>   os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
>>
>>   import django.core.handlers.wsgi
>>
>>   _application = django.core.handlers.wsgi.WSGIHandler()
>>
>>   def application(environ, start_response):
>>       environ['wsgi.url_scheme'] = environ.get('HTTP_X_URL_SCHEME', 'http')
>>       return _application(environ, start_response)
>>
>> The equivalent of this on the IIS side is, as others have mentioned, what you need.
>>
>> Graham
>>
>> 2009/4/7 Paweł Stradomski pstradom...@gmail.com:
>>
>>> In a message dated Monday, 6 April 2009, Randy Syring wrote:
>>>
>>>> I would like my application to have control over the HTTPS-HTTP redirects and would rather not force that logic into the forward facing web server if at all possible. That just seems like an extra configuration step that wouldn't necessarily be needed if I could figure out how to pass SSL status from the forward facing web server to the backend proxy (i.e. CherryPy and my app).
>>>>
>>>> So, do you (or anyone else) know of a good way to do this? Or, does everyone just assume that it is all or nothing for SSL when you are proxying to a backend?
>>> Check with IIS manual, it should be possible to set some nonstandard header when the connection goes through SSL, and then check this header in your application. Maybe that header is already there - write a simple controller that prints all the headers from the request and check how it looks with and without SSL (but verify with the IIS manual anyway).
>>>
>>> --
>>> Paweł Stradomski

-- 
Ian Bicking | http://blog.ianbicking.org
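A framework-neutral version of the same idea, as a minimal Python 3 sketch: a WSGI middleware that sets wsgi.url_scheme from a header the front-end proxy adds. The name scheme_from_proxy and its parameters are made up for illustration, and the sketch assumes the front end always sets or strips the header so clients cannot spoof it.

```python
def scheme_from_proxy(app, header='HTTP_X_FORWARDED_PROTO', default='http'):
    """Wrap a WSGI app so wsgi.url_scheme reflects what the proxy reports."""
    def wrapped(environ, start_response):
        scheme = environ.get(header, default).lower()
        # Ignore anything unexpected rather than passing it through.
        if scheme in ('http', 'https'):
            environ['wsgi.url_scheme'] = scheme
        return app(environ, start_response)
    return wrapped
```

To match the nginx configuration quoted above, one would pass header='HTTP_X_URL_SCHEME'. The application can then issue its own HTTPS redirects based on environ['wsgi.url_scheme'].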
Re: [Web-SIG] thoughts on an iterator
2009/3/28 Robert Brewer fuman...@aminus.org:

> H. Graham brought up chunked requests which I don't think have much bearing on this issue--the server/app can't rely on the client-specified chunk sizes either way (or you enable a Denial of Service attack). I don't see much difference between the file approach and the iterator approach, other than moving the read chunk size from the app (or more likely, the cgi module) to the server. That may be what kills this proposal: cgi.FieldStorage expects a file pointer and I doubt we want to either rewrite the entire cgi module to support iterators, or re-package the iterator up as a file.

There are some alternate implementations of the cgi POST-parsing functionality, some of which might be more amenable to using an iterator. Or for that matter, none of us have probably read the cgi module with this in mind. With a quick look, it'll be slightly tricky because it uses .readline a lot, but there's just not that much code involved so it can't be too hard.

For clarity, I think everyone has been discussing an *iterator*, not an iterable; an iterable would have a lot of unnecessary overhead, but I've seen both terms used.

I don't agree with Graham's objection, as I think the reason to read specific-sized chunks is that you don't want to read too much data into memory at one time. But the server is free to chunk the iterator to avoid too much data, and once the strings are in memory the consumer really isn't any better off reading a smaller chunk than what is available.

This also means I can stop making up entirely random chunk sizes in applications. Applications have no real information to inform this chunking. If the string is already in memory, the chunking actually is counterproductive.

-- 
Ian Bicking | http://blog.ianbicking.org
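Re-packaging an iterator as a file, the option Robert mentions, need not be much code. The following Python 3 sketch is a hypothetical IterFile class written for illustration; it covers only read() and readline() and is not taken from any library.

```python
class IterFile:
    """Minimal read()/readline() facade over an iterator of byte strings."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = b''

    def _fill(self):
        """Pull one more chunk into the buffer; return False when exhausted."""
        for chunk in self._chunks:
            self._buf += chunk
            return True
        return False

    def read(self, size=-1):
        # Keep pulling chunks until enough data is buffered or input ends.
        while (size < 0 or len(self._buf) < size) and self._fill():
            pass
        if size < 0:
            size = len(self._buf)
        data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def readline(self):
        while b'\n' not in self._buf and self._fill():
            pass
        # Cut after the newline, or take everything if none was found.
        end = self._buf.find(b'\n') + 1 or len(self._buf)
        data, self._buf = self._buf[:end], self._buf[end:]
        return data


body = IterFile([b'ab', b'c\nde', b'f\n', b'g'])
```

The server still decides how large each chunk is; the wrapper only re-slices what is already in memory, which is the point made above about chunk sizes.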
Re: [Web-SIG] how to test hunging socket ?
If you use the Paste HTTP server and Python 2.5 with ctypes installed, you can install the watchthreads app: http://svn.pythonpaste.org/Paste/trunk/paste/debug/watchthreads.py

That will let you see the hung threads, and get a traceback of their current position.

On Fri, Jan 30, 2009 at 1:52 PM, William Dode w...@flibuste.net wrote:

> Hi,
>
> I have a problem with a web app which freezes periodically. I monitored my app and the hang doesn't seem to occur in it. So I think the problem is before, or after, a problem of socket I imagine...
>
> It happens with wsgiref.simple_server and mod_wsgi. My app is not totally thread safe so I didn't try a lot of servers...
>
> When it freezes, I have to restart the app manually. With mod_wsgi it freezes the whole server.
>
> It doesn't happen very often so it's difficult for me to reproduce the problem.
>
> So my question is, how can I simulate a hung socket? Or how can I see where the app freezes exactly?
>
> In the python-paste server I read that Ian tried to handle some cases of hung sockets...
>
> thx, and sorry for my english...
>
> --
> William Dodé - http://flibuste.net
> Independent IT consultant

-- 
Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] how to test hunging socket ?
On Fri, Jan 30, 2009 at 3:48 PM, William Dode w...@flibuste.net wrote:

> Fine, I should definitely give it a try.
>
> If my app is not thread safe but responds in a decent time, can I benefit from a multithread server (for a socket problem) if I use a lock for every page like that: I use webob...

If your app isn't threadsafe, you should use a multiprocess server. mod_wsgi has options for this, and flup has forking options (you'd use flup behind Apache or another server).

-- 
Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] wsgiref.validate allows wsgi.input.read() with no argument
Graham Dumpleton wrote: An application that relies on the server to simulate end-of-file will be a broken application on some servers. This is not an uncommon problem. Therefore the validator tests for this case; if you want an application that actually works consistently, you shouldn't do environ['wsgi.input'].read(). The validator does not test for that case, that is what I am pointing out. The validator allows read() to be called with no argument. Ah, sorry, I wasn't paying attention... okay, then yes, I agree -- the validator should be more restrictive. -- Ian Bicking : i...@colorstudy.com : http://blog.ianbicking.org
Re: [Web-SIG] wsgiref.validate allows wsgi.input.read() with no argument
Graham Dumpleton wrote: Just noticed that although the WSGI PEP doesn't specifically mention that the argument to read() on wsgi.input is optional, wsgiref.validate allows calling read() with no argument. From wsgiref.validate:

    * That wsgi.input is used properly:
      - .read() is called with zero or one argument

    class InputWrapper:
        def read(self, *args):
            assert_(len(args) <= 1)
            v = self.input.read(*args)
            assert_(type(v) is type(""))
            return v

Of course, the issue is still that the WSGI PEP says: The server is not required to read past the client's specified Content-Length, and ***is allowed to simulate an end-of-file condition if the application attempts to read past that point***.

An application that relies on the server to simulate end-of-file will be a broken application on some servers. This is not an uncommon problem. Therefore the validator tests for this case; if you want an application that actually works consistently, you shouldn't do environ['wsgi.input'].read(). -- Ian Bicking : i...@colorstudy.com : http://blog.ianbicking.org
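The portability advice in this thread can be sketched as follows: read exactly CONTENT_LENGTH bytes instead of calling read() with no argument. This is a minimal illustration in Python 3; the helper name read_request_body is hypothetical and does not come from wsgiref or the PEP.

```python
import io


def read_request_body(environ):
    """Read the request body without relying on the server to simulate EOF.

    Hypothetical helper: it never calls read() bare, only read(length).
    """
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        # A malformed header is treated as "no body" in this sketch.
        length = 0
    if length <= 0:
        return b""
    return environ["wsgi.input"].read(length)


# The stream holds more bytes than the declared length, as a socket might.
environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"helloEXTRA")}
body = read_request_body(environ)
```

A server that does not simulate end-of-file would block or over-read on a bare read(); bounding the read by the declared length avoids depending on either behavior.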
Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification
Mark Ramm wrote: On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover [EMAIL PROTECTED] wrote: Ian Bicking wrote: To resolve this, let's just not pass it over this time? Totally agreed. What exactly needs to happen next? We need to propose a change to the WSGI specification. I propose, in Input and Error Streams (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we change it to have readline(hint) and expand Note 3 to include readline as well as readlines, removing Note 2. Also I suppose some sort of change note in the specification? Does this sound like a sufficient change to the spec, and are there any objections to the change? -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification
Manlio Perillo wrote: Ian Bicking wrote: [...] We need to propose a change to the WSGI specification. I propose, in Input and Error Streams (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we change it to have readline(hint) and expand Note 3 to include readline as well as readlines, removing Note 2. Also I suppose some sort of change note in the specification? Does this sound like a sufficient change to the spec, and are there any objections to the change? Fine for me, but of course we need to do this as: 1) Errata to WSGI 1.0 or 2) WSGI 1.1 or 3) WSGI 2.0 You can't just modify the current WSGI 1.0 spec. I'm for 2), with the other clarifications about WSGI we have discussed in the past. I'm for 1. What other clarifications were you thinking of? -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification
Graham Dumpleton wrote: 2008/11/16 Ian Bicking [EMAIL PROTECTED]: We need to make a revision to the WSGI spec to say that environ['wsgi.input'].readline takes an optional size argument. It always does in practice (except in wsgiref.validate.validator, rendering that validator useless), and is required to in practice, because everyone uses cgi.FieldStorage, and it passes in that argument. This has been brought up numerous times before. There are other things about wsgi.input that really need to be changed as well to make it more useful. When I have pushed for revised specification before I could never get enough interest in it from the people that most would perceive are the ones who oversee the PEP. Yes, this has been passed over before. To resolve this, let's just not pass it over this time? This is a relatively small change to the WSGI spec, because it represents standard practice -- this change is simply getting the spec in line with implementations. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
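To make the proposed readline(size) behavior concrete, here is a minimal sketch of an input wrapper whose readline accepts an optional size hint and never reads past the declared Content-Length. The class name LimitedInput is hypothetical; it is not part of wsgiref or any server mentioned in this thread.

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper: readline() takes an optional size
    and stops at the declared content length."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def readline(self, size=-1):
        if self._remaining <= 0:
            # Simulate end-of-file once the declared body is consumed.
            return b""
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        line = self._stream.readline(size)
        self._remaining -= len(line)
        return line


# Declared body is 8 bytes; anything after that belongs to another request.
wrapped_input = LimitedInput(io.BytesIO(b"abc\ndef\nTRAILING"), 8)
first = wrapped_input.readline(2)      # size hint shorter than the line
second = wrapped_input.readline()      # rest of the first line
third = wrapped_input.readline(100)    # hint larger than what remains
fourth = wrapped_input.readline()      # past the declared length
```

Callers that pass a size, as the thread says cgi.FieldStorage does, and callers that pass nothing both work against this interface.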
Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
Andrew Clover wrote: Ian Bicking wrote: As it is (in Python 2), you should do something like environ['PATH_INFO'].decode('utf8') and it should work. See the test cases in my original post: this doesn't work universally. On WinNT platforms PATH_INFO has already gone through a decode/encode cycle which almost always irretrievably mangles the value. This is something messed up with CGI on NT, and whatever server you are using, and perhaps the CGI adapter (maybe there's a way to get the raw environment without any encoding, for example?) -- it's mostly irrelevant to WSGI itself. My understanding of this suggestion is that latin-1 is a way of representing bytes as unicode. In other words, the values will be unicode, but that will simply be a lie. Yes, that would be a sensible approach, but it is not what is actually happening in any WSGI environment I have tested. For example wsgiref.simple_server decodes using UTF-8 not 8859-1 — or would do, if it were working. (It is currently broken in 3.0rc2; I put a hack in to get it running but I'm not really sure what the current status of simple_server in 3.0 is.) As far as I know, PJE just made the suggestion about Latin-1, I don't know if anything has actually been done in wsgiref or elsewhere to implement that. Honestly I don't know if anyone is doing anything with WSGI and Python 3. A lot of what you write about has to do with CGI, which is the only place WSGI interacts with os.environ. CGI is really an aspect of the CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the WSGI spec itself. Indeed, but we naturally have to take into account implementability on CGI. If a WSGI spec *requires* PATH_INFO to have been obtained using 8859-1 decoding — or UTF-8, which is the other sensible option given that most URIs today are UTF-8 — then there cannot be a fully-compliant CGI-to-WSGI wrapper. Perhaps it's not the big issue it was when WSGI was first getting off the ground, but IMO it's still important. 
This will presumably require hacks that might be system-dependent. Probably the current CGI adapter will just have to be a bit more complicated. Also, if Python is utf8-decoding the environment, we'll just have to shortcut that entirely, as you can't just undo utf8. I assume there is some way to get at the bytes in the environment, if not then that is a Python 3 bug. Personally I'm more inclined to set up a policy on the WSGI server itself with respect to the encoding, and then use real unicode characters. I think we are stuck with Unicode environ at this point, given the CGI issue. But applications do need to know about the encoding in use, because they will (typically) be generating their own links. So an optional way to get that information to the application would be advantageous. The encoding of the operating system (which presumably informs the encoding of os.environ) has nothing to do with the encoding of the web application. For the CGI adapter we simply need to find a way to ignore the system encoding. I'm now of the opinion that the best way to do this is to standardise Apache's ‘REQUEST_URI’ as an optional environ item. This header is pre-URI-decoding, containing only %-sequences and not real high bytes, so it can be decoded to Unicode using any old charset without worry. Unfortunately REQUEST_URI doesn't map directly to SCRIPT_NAME/PATH_INFO. I think it might be feasible to support an encoded version of SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names, and I don't know of any particular standard to base those names on), moving from the two keys to a single REQUEST_URI is not feasible. It's not that trivial to figure out where in REQUEST_URI the SCRIPT_NAME/PATH_INFO boundary really is, as there's many ways the unencoded values could be encoded. I guess you'd probably count segments, try to catch %2f (where the segments won't match up), and then double check that the decoded REQUEST_URI matches SCRIPT_NAME+PATH_INFO. 
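The boundary check described above (count segments, catch %2F where the segments do not line up, then double check against SCRIPT_NAME+PATH_INFO) might look roughly like this. It is only a sketch: the function name split_request_uri is hypothetical, and it assumes SCRIPT_NAME and PATH_INFO are already correctly decoded Unicode strings and that the URI is UTF-8 percent-encoded.

```python
from urllib.parse import unquote


def split_request_uri(request_uri, script_name, path_info):
    """Return (encoded_script_name, encoded_path_info), or None when the
    boundary cannot be located reliably. Hypothetical helper."""
    path = request_uri.split("?", 1)[0]
    # Double check: the decoded URI path must equal SCRIPT_NAME + PATH_INFO.
    if unquote(path) != script_name + path_info:
        return None
    # Count segments: SCRIPT_NAME with n slashes covers the first n segments.
    segment_count = script_name.count("/")
    parts = path.split("/")
    encoded_script = "/".join(parts[:segment_count + 1])
    # An encoded slash (%2F) inside the script part shifts the segments,
    # so the decoded prefix no longer matches SCRIPT_NAME.
    if unquote(encoded_script) != script_name:
        return None
    return encoded_script, path[len(encoded_script):]


ok = split_request_uri("/app/caf%C3%A9/x%2Fy?q=1", "/app", "/caf\u00e9/x/y")
shifted = split_request_uri("/a%2Fb/c", "/a/b", "/c")
```

The second call shows the %2F case: the segments do not match up, so the sketch gives up rather than guess.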
An application wanting to support Unicode URIs (or encoded slashes in URIs*) could then sniff for REQUEST_URI and use it in preference to PATH_INFO where available. This is a bit more work for the application, but it should generally be handled transparently by a library/framework and supporting PATH_INFO in a portable fashion already has warts thanks to IIS's bugs, so the situation is not much worse than it already is. I use the distinction between SCRIPT_NAME and PATH_INFO extensively. And frankly IIS is probably less relevant to most developers than CGI. Anyway, any of these bugs are things that need to be fixed in the WSGI adapter, we must not let them propagate into the specification or applications. So if IIS has problems with PATH_INFO, the WSGI adapter (be it CGI or otherwise) should be configured to fix those problems up front. And of course we get support through mod_cgi and mod_wsgi automatically, so
Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
Andrew Clover wrote: If we could reliably read the bytes the browser sends to us in the GET request that would be great, we could just decode those and be done with it. Unfortunately, that's not reliable, because: 1. thanks to an old wart in the CGI specification, %XX hex escapes are decoded before the character is put into the PATH_INFO environment variable; I don't see a problem with this? At least not a problem with respect to encoding. As it is (in Python 2), you should do something like environ['PATH_INFO'].decode('utf8') and it should work. It doesn't seem like there's any distinction between %-encoded characters and plain characters in this situation. 2. the environment variables may be stored as Unicode. (1) on its own gives us the problem of not being able to distinguish a path-separator slash from an encoded %2F; a long-known problem but not one that greatly affects most people. But combined with (2) that means some other component must choose how to decode the bytes into Unicode characters. No standard currently specifies what encoding to use, it is not typically configuarable, and it's certainly not within reach of the WSGI application. My assumption is that most applications will want to end up with UTF-8-encoded URLs; other choices are certainly possible but as we move towards IRI they become less likely. This situation previously affected only Windows users, because NT environment variables are native Unicode. However, Python 3.0 specifies all environment variable access is through a Unicode wrapper, and gives no way to control how that automatic decoding is done, leaving everyone in the same boat. WSGI Amendments_1.0 includes a suggestion for Python 3.0 that environ should be decoded from the headers using HTTP standard encodings (i.e. latin-1 + RFC 2047), but unfortunately this doesn't quite work: My understanding of this suggestion is that latin-1 is a way of representing bytes as unicode. 
In other words, the values will be unicode, but that will simply be a lie. So if you know you have UTF8 paths, you'd do: path_info = environ['PATH_INFO'].encode('latin-1').decode('utf8') As far as I can tell this is simply to avoid having bytes in the environment, even though bytes are an accurate representation and unicode is not. A lot of what you write about has to do with CGI, which is the only place WSGI interacts with os.environ. CGI is really an aspect of the CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the WSGI spec itself. Personally I'm more inclined to set up a policy on the WSGI server itself with respect to the encoding, and then use real unicode characters. Unfortunately that's not as flexible as bytes, as it doesn't make it very easy to sniff out the encoding in application-specific ways, or support different encodings in different parts of the server (which would be useful if, for instance, you were to proxy applications with unknown encodings). So... maybe that's not the most feasible option. But if it's not, then I'd rather stick with bytes. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
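The latin-1 round trip mentioned in this message can be sketched in Python 3 as follows. The helper name recover_path_info is hypothetical, and the sketch assumes the server filled environ by decoding the raw bytes as latin-1, as the amendment suggests.

```python
def recover_path_info(environ, charset="utf-8"):
    """Undo the latin-1 'lie' and decode PATH_INFO with the real charset.

    Assumption: environ values are bytes decoded as latin-1, so encoding
    back to latin-1 recovers the original bytes exactly.
    """
    raw = environ.get("PATH_INFO", "").encode("latin-1")
    return raw.decode(charset)


# Simulate a server that received UTF-8 bytes and decoded them as latin-1.
wire_bytes = "/caf\u00e9".encode("utf-8")
environ = {"PATH_INFO": wire_bytes.decode("latin-1")}
path_info = recover_path_info(environ)
```

Because every byte value maps to exactly one latin-1 code point, the round trip is lossless; the application still has to know or guess the real charset.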
Re: [Web-SIG] parsing of urlencoded data and Unicode
Manlio Perillo wrote: Hi. In my WSGI framework: http://hg.mperillo.ath.cx/wsgix I have, in the `http` module, the functions `parse_query_string` and `parse_simple_post_data`. The first parses the query string and returns a dictionary of strings; the latter parses the application/x-www-form-urlencoded client body and returns a dictionary of strings and the charset used by the client for the unicode encoding. Now I'm wondering whether these two functions should return Unicode strings instead of plain strings. I think that Unicode strings should be returned, but I would like to know what other web frameworks do. Django seems to convert to Unicode, but the Python standard library does not (and I would like to know if changes are planned for Python 3.x). WebOb decodes the request data to str, then lazily decodes to unicode based on the request encoding. The request encoding is a bit fuzzy to calculate, which is part of why the decoding is lazy, so that the request encoding can be set or changed at any time. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: [Web-SIG] Using decorators to add objects in a thread-local store..
Etienne Robillard wrote: Hi all, I'd like to have your input and comments on using decorator functions for adding extra options to the request.environ object. For instance, here's a decorator which adds a scoped session object into request.environ:

    def with_session(engine=None):
        """Decorator function for attaching a `Session` instance
        as a keyword argument in `request.environ`."""
        def decorator(view_func):
            def _wrapper(request, *args, **kwargs):
                scoped_session.set_session(engine)
                request.environ['_scoped_session'] = getattr(scoped_session, 'sessio

You should always use a namespace, e.g., request.environ['something._scoped_session'] = ... In the context of a Pylons controller you could do it this way. Of course with just WSGI it would be better to wrap it via WSGI, which is almost equivalent to a decorator:

    def with_session(engine=None):
        def decorator(app):
            def engine_wsgi_app(environ, start_response):
                environ['...'] = ...
                return app(environ, start_response)
            return engine_wsgi_app
        return decorator

Pylons controllers aren't *quite* WSGI applications, but instances of those controller classes are. So wrapping an individual controller with middleware requires a bit more work.

                return view_func(request, *args, **kwargs)
            return wraps(view_func)(_wrapper)
        return decorator

Then it can be used as follows:

    @with_session(engine=engine)
    def view_blog_list(request, *args, **kwargs):
        # get the local session object for this
        # request (thread-local)
        sess = request.environ['_scoped_session']
        # do stuff with the Session object here...
        ...

Is this a good approach, or can this be adapted to work in multithreaded environments?

Since you are passing around arguments to functions it should be fine in threaded environments. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
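A runnable version of the WSGI-level wrapper suggested in the reply, with a namespaced environ key, might look like this. The key 'myapp.scoped_session' and the session_factory argument are placeholders chosen for the sketch; the original code elides both.

```python
def with_session(session_factory, key="myapp.scoped_session"):
    """Return a decorator that wraps a WSGI app and stores a per-request
    session object in environ under a namespaced key (placeholder name)."""
    def decorator(app):
        def session_wsgi_app(environ, start_response):
            # A fresh object per request, passed along explicitly in environ.
            environ[key] = session_factory()
            return app(environ, start_response)
        return session_wsgi_app
    return decorator


def demo_app(environ, start_response):
    session = environ["myapp.scoped_session"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("session=%s" % session).encode("utf-8")]


# A string stands in for a real session object in this illustration.
wrapped = with_session(lambda: "fake-session")(demo_app)
```

Since the session travels in environ as an ordinary argument, nothing here depends on which thread handles the request.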
Re: [Web-SIG] Alternative to threading.local, based on the stack
Phillip J. Eby wrote: Obviously plenty of people have a desire to have a place to store request-local data without passing the environment everywhere. Using threading.local is a good way to do that, unless the server is not using one thread per request. Giving people an interface to write to that doesn't specifically mention threads and is customizable by the wsgi server is what I am suggesting. Er, and how do you propose people *access* that interface rather than a specific implementation of it? Wouldn't we need to pass it in the environ, thereby rendering the whole thing even more obviously moot? :) I can't decide what the question is here. You mean, how can a greenlet request-local provider indicate that they are providing a way of getting the current request? Or, how can a consumer get access, given that it can live in any module, and the consumer presumably doesn't have an environ? I imagine from what Donovan says that there would actually be one module, requestlocal, and one implementation, and that implementation would be awesome and support greenlets and threads, and whatever else comes along (which luckily is not much else), and I guess maybe has a middleware that would register the request on entry and deregister it on exit, and consumers would do: import requestlocal def whatever(): environ = requestlocal.get_request() and we'd just all agree on this singular implementation, because I don't see any way around that. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
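As a rough illustration of the single-module idea, here is a thread-only sketch of a hypothetical requestlocal-style module: middleware registers the environ on entry and deregisters it on exit, and consumers call get_request(). The module does not exist in this form; a greenlet-aware version would need a different storage backend, and a response body produced lazily by a generator would run after deregistration.

```python
import threading

_local = threading.local()


def get_request():
    """Return the environ of the request currently being handled."""
    stack = getattr(_local, "stack", None)
    if not stack:
        raise RuntimeError("no request is active in this thread")
    return stack[-1]


def register_request(app):
    """Middleware: push environ on entry, pop it on exit. A stack is used
    because WSGI calls can nest."""
    def middleware(environ, start_response):
        stack = getattr(_local, "stack", None)
        if stack is None:
            stack = _local.stack = []
        stack.append(environ)
        try:
            return app(environ, start_response)
        finally:
            stack.pop()
    return middleware


def whatever():
    # A consumer with no environ argument, as in the message above.
    return get_request()["PATH_INFO"]


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [whatever().encode("utf-8")]


wrapped = register_request(demo_app)
```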
Re: [Web-SIG] Alternative to threading.local, based on the stack
Manlio Perillo wrote: Ian Bicking wrote: Manlio Perillo wrote: I'm adding web-sig in Cc. [...] I'm developing a WSGI framework with all these (and other) ideas: http://hg.mperillo.ath.cx/wsgix It's still not documented, so I have not yet made an official announcement. The main design goal is to keep the interface as low level as possible. I don't like additional interfaces (like Request and Response objects) around the WSGI dictionary, and I don't like frameworks like Django that completely hide the WSGI interface. Have you tried webob? My first run as Paste avoided wrappers around those objects, but an object interface has been very helpful. I have not tried it, but I have read the code (as I have read the code of Paste). In principle I'm against using additional interfaces, and one of the reasons I wrote wsgix is to have a proof of concept, to understand whether it is feasible to write a WSGI application using an alternative framework. wsgix (+ mod_wsgi for Nginx) has the same role as Paste, but I have decided to use a rather different approach. As an example, in Paste you have chosen to use a config dictionary for middleware configuration, that is, you have middleware factories. I think this is a red herring. WebOb specifically doesn't do anything related to configuration or the setup of the stack. What it does do is stuff like:

    expires = http.format_time(0)
    http.generate_cookie(
        environ, headers, name, '', expires=expires,
        domain=cookie_domain(environ), path=path, max_age=0)

which would be resp.delete_cookie(name) (well, cookie_domain seems to be derived from a setting, but that's mostly unrelated). This isn't a particularly substantial difference, but these small conveniences add up. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: [Web-SIG] Alternative to threading.local, based on the stack
Manlio Perillo wrote: Ian Bicking wrote: Manlio Perillo wrote: [...] As an example, in Paste you have chosen to use a config dictionary for middleware configuration, that is, you have middleware factories. I think this is a red herring. WebOb specifically doesn't do anything related to configuration or the setup of the stack. What it does do is stuff like:

    expires = http.format_time(0)
    http.generate_cookie(
        environ, headers, name, '', expires=expires,
        domain=cookie_domain(environ), path=path, max_age=0)

which would be resp.delete_cookie(name) (well, cookie_domain seems to be derived from a setting, but that's mostly unrelated). This isn't a particularly substantial difference, but these small conveniences add up. As I have said, this is a matter of personal taste: I don't like the architecture used by WebOb and prefer to use the environ dictionary directly, without introducing other abstractions. This is possible; I'm writing a non-trivial application using wsgix. I'm still evaluating whether I can reuse WebOb's parsing functions (and this would be a great thing: I think that we *really* need a package with *only* low *level* parsing functions for the HTTP protocol). From what I can see, WebOb does *not* offer a low level interface for the parsers: you *have* to use the Request object. I really like multilevel architectures, instead. This was the deliberate approach of Paste, and it does have several functions for doing things similar to how you describe. As I said, I went down exactly this path, but I think WebOb solves the problem better. You can think of WebOb as a way of currying functions. All the request functions take an environ argument, curried through instantiation of webob.Request. All response functions take status/headers/app_iter, curried through webob.Response. State is never held outside the environment or the status/headers/app_iter of the response. 
So think of webob.Request as the module of request-parsing routines, and webob.Response as the module of response-parsing routines. (There are underlying functions for things like parsing dates, but they are only exposed through those classes.) -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
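The currying analogy can be illustrated without WebOb itself. In this sketch, plain functions take environ, and a small class binds environ once so callers do not repeat it. The names query_params, content_type, and Request are stand-ins written for this illustration, not WebOb's actual implementation.

```python
from urllib.parse import parse_qs


# Low-level style: every function takes environ explicitly.
def query_params(environ):
    return parse_qs(environ.get("QUERY_STRING", ""))


def content_type(environ):
    return environ.get("CONTENT_TYPE", "").split(";", 1)[0].strip()


class Request:
    """Stand-in wrapper: instantiation 'curries' environ into each function.
    All state stays in environ; the object itself holds nothing else."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def params(self):
        return query_params(self.environ)

    @property
    def content_type(self):
        return content_type(self.environ)


environ = {"QUERY_STRING": "a=1&b=2", "CONTENT_TYPE": "text/html; charset=utf-8"}
req = Request(environ)
```

Because the wrapper keeps no state of its own, two Request objects built from the same environ always agree, and changing environ is visible through both.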
Re: [Web-SIG] Alternative to threading.local, based on the stack
Donovan Preston wrote: To throw another wrench in things, with the Paste/WebError evalexception interactive exception handler, it restores this thread-local context so you can later execute expressions in the same context. It seems to me that what is really needed here is an extension of wsgi that specifies how to get, set, and list request local storage, and for people to use that instead of the threadlocal module. Of course, for threaded servers, they will just use the threadlocal module, but for Spawning running in single-threaded cooperative mode it would use a greenlet-local implementation, and for a hypothetical Twisted server running a hypothetical asynchronous wsgi application it would just use a random request id. Well, it's really call-local, i.e., dynamic scoping. Another option would be something like attaching this dynamic scoping to the frame objects themselves, in a way that evalexception could be aware (restoring them when trying to execute code in the context of some frame) and potentially greenlets could do the same thing. It could be done in a WSGI-specific way, and that might be useful, but the general issue is applicable to more than WSGI. Generally the problems we are talking about only occur when some kind of (semi-)transparent concurrency other than threads are used. This includes greenlets, restoring a frame like in evalexception, and potentially generators with the app_iter. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] help with the implementation of a WSGI middleware
Phillip J. Eby wrote: At 09:58 PM 7/7/2008 +0200, Manlio Perillo wrote: In this case the first solution is to use this middleware as a decorator, instead of a full middleware. This is the correct way to implement non-transparent middleware; i.e., so-called middleware which is in fact an application API. See: http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html for more about this. Basically, if a piece of middleware has to be there for the application to run, it's not really middleware; it's a misnamed decorator. In the original WSGI spec, I overestimated the usefulness of adding extension APIs to the environ... or more likely, I went along with some of Ian's overenthusiasm for the idea. ;-) Extension APIs in the environ just mean you have to write your code to handle the case where the API isn't there -- in which case you might as well have used a library. Eh, personally I remain unconvinced. Or, at least, while the possibility of abuse exists, the extensibility still has many valid uses, and we're better off with it than with a more object-based system (e.g., CherryPy hooks, Django middleware, Zope's Acquisition, and arguably even Zope 3's giant-ball-of-context). Also, using a *just* library supposes robust and transparent request-local storage in a manner that works comfortably with the WSGI call stack, which like any call stack can be recursive and complex. Lacking such storage, stuffing objects in the environment is better than the alternatives. Extension APIs really only make sense if they are true *server* features, not application features; otherwise, you are better off using a library rather than middleware per se. What server features? Servers are dull. Often middleware is used to implement policy separate from the application. Libraries require another kind of abstraction, and implementing policy in libraries is, IMHO, messier than the middleware alternative for many important use cases. 
Also there exists no neutral ground for libraries in Python. Maybe egg entry points, but they aren't all that neutral, and aren't all that applicable either. zope.interface would like to be neutral ground, but of course is not. So multiple implementations can at least possibly congeal around a WSGI request. Also of course server is a vague term. Request in, response out, that's the minimal abstraction for HTTP, and there is no server in there. If we're talking about things that call WSGI applications, well I have a ton of those that never use sockets and you'd be hard pressed to classify them as servers. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] help with the implementation of a WSGI middleware
Phillip J. Eby wrote: I don't object to stuffing things in the environment; I object to: 1. Putting APIs in there (the API should be regular functions or objects, thanks) 2. Wrapping middleware around an app to put in APIs that it's going to have to know about anyway. Well, sometimes this occurs because you want the middleware at a different level. E.g., something like the transaction handler in repoze.tm (http://svn.repoze.org/repoze.tm/trunk/) -- you expect it to be there, and for it to put an object with a certain API in the environment, and it implements an outer transaction boundary. It's something you can put in fairly speculatively, so that some consumer can make use of it. It's also a case where objects seemingly well outside the scope of the controller/web need access to some transaction manager, and that manager's most obvious scope is for the request, and so some common means to get the current transaction manager would be nice. Anyway, arguably a good example of both an API in the environment, and an API that would be nice if you could easily access without being bound to any particular framework's convention for how to get the current request. Often middleware is used to implement policy separate from the application. And that kind of middleware is therefore (one hopes) transparent to the application. Often *some* implementation must be present. E.g., if you check REMOTE_USER you implicitly expect *something* to set REMOTE_USER. Libraries require another kind of abstraction, and implementing policy in libraries is, IMHO, messier than the middleware alternative for many important use cases. Also there exists no neutral ground for libraries in Python. Maybe egg entry points, but they aren't all that neutral, and aren't all that applicable either. zope.interface would like to be neutral ground, but of course is not. So multiple implementations can at least possibly congeal around a WSGI request. Standards for data in the environ may be a good idea. 
But APIs in the environ are generally *not* a good idea. Yes, generally I agree. Also of course server is a vague term. Request in, response out, that's the minimal abstraction for HTTP, and there is no server in there. If we're talking about things that call WSGI applications, Nope, I mean actual servers. Well, as I was implying, anything that calls an app is in some sense a server. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Alternative to threading.local, based on the stack
Iwan Vosloo wrote: Many web frameworks and ORM tools have the need to propagate data depending on some or other context within which a request is dealt with. Passing it all via parameters to every nook of your code is cumbersome. A lot of the frameworks use a thread local context to solve this problem. I'm assuming these are based on threading.local. (See, for example: http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual ) Such usage assumes that one request is served per thread. This is not necessarily the case. (Twisted would perhaps be an example, but I have not checked how the Twisted people deal with the issue.) The Spawning server (http://ulaluma.com/pyx/archives/2008/06/spawning_01_rel.html) would indeed get things mixed up this way, as it uses greenlets to make (at least some) blocking calls async. So it would encounter this problem full-force. To throw another wrench in things, with the Paste/WebError evalexception interactive exception handler, it restores this thread-local context so you can later execute expressions in the same context. -- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: [Web-SIG] Time for a JSON parser in the standard library?
John Millikin wrote: On Wed, Apr 9, 2008 at 10:15 PM, Bob Ippolito [EMAIL PROTECTED] wrote: That sounds like a really bad idea, if there is an option to change the behavior it shouldn't live in module state.

Would you rather have strictness controls as parameters? demjson currently has seventeen of those. Maybe we could have loads(bytes) and loads_broken(bytes, allow_trailing_comma, allow_all_whitespace, allow_comments, ...) functions, one for parsing JSON, the other for parsing garbage. There's no real way to hide or remove the complexity in parsing invalid data, so both warnings and parameters will cause the implementation to be much larger, but at least having to call warnings.filter (ignore, JSONWarning) might serve to make some users think twice.

What reason is there for all the different flags? Why not just strict and loose?

-- Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: [Web-SIG] Time for a JSON parser in the standard library?
parse (bytes_or_string)
generate (obj, indent = None, ascii_only = True, encoding = 'utf-8')

I strongly prefer we stick to the conventional names of dump/dumps/load/loads, for consistency with other serialization libraries already in Python.

Ian
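To illustrate the naming convention referred to here: pickle exposes dump/dumps/load/loads, and the json module that later shipped in the standard library uses the same four names. The sketch below is only a demonstration of that symmetry; the sample `data` value is made up.

```python
import io
import json
import pickle

data = {"name": "web-sig", "ids": [1, 2, 3]}

# dumps/loads work on in-memory strings (json) or bytes (pickle).
json_text = json.dumps(data)
pickle_bytes = pickle.dumps(data)
from_json = json.loads(json_text)
from_pickle = pickle.loads(pickle_bytes)

# dump/load work on file-like objects.
buf = io.StringIO()
json.dump(data, buf)
buf.seek(0)
round_tripped = json.load(buf)
```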
Re: [Web-SIG] [proposal] merging jsonrpc into xmlrpc
Alan Kennedy wrote: [1] But it's a shame they didn't write it on WSGI: then their services could have run on the Google compute cloud ;-)

Indeed. After seeing a BaseHTTPServer JSON-RPC server go up on the Python Cookbook I wrote a WSGI server and made it into a tutorial: http://pythonpaste.org/webob/jsonrpc-example.html (but it's not a maintained library -- at least I won't be maintaining it).

[2] Perhaps some pythonista from Web-SIG is most appropriate to advise how JSON-RPC should move forward? After all, we're more accustomed to server-side stuff than those javascript folks ;-)

Let it die? It is more complicated than necessary, when instead you could just make each function a URL of its own, and POST the arguments and get back the response, with 500 Server Error for errors. It's hard to spec that up because it's too simple. OHM (http://pythonpaste.org/ohm/) follows this model of exposing a service.

Ian
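The "each function is its own URL" idea can be sketched as a small WSGI application. This is only an illustration, not OHM's actual API: the `FUNCTIONS` registry, the `service_app` and `call` names, and the choice of a JSON object as the POST body are all assumptions made for the example.

```python
import io
import json

# Hypothetical registry of exposed functions, keyed by URL path.
FUNCTIONS = {
    "/add": lambda a, b: a + b,
    "/divide": lambda a, b: a / b,
}


def service_app(environ, start_response):
    func = FUNCTIONS.get(environ.get("PATH_INFO", ""))
    if func is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no such function"]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "POST")])
        return [b"POST required"]
    try:
        # Assumption: arguments arrive as a JSON object in the POST body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        kwargs = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body = json.dumps(func(**kwargs)).encode("utf-8")
    except Exception as exc:
        # Any failure is reported as a plain 500, as the message suggests.
        start_response("500 Server Error", [("Content-Type", "text/plain")])
        return [str(exc).encode("utf-8")]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call(path, args):
    """Invoke service_app in-process and return (status, body)."""
    payload = json.dumps(args).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
    }
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(service_app(environ, start_response))
    return captured["status"], body
```

The `call` helper exists only to exercise the application without a network socket; a real client would POST to the URL.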
[Web-SIG] Proposed specification: developer authentication
I'm having some technical problems with wsgi.org, but once those are figured out this will be posted to http://wsgi.org/wsgi/Specifications/developer_auth

I'll let the spec speak for itself, I guess. I already have a half dozen tools that could make use of this spec, and I know several other tools that could also use it (TG toolbox, repoze.profile). Comments encouraged. Or just agreement. Silence indicates lack of interest.

:Title: Developer Auth
:Author: Ian Bicking [EMAIL PROTECTED]
:Discussions-To: Python Web-SIG web-sig@python.org
:Status: Proposed
:Created: 31-Mar-2008

.. contents::

Abstract
--------

Many tools can be written for a WSGI stack which should only be accessible to developers. For example, an interactive debugger in response to sessions. Or a template system might display the underlying filenames that created a page. Or profiling data. In some cases there are security implications to exposing this data, in other cases it is harmless but undesirable to show this information to normal users. This specification offers a single, simple way to detect if a user should be presented with this information.

Rationale
---------

So far these tools have been controlled by configuration, e.g., ``debug = True``. This works but can be dangerous, as a deployer or developer can forget to turn off tools. Or, if it is controlled through Python code, it can be difficult to enable on a site that wasn't intended to have the tool on, e.g., if you want to debug a live site because you can't reproduce a problem in development.

Also, configuration doesn't allow some people to see these development tools while hiding them from other people. A per-request and secure authentication method is more desirable.

This could be implemented using application-specific authentication methods and permission levels. This is undesirable because often debugging is orthogonal to users -- you may want to debug a problem only present when a low-permission or anonymous user is visiting the site. Also it is difficult to keep application and debugging permissions coherent, which is probably why this technique is not used by any tools.

Specification
-------------

Debugging tools should look for a key ``environ['x-wsgiorg.developer_user']``. This will contain some kind of user name. If it is empty or not present, then debugging tools should not activate themselves, or should not expose any information in the browser.

The user name can be used in logging, but all users are considered to have the same permission level (total access). The username must be a ``str``, but its contents are not constrained (an IP address, for example, would be acceptable, or a name and email, with an embedded space).

If a URL is protected except for developers, applications should simply return ``403 Forbidden``. Seamless login is not part of this specification or its goals. Some systems may be IP-controlled, for example, and no login is possible.

Example
-------

This is a simple exception catcher that uses the key::

    import sys, traceback

    class CatchExceptions(object):

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            # Not a developer: pass through without catching anything
            if not environ.get('x-wsgiorg.developer_user'):
                return self.app(environ, start_response)
            try:
                return self.app(environ, start_response)
            except:
                # Developer: show the traceback in the response
                start_response('500 Server Error',
                               [('content-type', 'text/plain')],
                               sys.exc_info())
                return [traceback.format_exc()]

Here is an IP-restricted middleware that sets the key::

    class IPDeveloper(object):

        def __init__(self, app, ips=('127.0.0.1',)):
            self.app = app
            self.ips = ips

        def __call__(self, environ, start_response):
            # Requests from a listed address count as developer requests
            if environ.get('REMOTE_ADDR') in self.ips:
                environ['x-wsgiorg.developer_user'] = environ['REMOTE_ADDR']
            return self.app(environ, start_response)

Problems
--------

* With security by obscurity in mind, it might be best if login methods weren't clear. With ease of use in mind, easy logins are best.

* There are no levels of access. Everyone is assumed to have complete access. (You could add another custom key if you want to share extra information between the authentication and application layer.)

* This encourages people to do production deployments with debugging tools enabled.

Other Possibilities
-------------------

* Configuration

* Conditional middleware composition

* Application login systems

* Some other generalized authentication system (AuthKit, etc).

Open Issues
-----------

* Should ``401 Authorization Required`` be returned? Potentially with ``WWW-Authenticate: x-wsgiorg.developer_user``. This would signal to the middleware that a login should occur, which it may or may not ignore (it could
Re: [Web-SIG] Clarifications on Python 3.0 and WSGI.
Phillip J. Eby wrote: At 11:04 AM 3/25/2008 -0500, Ian Bicking wrote: Phillip J. Eby wrote: It says that in versions of Python where 'str is unicode' (i.e. Jython, IronPython, and Python 3000), then the specification should be read to define string as a unicode string whose characters can be expressed in latin-1. Really, adding support for bytes is the stretch here. In fact, I'd almost go so far as to say the heck with bytes support except for the response body. I could easily consider headers to be text, instead.

Latin-1? How is this supposed to work at all?

Latin-1 is the encoding that can allow a unicode string to losslessly encode arbitrary bytes. And that's how these things are handled (or should be handled, per the spec) in Jython and IronPython today. In any case I only said I'd *almost* go so far as to say headers are text. :)

Are you proposing that we use a Latin-1 encoded string to hold bytes? Isn't that kind of a step backwards in keeping unicode and text straight?

Ian
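The lossless property claimed for Latin-1 can be checked directly: every byte value 0-255 maps to the code point with the same number, so decoding and re-encoding returns the original bytes. The sketch below uses Python 3 syntax and a made-up header value; it also shows the cost raised in the reply, namely that non-Latin-1 data carried this way reads as garbled text until it is re-encoded and decoded properly.

```python
# Every possible byte value survives a latin-1 round trip unchanged.
raw = bytes(range(256))
text = raw.decode("latin-1")
round_trip = text.encode("latin-1")

# A UTF-8 encoded value carried as a latin-1 "string" looks wrong as
# text, but the original bytes (and so the real value) are recoverable.
utf8_value = "café".encode("utf-8")
as_latin1 = utf8_value.decode("latin-1")
recovered = as_latin1.encode("latin-1").decode("utf-8")
```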
Re: [Web-SIG] Are you going to convert Pylons code into Python 3000?
Graham Dumpleton wrote: Personally I believe that WSGI 1.0 should die along with Python 2.X. I believe that WSGI 2.0 should be developed to replace it and the introduction of Python 3.0 would be a great time to do that given that people are going to have to change their code anyway and that code isn't then likely to be backward compatible with Python 2.X.

I don't believe it should just *die*. But I agree that this is a good time to revisit the specification. Especially since I have no idea how the change to unicode text would affect the WSGI environment. Having the environment hold bytes seems weird, but having it hold unicode is a substantial change.

I don't think it will be as bad as Martijn thinks, because the libraries people use will probably have relatively few interface changes. Pylons and WebOb for instance should maintain largely the same interface (and they already expose unicode when possible). None of the changes proposed for WSGI 2 would change this.

If I'm maintaining two versions of a library (one for Python 2, one for Python 3), then at least I'd like to get a little benefit out of it, and a revised WSGI would give some benefit.

I think we might still need some kind of WSGI 1.1 to clarify what WSGI 1 (-like semantics) means in a Python 3.0 environment. Creating adapters from WSGI 1 to WSGI 2 should be easy enough that we could still offer some support for minimally-translated WSGI code.

Ian
Re: [Web-SIG] Dealing with urllib, urllib2, and urlparse
Joe Gregorio wrote: * Thorough unit tests using unittest or doctest. Done. http://httplib2.googlecode.com/svn/trunk/httplib2test.py Many unit tests done in unittest. They fall into two categories, those that run locally and those that run against a set of URIs on the web. Is there a stdlib way of segregating those tests? All the code for the resources is also checked into subversion: http://httplib2.googlecode.com/svn/trunk/test/

I guess this is a test-related feature request: something that would be nice, and that I don't believe httplib2 specifically allows (though maybe I am unaware of it) is a clear/documented way to mock http calls. wsgi_intercept provides this in a kind of general way, and includes some httplib2 support, but direct support in httplib2 (and the stdlib) would be very nice, and I think encourage people to do better testing.

Ian
Re: [Web-SIG] Dealing with urllib, urllib2, and urlparse
Joe Gregorio wrote: On Fri, Feb 22, 2008 at 2:09 PM, Ian Bicking [EMAIL PROTECTED] wrote: I guess this is a test-related feature request: something that would be nice, and that I don't believe httplib2 specifically allows (though maybe I am unaware of it) is a clear/documented way to mock http calls. wsgi_intercept provides this in a kind of general way, and includes some httplib2 support, but direct support in httplib2 (and the stdlib) would be very nice, and I think encourage people to do better testing.

I have a MockHttp in another project that I use for testing code that uses httplib2, is this what you'd like to see included in httplib2 itself? http://code.google.com/p/feedvalidator/source/browse/trunk/apptestsuite/client/atompubbase/tests/mockhttp.py

Yes, more or less. Only taking from files on disk is less flexible than a WSGI application, so that more general interface would also be nice (though having both would be good too).

Ian
Re: [Web-SIG] Dealing with urllib, urllib2, and urlparse
Brett Cannon wrote: which I am liking. But I figured I would ask if there is any remote chance that this SIG has plans to either merge urllib and urllib2 or come up with a new module, or something before 3.0 comes out.

httplib2 is basically a replacement for urllib. I personally prefer it to urllib. I don't know how other people feel, or Joe's thoughts (the author). Somewhat ironically httplib2 has a scope that is closer to urllib than httplib. It would be nice if this naming style (x and x2) didn't persist.

Ian
Re: [Web-SIG] Removal of Cookie in Python 3.0 OK?
Brett Cannon wrote: On Feb 3, 2008 3:41 PM, Ian Bicking [EMAIL PROTECTED] wrote: Brett Cannon wrote: As part of the standard library cleanup for Python 3.0, it has been suggested to me that the Cookie module be removed. The rationale for this is that most of the module is already deprecated and cookielib does a better job for cookie support anyway. I just wanted to see if anyone here had strong objections (along with reasons) as to why the module should be kept around in some form or another.

I think most frameworks still use the Cookie module. The cookielib module is more oriented to the client side. It doesn't seem to have the same parsing functions that you'd use on the server side (though maybe they are there and just not documented because they also exist in the Cookie module).

I honestly don't know. This was just something that someone proposed and I figured I would quickly look into, especially since I am trying to create a single http.cookies module. But if both modules stick around that might not work out very well having BaseCookie, SimpleCookie, and Cookie all in the same module but doing very different things.

I'd actually prefer simple parsing functions instead of the objects of the Cookie module. And the only thing I really like in the Cookie module is BaseCookie; the other classes try to be clever and just manage to be distracting or annoying.

If as Jim suggests the existing Cookie module was made into an installable package we could have backward compatibility in addition to a cleaner stdlib going forward. (Or we could leave cookies out of the stdlib, but this particular functionality doesn't bother me since it's fairly clear, at least now, how it should be implemented.)

Ian
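As an illustration of what a "simple parsing function" for the server side might look like, here is a rough sketch. The name `parse_cookie` and its behaviour are assumptions for this example, not an existing stdlib API, and it deliberately ignores escaping rules and cookie attributes.

```python
def parse_cookie(header):
    """Parse a Cookie request header into a plain dict of name -> value.

    Hypothetical helper: splits on ';', then on the first '=', and
    strips one pair of surrounding double quotes from the value.
    """
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            # Skip fragments that are not name=value pairs.
            continue
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        cookies[name.strip()] = value
    return cookies
```

A caller would pass it the raw header, for example `parse_cookie(environ.get("HTTP_COOKIE", ""))` inside a WSGI application.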
Re: [Web-SIG] reorg of web-related modules for Python 3K
Bill Janssen wrote: I think WSGI is a better interface than any of these. BaseHTTPServer is a reasonable basis for building a server (wsgiref.simple_server and others use it), but the subclasses are a little funky IMHO. Giving them the name http.server makes them seem like the Right Solution, and I don't think they are. They're more like server-building tools.

Yes, these classes are quite old, and have been updated only patchily over the years. I don't use them, either. But I guess the question is whether wsgiref.* is a better _implementation_ than any of these. We don't really have interfaces in Python.

wsgiref.simple_server actually uses BaseHTTPServer, so the implementations are tied. wsgiref.simple_server is a much better API than BaseHTTPServer. Even then, wsgiref.simple_server isn't the only server based on BaseHTTPServer, so it's not without some use as an abstract base class for servers. It's just not a useful base class for applications.

Ian
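For comparison, this is roughly what serving an application through wsgiref.simple_server looks like: the application is a plain callable and the server machinery stays out of it. The `hello_app` and `run` names and the port number are made up for this sketch.

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    # The application knows nothing about sockets or handler classes.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from WSGI\n"]


def run(port=8000):
    # make_server wires the app into the stdlib HTTP server classes.
    server = make_server("127.0.0.1", port, hello_app)
    try:
        server.serve_forever()
    finally:
        server.server_close()
```

Calling `run()` blocks and serves requests until interrupted; the same `hello_app` could be handed to any other WSGI server unchanged.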
Re: [Web-SIG] Removal of Cookie in Python 3.0 OK?
Brett Cannon wrote: As part of the standard library cleanup for Python 3.0, it has been suggested to me that the Cookie module be removed. The rationale for this is that most of the module is already deprecated and cookielib does a better job for cookie support anyway. I just wanted to see if anyone here had strong objections (along with reasons) as to why the module should be kept around in some form or another.

I think most frameworks still use the Cookie module. The cookielib module is more oriented to the client side. It doesn't seem to have the same parsing functions that you'd use on the server side (though maybe they are there and just not documented because they also exist in the Cookie module).

Ian
Re: [Web-SIG] reorg of web-related modules for Python 3K
Bill Janssen wrote: Over on the stdlib-sig, Brett's proposing that we move some of the HTTP-related classes: OK, to keep this ball rolling, here is my suggestion for reorganizing HTTP modules:

httplib -> http.tools
BaseHTTPServer -> http.server
SimpleHTTPServer -> http.server
CGIHTTPServer -> http.server

I think WSGI is a better interface than any of these. BaseHTTPServer is a reasonable basis for building a server (wsgiref.simple_server and others use it), but the subclasses are a little funky IMHO. Giving them the name http.server makes them seem like the Right Solution, and I don't think they are. They're more like server-building tools.

cookielib -> http.cookies

Since the various HTTP server modules have no name clashes we can consolidate them into a single module. Seems reasonable to me, but I thought it should be looked at in this forum. All this is going into PEP 3108, so either join the stdlib-sig, or read the PEP, if you care about all this. Alexandre Vassalotti further proposes the following:

xmlrpclib -> xmlrpc.tools
SimpleXMLRPCServer -> xmlrpc.server
DocXMLRPCServer -> xmlrpc.server

Similarly here I think there are better ways to arrange servers than these subclasses -- both more reusable and simpler.

Ian
Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME
Manlio Perillo wrote: Ian Bicking wrote: [...] 1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a list. This means that the request URI reconstruction must be changed: SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

I suppose you could leave stuff on PATH_INFO. But that doesn't seem to fit with the idea of PATH_INFO. Also, will it be strictly SCRIPT_NAME/consumed_path/PATH_INFO, or could it be SCRIPT_NAME/consumed_path/some_other_parsing/consumed_path/PATH_INFO -- after all, there's cases where stuff gets pushed from PATH_INFO to SCRIPT_NAME, and if consumed_path is in between, which one do you push stuff to?

What do you intend by some_other_parsing?

I have code that takes stuff from PATH_INFO and puts it on SCRIPT_NAME without updating routing_args. It could update routing_args... but I guess the question still remains: if there's multiple places where this kind of transformation is done, which one does SCRIPT_NAME point to?

Ian
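The "pushing stuff from PATH_INFO to SCRIPT_NAME" transformation mentioned here is what the standard library's wsgiref.util.shift_path_info does. The sketch below uses a made-up environ to show one such shift; it does not implement the proposed wsgiorg.consumed_path key, only the existing behaviour that the proposal would have to coexist with.

```python
from wsgiref.util import shift_path_info

# Made-up request: an app mounted at /app, asked for /blog/2008/post.
environ = {"SCRIPT_NAME": "/app", "PATH_INFO": "/blog/2008/post"}
full_path_before = environ["SCRIPT_NAME"] + environ["PATH_INFO"]

# A dispatcher consumes one segment: it moves from PATH_INFO to SCRIPT_NAME.
consumed = shift_path_info(environ)

# The concatenation is unchanged, so URL reconstruction still works.
full_path_after = environ["SCRIPT_NAME"] + environ["PATH_INFO"]
```

Once several layers have each shifted segments this way, nothing in the environ records where the original mount point ended, which is the ambiguity the question above is pointing at.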