Re: [Web-SIG] Web application packaging

2012-06-08 Thread Ian Bicking
I've been doodling around with things, but honestly I've deployed zero
Python apps in the last year, so lacking a use case of any kind I find
myself rather unfocused, even though I feel a degree of confidence about
the approach.

Anyway, my indirect doodling is here: https://github.com/ianb/apppkg/

I'm interested in a cross-language approach, which means a lot of process
isolation, and that's where things are vague right now. It would probably
be a bit easier if that didn't also mean process isolation with Python on
both sides.


On Thu, Jun 7, 2012 at 3:08 AM, Alex Morega a...@grep.ro wrote:

 Hello!

 There was a discussion here, about a year ago, about ways to deploy WSGI
 applications to servers. What is the status? What tools are out there,
 being currently developed, other than Buildout, Fabric and Silver Lining?

 Cheers,
 -- Alex


 ___
 Web-SIG mailing list
 Web-SIG@python.org
 Web SIG: http://www.python.org/sigs/web-sig
 Unsubscribe:
 http://mail.python.org/mailman/options/web-sig/ianb%40colorstudy.com




Re: [Web-SIG] Move www.wsgi.org to Read The Docs.

2011-08-18 Thread Ian Bicking
I believe Stephan Diehl owns wsgi.org.


On Thu, Aug 18, 2011 at 4:14 PM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 Who owns and manages www.wsgi.org wiki?

 The amount of spam the wiki gets now is becoming ridiculous.

 If we care about the wiki, it is time to take the content in it and
 dump it in github as a project which can then be loaded up to Read The
 Docs, with www.wsgi.org directing to that.

 In the meantime, can anyone else help clean up the spam? I am usually
 the only one who does it, but this time there is too much and it becomes
 a waste of my time. I only have so many phone meetings where I can
 secretly be cleaning up the spam at the same time. So, many hands make
 light work. :-)

 Overall I reckon moving to github and Read The Docs may also encourage
 greater participation as far as putting some useful content in it.
 Personally I find wikis a pain for that sort of content and so can't
 be bothered to work on the actual content. If it was on github and
 Read The Docs I am more likely myself to help build out the content
 with actual decent useful content, moving some of the stuff I have
 blogged about or put elsewhere there instead.

 Graham


Re: [Web-SIG] A Python Web Application Package and Format

2011-04-27 Thread Ian Bicking
On Wed, Apr 27, 2011 at 5:21 PM, Daniel Holth dho...@gmail.com wrote:

 I stumbled across https://apphosted.com as more web application package
 and format 'prior art'. It appears to be an App Engine competitor. According
 to their API documentation, their deployment format is an archive containing
 a single directory with your WSGI program and a metro.config. They put the
 database configuration in a settings.py written into the application's root
 with defined DB_URI, etc.


There's something that bothers me about using settings.py, though I guess
it's not that different from a YAML file or whatever, though with a
cleverness danger.  Conveniently you could do sys.modules['settings'] =
new.module('settings') and avoid ever making a real file.

Using the name settings *specifically* is likely to cause name clashes
with existing Django applications.
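
A minimal sketch of that in-memory trick, for anyone who wants to try it. `new.module` is the Python 2 spelling; on Python 3 the equivalent is `types.ModuleType`. The `DB_URI` value is a made-up placeholder, not anything apphosted.com documents.

```python
import sys
import types

# Build a module object in memory instead of writing settings.py to disk.
settings = types.ModuleType("settings")
settings.DB_URI = "sqlite:///example.db"  # placeholder value for illustration

# Register it so that a later "import settings" finds this object.
sys.modules["settings"] = settings

import settings as loaded
print(loaded.DB_URI)
```

Because the entry lives in `sys.modules`, it shadows any real `settings` module on the path, which is exactly the clash with Django projects described above.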

  Ian


Re: [Web-SIG] A Python Web Application Package and Format

2011-04-15 Thread Ian Bicking
On Fri, Apr 15, 2011 at 2:05 PM, Alice Bevan–McGregor
al...@gothcandy.com wrote:

 I want to keep this distinct from anything long-running, which is a much
 more complex deal.


 The primary application is only potentially long-running.  (You could, in
 theory, deploy an app as CGI, but that way lies madness.)  However, the
 reference syntax mentioned (excepting URL) works well for identifying this.


Right -- just one long-running thing (but no promises how long).



  I think given the three options, and for general simplicity, the script
 can be successful or have an error (for Python code: exception or no; for
 __main__: zero exit code or no; for a URL: 2xx code or no), and can return
 some text (which may only be informational, not structured?)


 For the simple cases (script / callable), it's pretty easy to trap STDOUT
 and STDERR, deliver INFO log messages to STDOUT, everything else to STDERR,
 then display that to the administrator in some form.  Same for HTTP, except
 that it can include full HTML formatting information.


For Silver Lining I set Accept: text/plain, to at least suggest that plain
text was preferred, since typically HTML isn't easily displayed.  But of
course a tool could change that, probably usefully?  But that only applies
to HTTP.  Anyway, seems easy enough.
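
As a rough illustration of the header (not Silver Lining's actual code), a tool could build the request like this. The URL is hypothetical, and the request is only constructed, never sent.

```python
import urllib.request

def build_script_request(url):
    """Build (but do not send) a request that asks for a plain-text reply."""
    return urllib.request.Request(url, headers={"Accept": "text/plain"})

req = build_script_request("http://localhost:8080/__update__")  # hypothetical URL
print(req.get_header("Accept"))
```

A tool that can render HTML could pass a different Accept value here; the application is free to ignore the hint either way.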



  An application configuration could refer to scripts under different names,
 to be invoked at different stages.


 A la the already mentioned post-install, pre-upgrade, post-upgrade,
 pre-removal, and cron-like.  Any others?


test-environment, test-alive, and test-functional are all possible.
test-alive could be used by, e.g., Nagios to monitor (it might actually have
structured output?)




  There could be an optional self-test script, where the application could
 do a last self-check -- import whatever it wanted, check db settings, etc.
 Of course we'd want to know what it needed *before* the self-check to try to
 provide it, but double-checking is of course good too.


 Unit and functional tests are the most obvious.  In which case we'll need
 to be able to provide a localhost-only 'mounted' location for the
 application even though it hasn't been installed yet.


For local functional HTTP tests you might want that, but if you are doing
non-HTTP functional tests (e.g., just WSGI) or unit tests then the
environment should always be sufficient without actually serving anything
up.  You'd probably want a test set of local services (as opposed to a
development set of services).  I think this will all be another kind of
tooling around development.




  One advantage to a separate script instead of just one script-on-install
 is that you can more easily indicate *why* the installation failed.  For
 instance, script-on-install might fail because it can't create the database
 tables it needs, which is a different kind of error than a library not being
 installed, or being fundamentally incompatible with the container it is in.
 In some sense maybe that's because we aren't proposing a rich error system
 -- but realistically a lot of these errors will be TypeError, ImportError,
 etc., and trying to normalize those errors to some richer meaning is
 unlikely to be done effectively (especially since error cases are hard to
 test, since they are the things you weren't expecting).


 Humans are potentially better at reading tracebacks than machines are, so
 my previous logging idea (script output stored and displayed to the
 administrator in a readable form) combined with a modicum of reasonable
 exception handling within the script should lead to fairly clear errors.


Deployers aren't very good at reading developer tracebacks, so it is kind of
nice if you at least have a sense of the stage.  One advantage to multiple
testing stages is that you might roll back before, e.g., having to deal with
database migrations.  But easy enough to skip for now.


 I'd like to see maybe an | operator, and a distinction between required and
 optional services.  E.g.:



 No need for some new operator, YAML already supports lists.

 services:
   - [mysql, postgresql, dburl]

 Or:

 services:
   required:
     - files

   optional:
     - [mysql, postgresql]


  And then there's a lot more you could do... which one do you prefer, for
 instance.


 The order of services within one of these lists would indicate preference,
 thus MySQL is preferred over PostgreSQL in the second example, above.


Sure
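
To make the preference-order idea concrete, here is a small sketch of how a deployment tool might resolve such a declaration. The `resolve_services` helper and the `available` set are assumptions for illustration; the service data mirrors the second YAML example quoted above, written as a plain Python dict.

```python
def resolve_services(declared, available):
    """Pick one concrete service per declared entry.

    Each entry is either a single name or a list of alternatives in
    order of preference. Returns (chosen, missing).
    """
    chosen, missing = [], []
    for entry in declared:
        alternatives = entry if isinstance(entry, list) else [entry]
        match = next((name for name in alternatives if name in available), None)
        if match is None:
            missing.append(alternatives)
        else:
            chosen.append(match)
    return chosen, missing

# Mirrors the second YAML example in the quoted message.
services = {"required": ["files"], "optional": [["mysql", "postgresql"]]}
available = {"files", "postgresql"}  # what this hypothetical host offers

required, missing_required = resolve_services(services["required"], available)
optional, _ = resolve_services(services["optional"], available)
if missing_required:
    raise RuntimeError("missing required services: %r" % missing_required)
print(required, optional)
```

Here the host has no MySQL, so the tool falls back to PostgreSQL; an unavailable optional service is simply skipped, while an unavailable required one stops the install.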



  Tricky things:
 - You need something funny like multiple databases.  This is very
 service-specific anyway, and there might sometimes need to be a way to
 configure the service.  It's also a fairly obscure need.


 I'm not convinced that connecting to a legacy database /and/ current
 database is that obscure.  It's also not as hard as Django makes it look
 (with a 1M SLoC change to add support)… WebCore added support in three
 lines.


Well, then you are getting into specific configurations fitting into legacy

Re: [Web-SIG] A Python Web Application Package and Format

2011-04-14 Thread Ian Bicking
On Thu, Apr 14, 2011 at 2:53 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 On 14 April 2011 16:57, Alice Bevan–McGregor al...@gothcandy.com wrote:
  3. Define how to get the WSGI app.  This is WSGI specific, but (1) is
  *not* WSGI specific (it's only Python specific, and would apply well to
  other platforms)
 
  I could imagine there would be multiple application types:
 
  :: WSGI application.  Define a package dot-notation entry point to a WSGI
  application factory.

 Why can't it be a path to a WSGI script file? This actually works more
 universally as it works for servers which map URLs to file based
 resources as well. Also allows alternate extensions than .py and also
 allows basename of file name to be arbitrarily named, both of which
 help with those same servers which map URLs to file based resources. It
 also allows same name WSGI script file to exist in multiple locations
 managed by same server without having to create an overarching package
 structure with __init__.py files everywhere.


The main way to load applications in Silver Lining is basically like a wsgi
script; or more specifically a file that is exec'd and it looks specifically
for a variable application.  Silver Lining also supports Paste Deploy .ini
files, but in practice this doesn't seem that important (after all you can
run paste.deploy.loadapp in the script).

In this case the mapping of filenames and use of extensions doesn't matter,
as applications would not be compelled to use any particular extension, and
traversing into the application wouldn't make sense.
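
Roughly, the exec-and-look-for-`application` approach looks like the sketch below. This is a simplified illustration rather than the Silver Lining code itself; the `load_application` name and the namespace keys are assumptions.

```python
import os
import tempfile

def load_application(path):
    """Execute a script file and return its module-level `application`."""
    namespace = {"__file__": path, "__name__": "__wsgi_script__"}
    with open(path) as f:
        code = compile(f.read(), path, "exec")
    exec(code, namespace)
    try:
        return namespace["application"]
    except KeyError:
        raise LookupError("%s does not define 'application'" % path)

# Demonstration with a throwaway script; the extension is arbitrary.
SCRIPT = '''
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
'''
with tempfile.NamedTemporaryFile("w", suffix=".wsgi", delete=False) as tmp:
    tmp.write(SCRIPT)
app = load_application(tmp.name)
os.unlink(tmp.name)
```

Since the file is never imported as a module, its name and extension carry no meaning, and the same basename can exist in several locations without any package structure.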

Another thing that is common with .wsgi files (and similarly for App Engine
script handlers) is that developers do all sorts of initialization (like
changing sys.path etc).  This makes it hard to access the application except
through that entry point, thus requiring all access to be in the form of URL
fetching (again like App Engine).  So on one hand I like the .wsgi file
technique; on the other hand I don't ;)

Most of what we're talking about is, in Silver Lining, implemented in
silversupport.appconfig.  Particular pieces:

Loading the application:
https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-310

Set up sys.path:
https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-298

Set up services:
https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-223

There's going to have to be a bit of indirection with services, as an
application is asking in effect for an interface, and each tool may
implement that interface differently (maybe a package could provide sort of
an abstract base class for these, but the specific implementation is going
to be very deployment-tool-specific).
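
One way to picture that indirection is an abstract base class with tool-specific subclasses. Everything here is a hypothetical sketch: the class names and methods are invented, and only the `CONFIG_MYSQL_DBNAME` variable name follows Silver Lining's convention.

```python
import abc
import os

class Service(abc.ABC):
    """What an application asks for; each tool supplies its own subclass."""

    @abc.abstractmethod
    def environ(self):
        """Return the environment variables that describe this service."""

    def activate(self, environ=os.environ):
        # Activation here just means publishing the variables.
        environ.update(self.environ())

class LocalMySQL(Service):
    """Hypothetical implementation for one particular deployment tool."""

    def __init__(self, dbname):
        self.dbname = dbname

    def environ(self):
        return {"CONFIG_MYSQL_DBNAME": self.dbname}

env = {}
LocalMySQL("myapp").activate(env)
print(env)
```

The application only ever reads the variables; how the database got created and where it lives stays inside the tool's subclass.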

Also generally more is set up before the .wsgi-like script is executed in
Silver Lining than in mod_wsgi.  Well, here's the actual mod_wsgi-.wsgi
script that Silver Lining uses:

https://bitbucket.org/ianb/silverlining/src/8597f52305be/silverlining/mgr-scripts/master-runner.py

But it's a bit confusing because it translates a bunch of variables set by
the rather obtuse Apache config to figure out what application to run and
how.  But sys.path is fixed up, services are activated (mostly meaning
they set their environmental variables), stderr/stdout is fixed up (since
there's some sense of logging in the system, I felt there was no reason to
bar use of those streams), and then some tool-specific stuff is done (e.g.,
fixing up the request URL given the Varnish setup).  These are the examples
of the kind of detailed specification of parts of the environment that I
guess we need to have -- it's really how the entire process is set up that we
need to specify, not just the WSGI request portion (which at least we don't
have to specify much since that's done).

  Ian


Re: [Web-SIG] A Python Web Application Package and Format

2011-04-14 Thread Ian Bicking
I think there's a general concept we should have, which I'll call a script
-- but basically it's a script to run (__main__-style), a callable to call
(module:name), or a URL to fetch internally.  I want to keep this distinct
from anything long-running, which is a much more complex deal.  I think
given the three options, and for general simplicity, the script can be
successful or have an error (for Python code: exception or no; for __main__:
zero exit code or no; for a URL: 2xx code or no), and can return some text
(which may only be informational, not structured?)

An application configuration could refer to scripts under different names,
to be invoked at different stages.
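
A sketch of what a runner for these three kinds of script could look like, returning a success flag plus some informational text. The `run_script` name and the `kind` strings are assumptions made up for this example.

```python
import importlib
import subprocess
import sys
import urllib.error
import urllib.request

def run_script(kind, target):
    """Run one 'script' and return (ok, text).

    kind is "main" (a file run as __main__), "callable" ("module:name"),
    or "url" (fetched with GET).
    """
    if kind == "main":
        proc = subprocess.run([sys.executable, target],
                              capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr
    if kind == "callable":
        module_name, _, attr = target.partition(":")
        func = getattr(importlib.import_module(module_name), attr)
        try:
            return True, str(func() or "")
        except Exception as exc:
            return False, "%s: %s" % (type(exc).__name__, exc)
    if kind == "url":
        try:
            with urllib.request.urlopen(target) as resp:
                body = resp.read().decode("utf-8", "replace")
                return 200 <= resp.status < 300, body
        except urllib.error.URLError as exc:
            return False, str(exc)
    raise ValueError("unknown script kind: %r" % kind)

ok, text = run_script("callable", "platform:python_version")
print(ok, text)
```

All three kinds collapse to the same result shape, which is what lets a configuration refer to a script by name without the tool caring how it is implemented.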

On Thu, Apr 14, 2011 at 1:57 AM, Alice Bevan–McGregor
al...@gothcandy.com wrote:

 On 2011-04-13 18:16:36 -0700, Ian Bicking said:

  While initially reluctant to use zip files, after further discussion and
 thought they seem fine to me, so long as any tool that takes a zip file can
 also take a directory.  The reverse might not be true -- for instance, I'd
 like a way to install or update a library for (and inside) an application,
 but I doubt I would make pip rewrite zip files to do this ;)  But it could
 certainly work on directories.  Supporting both isn't a big deal except that
 you can't do symlinks in a zip file.


 I'm not talking about using zip files as per eggs, where the code is
 maintained within the zip file during execution.  It is merely a packaging
 format with the software itself extracted from the zip during installation /
 upgrade.  A transitory container format.  (Folders in the end.)

 Symlinks are an OS-specific feature, so those are out as a core
 requirement.  ;)


  I don't think we're talking about something like a buildout recipe.  Well,
 Eric kind of brought something like that up... but otherwise I think the
 consensus is in that direction.


 Ambiguous statements FTW, but I think I know what you meant.  ;)


  So specifically if you need something like lxml the application specifies
 that somehow, but doesn't specify *how* that library is acquired.  There is
 some disagreement on whether this is generally true, or only true for
 libraries that are not portable.


 +1

 I think something along the lines of autoconf (those lovely ./configure
 scripts you run when building GNU-style software from source) with published
 base 'checkers' (predicates as I referred to them previously) would be
 great.  A clear way for an application to declare a dependency, have the
 application server check those dependencies, then notify the administrator
 installing the package.


There could be an optional self-test script, where the application could do
a last self-check -- import whatever it wanted, check db settings, etc.  Of
course we'd want to know what it needed *before* the self-check to try to
provide it, but double-checking is of course good too.

One advantage to a separate script instead of just one script-on-install is
that you can more easily indicate *why* the installation failed.  For
instance, script-on-install might fail because it can't create the database
tables it needs, which is a different kind of error than a library not being
installed, or being fundamentally incompatible with the container it is in.
In some sense maybe that's because we aren't proposing a rich error system
-- but realistically a lot of these errors will be TypeError, ImportError,
etc., and trying to normalize those errors to some richer meaning is
unlikely to be done effectively (especially since error cases are hard to
test, since they are the things you weren't expecting).
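
A small sketch of the point about separate scripts: if each stage is run on its own, the failing stage name already says a lot even when the exception is a bare TypeError or ImportError. The stage names and check functions below are invented for the example.

```python
def run_stages(stages):
    """Run (name, callable) pairs in order; stop at the first failure.

    Returns (None, None) on success, or (stage_name, error_text).
    """
    for name, check in stages:
        try:
            check()
        except Exception as exc:
            return name, "%s: %s" % (type(exc).__name__, exc)
    return None, None

def check_imports():
    import json  # stand-in for the application's real dependencies

def check_database():
    raise RuntimeError("cannot create tables")  # simulated failure

failed_stage, error = run_stages([
    ("self-test", check_imports),
    ("script-on-install", check_database),
])
print(failed_stage, error)
```

No attempt is made to normalize the exception into a richer error type; the stage label carries the extra meaning.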

 I've seen several Python libraries that include the C library code that they
 expose; while not so terribly efficient (i.e. you can't install the C
 library once, then share it amongst venvs), it is effective for small
 packages.


Generally compiling seems fairly reliable these days, but it does typically
require more system-level packages be installed (e.g., python-dev).
Actually invoking these installations in an automated and reliable way seems
hard to me.  I find debs/rpms to work well for these cases.  There is some
challenge when you need something that isn't packaged, but in many ways the
work you need to do is always going to be the same work you'd need to do to
package that library or the new version of that library.  So I'm inclined to
ask people to lean on the existing OS-level tooling for dealing with these
libraries.


 Larger (i.e. global or application-local) would require the intervention of
 a systems administrator.


  Something like a database takes this a bit further.  We haven't really
 discussed it, but I think this is where it gets interesting.  Silver Lining
 has one model for this.  The general rule in Silver Lining is that you can't
 have anything with persistence without asking for it as a service, including
 an area to write files (except temporary files?)


 +1

 Databases are slightly more difficult; an application could ask

Re: [Web-SIG] A Python Web Application Package and Format

2011-04-13 Thread Ian Bicking
While we are focusing on points of contention, there may be more points of
consensus, but we aren't talking about those.

So, some initial thoughts:

While initially reluctant to use zip files, after further discussion and
thought they seem fine to me, so long as any tool that takes a zip file can
also take a directory.  The reverse might not be true -- for instance, I'd
like a way to install or update a library for (and *inside*) an application,
but I doubt I would make pip rewrite zip files to do this ;)  But it could
certainly work on directories.  Supporting both isn't a big deal except that
you can't do symlinks in a zip file.
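
For what it's worth, accepting both forms is only a few lines if the zip is treated as a transport format. A sketch, with a hypothetical `app_directory` helper and an illustrative `app.ini` file:

```python
import os
import tempfile
import zipfile

def app_directory(path):
    """Return a directory holding the application at `path`.

    A directory is used as-is; a zip file is unpacked into a fresh
    temporary directory first.
    """
    if os.path.isdir(path):
        return path
    if zipfile.is_zipfile(path):
        target = tempfile.mkdtemp(prefix="app-")
        with zipfile.ZipFile(path) as archive:
            archive.extractall(target)
        return target
    raise ValueError("%r is neither a directory nor a zip file" % path)

# Demonstration: the same application as a directory and as a zip.
src = tempfile.mkdtemp()
with open(os.path.join(src, "app.ini"), "w") as f:  # file name is illustrative
    f.write("app_name = demo\n")
zip_path = os.path.join(tempfile.mkdtemp(), "app.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.write(os.path.join(src, "app.ini"), "app.ini")

from_dir = app_directory(src)
from_zip = app_directory(zip_path)
```

Everything downstream then works on a directory, so a tool like pip never has to rewrite an archive.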

I don't think we're talking about something like a buildout recipe.  Well,
Eric kind of brought something like that up... but otherwise I think the
consensus is in that direction.  So specifically if you need something like
lxml the application specifies that somehow, but doesn't specify *how* that
library is acquired.  There is some disagreement on whether this is
generally true, or only true for libraries that are not portable.

Something like a database takes this a bit further.  We haven't really
discussed it, but I think this is where it gets interesting.  Silver Lining
has one model for this.  The general rule in Silver Lining is that you can't
have anything with persistence without asking for it as a service, including
an area to write files (except temporary files?)  I assume everyone agrees
that an application can't write to its own files (but of course it could
execfile something in another location).

I suspect there's some disagreement about how the Python environment gets
setup, specifically sys.path and any other application-specific
customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in
silvercustomize.py, and find it helpful).  Describing the scope of this, it
seems kind of boring.  In, for example, App Engine you do all your setup in
your runner -- I find this deeply annoying because it makes the runner the
only entry point, and thus makes testing, scripts, etc. hard.

We would start with just WSGI.  Other things could follow, but I don't see
any reason to worry about that now.  Maybe we should just punt on aggregate
applications now too.  I don't feel like there's anything we would do that
would prevent other kinds of runtime models (besides the starting point,
container-controlled WSGI), and the places to add support for new things are
obvious enough (e.g., something like Silver Lining's platform setting).  I
would define a server with accompanying daemon processes as an aggregate.

An important distinction to make, I believe, is application concerns and
deployment concerns.  For instance, what you do with logging is a deployment
concern.  Generating logging messages is of course an application concern.
In practice these are often conflated, especially in the case of bespoke
applications where the only person deploying the application is the person
(or team) developing the application.  It shouldn't be *annoying* for these
users, though.  Maybe it makes sense for people to be able to include
tool-specific default settings in an application -- things that could be
overridden, but especially for the case when the application is not widely
reused it could be useful.  (An example where Silver Lining gets it all
backwards is I created a [production] section in app.ini when the very
concept of production is not meaningful in that context -- but these kind
of named profiles would make sense for actual application deployment
tools.)  An example of a setting currently in Silver Lining/app.ini that
should become a tool-specific default setting would be default_location
(the default place to upload your app to when you do silver update).


There's actually a kind of layered way of thinking of this:

1. The first, maybe most important part, is how you get a proper Python
environment.  That includes sys.path of course, with all the accompanying
libraries, but it also includes environment description.  In Silver Lining
there's two stages -- first, set some environmental variables (both general
ones like $SILVER_CANONICAL_HOST and service-specific ones like
$CONFIG_MYSQL_DBNAME), then get sys.path proper, then import silvercustomize
by which an environment can do any more customization it wants (e.g., set
$DJANGO_SETTINGS_MODULE)
2. Define some basic generic metadata.  app_name being the most obvious
one.
3. Define how to get the WSGI app.  This is WSGI specific, but (1) is *not*
WSGI specific (it's only Python specific, and would apply well to other
platforms)
4. Define some *web specific* metadata, like static files to serve.  This
isn't necessarily WSGI or even Python specific (not that we should bend
backwards to be agnostic -- but in practice I think we'd have to bend
backwards to make it Python-specific).
5. Define some lifecycle metadata, like update_fetch.  These are generally
commands to invoke.  IMHO these can be ad hoc, but exist in the scope of (1)

Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Ian Bicking
On Sun, Apr 10, 2011 at 10:29 PM, Alice Bevan–McGregor
al...@gothcandy.com wrote:

 Howdy!


 On 2011-04-10 19:06:52 -0700, Ian Bicking said:

  There's a significant danger that you'll be creating a configuration
 management tool at that point, not simply a web application description.


 Unless you have the tooling to manage the applications, there's no point
 having a standard for them.  Part of that tooling will be some form of
 configuration management allowing you to determine the requirements and
 configuration of an application /prior/ to installation.  Better to have an
 application rejected up-front (Hey, this needs my social insurance number?
 Hells no!) than after it's already been extracted and potentially littered
 the landscape with its children.


I... think we are misunderstanding each other or something.

A nice tool that could use this format, for instance, would be a tool that
takes an app and creates a puppet recipe to set up a server to host the
application.  A different tool (maybe better, maybe not?) would be a puppet
plugin (if that's the terminology) that uses this format to tell puppet
about all the requirements an application has, perhaps translating some
notions to puppet-native concepts, or adding high-level recipes that set up
an appropriate container (which can be as simple as a properly configured
Nginx or Apache server).

What I mean when I say there's a danger of becoming a configuration
management tool, is that if you include hooks for the application to
configure its environment you are probably stepping on the toes of whatever
other tool you might use.  And once you start down that path things tend to
cascade.




  The escape valve in Silver Lining for these sort of things is services,
 which can kind of implement anything, and presumably ad hoc services could
 be allowed for.


 Generic services are useful, but not useful enough.


  You create a build process as part of the deployment (and development and
 everything else), which I think is a bad idea.


 Please elaborate.  There is no requirement for you to use the application
 packaging format and associated tools (such as an application server)
 during development.  In fact, like 2to3, that type of process would only
 slow things down to the point of uselessness.  That's not what I'm
 suggesting at all.


If you include something in the packaging format that indicates the
libraries to be installed, then you are encouraging and perhaps requiring
that the server install libraries during a deployment.

Realistically this can't be entirely avoided, but I think it is a pretty
workable separation to declare only those dependencies that can't reasonably
be included directly in the application itself (e.g., lxml, MySQLdb, git,
and so on).  In Silver Lining those dependencies were expressed as Debian
package names, installed via dpkg, but for a more general system it would
need to be somewhat more abstract.  But several configuration management
tools have managed that abstraction already, so it seems feasible to handle
this declaratively.
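
As a sketch of what "somewhat more abstract" could mean: the application lists abstract names, and each deployment tool owns the translation to its platform's packages. The mapping below is illustrative only; the Debian package names are examples and may not match any particular release.

```python
# Abstract requirement names as an application might declare them.
REQUIRES = ["lxml", "MySQLdb", "git"]

# Per-platform translation tables owned by the deployment tool, not the app.
# Package names are illustrative and may differ between releases.
PACKAGE_MAP = {
    "debian": {"lxml": "python-lxml", "MySQLdb": "python-mysqldb", "git": "git"},
}

def packages_for(requirements, platform):
    """Translate abstract requirements into platform package names."""
    table = PACKAGE_MAP[platform]
    unknown = [name for name in requirements if name not in table]
    if unknown:
        raise LookupError("no %s package known for: %s"
                          % (platform, ", ".join(unknown)))
    return [table[name] for name in requirements]

print(packages_for(REQUIRES, "debian"))
```

The application never says how a library is acquired, so the same declaration could feed a puppet recipe or an rpm-based host with a different table.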


  My model does not use setup.py as the basis for the process (you could
 build a tool that uses setup.py, but it would be more a development
 methodology than a part of the packaging).


 I know.  And the end result is you may have to massage .pth files yourself.
  If a tool requires you to, at any point during normal operation, hand
 modify internal files… that tool has failed at its job.  One does not go
 mucking about in your Git repo's .git/ folder, as an example.


.pth files aren't exactly an internal file -- they are a documented feature
of Python.  And .git/config is also a human-readable/editable file!

But I did note that the setup in Silver Lining was a bit too primitive.  Not
*quite* as primitive as App Engine, but close.  I think it would be better
to have a convention like adding lib/python/ to the path automatically.  If
you want, for example, src/myapp to also be added to the path then I don't
think there's anything wrong with using a .pth file to do that; that's what
they were created to do!
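
To show how little is involved, here is a sketch using the standard `site.addsitedir`, which adds a directory to sys.path and processes any .pth files inside it. The directory layout and the `myapp.pth` name are assumptions for the example.

```python
import os
import site
import sys
import tempfile

# Throwaway application layout: <app>/lib/python and <app>/src/myapp.
app = tempfile.mkdtemp()
lib_python = os.path.join(app, "lib", "python")
src_myapp = os.path.join(app, "src", "myapp")
os.makedirs(lib_python)
os.makedirs(src_myapp)

# A .pth file lists extra directories, one per line; relative entries are
# resolved against the directory that contains the .pth file.
with open(os.path.join(lib_python, "myapp.pth"), "w") as f:
    f.write(os.path.join("..", "..", "src", "myapp") + "\n")

# addsitedir() appends lib/python to sys.path and processes its .pth files.
site.addsitedir(lib_python)
```

So the container only needs the one convention (add lib/python/), and anything beyond that is a one-line file the developer controls.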


 How do you build a release and upload it to PyPI?  Upload docs to
 packages.python.org?  setup.py commands.  It's a convenient hook with
 access to metadata in a convenient way that would make an excellent let's
 make a release! type of command.


  Also lots of libraries don't work when zipped, and an application is
 typically an aggregate of many libraries, so zipping everything just adds a
 step that probably has to be undone later.


 Of course it has to be un-done later.  I had thought I had made that quite
 clear in the gist.  (Core Operation, point 1, possibly others.)


  If a deploy process uses zip file that's fine, but adding zipping to
 deployment processes that don't care for zip files is needless overhead.  A
 directory of files is the most general case.  It's also something a
 developer can manipulate, so you don't get a mismatch between developers of
 applications

Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Ian Bicking
On Mon, Apr 11, 2011 at 2:56 AM, Ionel Maries Cristian
ionel...@gmail.com wrote:

 Hello,

 I have few comments:

- That file layout basically forces you to have your development
environment as close as possible to the production environment. This is
especially visible if you're relying on python c extensions. Since you don't
want to have the same environment constraints as appengine it should be more
flexible in this regard and offer a way to generate the project dependencies
somewhere else than the developer's machine.

 Yes; in this case in Silver Lining I have allowed non-portable libraries to
be declared as dependencies, and then the deployment tool ensures they are
installed.



- There's no builtin support for logging configuration.

 This would be useful, yes; though I think the format itself would mostly
want to declare how it logs and then deployment tools could try to configure
that.  E.g., it would be useful to have a list of logging names that an app
uses.  The actual configuration is deployment-specific, so shouldn't be
inside the application format itself.
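
A sketch of that split: the application only declares its logger names, and the deployment side decides where they go. The logger names and the `configure_logging` helper are hypothetical; `logging.config.dictConfig` is the standard library call.

```python
import logging
import logging.config

# Declared by the application: which logger names it emits on.
APP_LOGGERS = ["myapp", "myapp.db"]  # hypothetical names

def configure_logging(logger_names, level="INFO"):
    """Deployment-side policy: route every declared logger to stderr."""
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(name)s %(levelname)s %(message)s"},
        },
        "handlers": {
            "stderr": {"class": "logging.StreamHandler", "formatter": "plain"},
        },
        "loggers": {
            name: {"handlers": ["stderr"], "level": level, "propagate": False}
            for name in logger_names
        },
    })

configure_logging(APP_LOGGERS, level="WARNING")
logging.getLogger("myapp.db").warning("connection pool exhausted")
```

A different deployment could route the same names to syslog or a file without the application changing at all.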



- The update_fetch feels like a hack as it's not extensible to do
lifecycle (hooks for shutdown, start, etc). Also, it shouldn't be an
application url because you'd want to run a hook before starting it or after
stopping it. I guess you could accomplish that with a wsgi wrapper but there
should be a clear separation between the app and hooks that manage the app.

 In Silver Lining you can also do scripts; I started with URLs because it
was simpler on the implementation side, but scripts have generally been
easier to develop, so at least the default could be revisited.

At least in the case of mod_wsgi there isn't a very good definition of
shutdown and start.  There's the runner itself, that imports the WSGI
application -- this is always run on start, but it's the start of the worker
process, not necessarily the server process (IMHO starting the server
process is an internal implementation detail we should not expose).  Silver
Lining also tries to import a silvercustomize module, which is kind of a
universal initialization (also imported for tests, etc).  atexit can be used
to run stuff on process shutdown.  I don't really see a compelling benefit
to another process shutdown technique.  It seems perhaps reasonable to have
something that is run when the actual application instance is shut down, but
I've never personally needed that in practice.  Of course other
configuration settings could be added for different states if they were
reasonably universal states and there was a real need for those.
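
For reference, the atexit approach is just this (the function name and the `events` list are placeholders for real cleanup work):

```python
import atexit

events = []

def on_process_shutdown():
    """Cleanup that should run when the worker process exits."""
    events.append("shutdown")

# Runs at normal interpreter exit; it is not run if the process is killed
# by an unhandled signal or calls os._exit().
atexit.register(on_process_shutdown)
```

This hooks the worker process, not the application instance, so it fits the mod_wsgi model where the server process itself is an implementation detail.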



- I'm not entirely clear on why you avoid a build process (war-like)
prior to deployment. It works fine for appengine - but you don't have its
constraints.

 In my own experience with App Engine I found it to be a useful constraint
-- it was not particularly hard to get around (at least if you understand
the relevant tools) and while App Engine has annoying constraints this
wasn't one of them.  Of course I couldn't use lxml at all on App Engine, and
I agree we shouldn't accept that constraint, but for the majority of
libraries that are portable this isn't a constraint.

  Ian


Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Ian Bicking
(I'm confused; I just noticed there's a web-sig@python.org and
python-web-...@googlegroups.com?)

On Mon, Apr 11, 2011 at 2:01 PM, Daniel Holth dho...@gmail.com wrote:

 We have more than 3 implementations of this idea, the Python Web
 Application Package and Format or WAPAF, including Java's WAR files, Google
 App Engine, silverlining. Let's review the WAR file, approximately:

 (static files, .jsp)
 WEB-INF/web.xml
 WEB-INF/classes/org/example/myapplication.class
 WEB-INF/lib/some-library.jar
 WEB-INF/lib/145-other-libraries.jar

 Build the .war file, copy to server, done (ideally). Your program should
 require a standard Java installation plus whatever's in the .war file. The
 .war file is a .zip that follows certain conventions.

 In practice you might develop in and deploy exploded .war files which are
 exactly the same thing but unzipped.

 Since it's Java there is no classes/SQLAlchemy/src/sqlalchemy/__init__.py;
 the path for the code always starts at classes/, not at some arbitrary set
 of subdirectories under classes/


Yes, this is all very reminiscent of my thoughts about this application
format, and I'm assuming web.xml is the kind of configuration file I expect,
etc.  I'd rather there be a convention like classes/ anyway (obviously with
a different name ;)


 installation.  Better to have an application rejected up-front (Hey,
 this needs my social insurance number? Hells no!) than after it's
 already been extracted and potentially littered the landscape with its
 children.

 Part of the potential win here is that the application need not litter
 anything. Like GAE, the server might keep all the previous versions you've
 uploaded and let you pick which one you want today. You shouldn't have to
 think about the state of the server.


Yes; and for instance Silver Lining can have multiple versions installed
alongside each other, which makes it easier to do a quick update -- you can
upload everything, make sure everything is okay, and only then actually make
that new version active.  If the build process is well defined you can do
the same thing, but it's harder to be sure that it will work as expected.
 And if the build process is kind of free-form then you might end up in a
place where you have to take down the old version of an app as you update
the new version.
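
One way to get that "upload, check, then activate" behaviour is a symlink
swap.  This is only a sketch under assumptions: the versions/ and current
layout and the `activate` helper are hypothetical, and this is not a
description of how Silver Lining itself is implemented.

```python
import os


def activate(app_root, version):
    """Point app_root/current at app_root/versions/<version> atomically."""
    target = os.path.join("versions", version)
    if not os.path.isdir(os.path.join(app_root, target)):
        raise ValueError("version %r has not been uploaded" % version)
    link = os.path.join(app_root, "current")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Build the new link beside the old one, then rename over it; on
    # POSIX the rename is atomic, so requests see either old or new.
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)
```

The old version's directory is left untouched, so switching back is just
another call to `activate` with the previous version name.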

Data migrations are a bit more tricky, but with the services concept they
are possible, and can even be efficient if you use some deep Linux magic
(but if you are okay with a bit of inefficiency, or only applying this to
small databases, doing a fairly atomic application update is possible).

One of the items in Silver Lining's TODO is having a formal concept of
putting an application into read-only mode, which could be helpful for these
updates as well.

  My model does not use setup.py as the basis for the process (you could
  build a tool that uses setup.py, but it would be more a development
  methodology than a part of the packaging).

 I know.  And the end result is you may have to massage .pth files
 yourself.  If a tool requires you to, at any point during normal
 operation, hand modify internal files… that tool has failed at its
 job.  One does not go mucking about in your Git repo's .git/ folder, as
 an example.

 If I read the silverlining documentation correctly the .pth is created
 manually in the example only because there was no 'setup.py' to 'pip install
 -e'. As an alternative the spec could only add particular directories to
 PYTHONPATH. This might be a distutils2 thing.


PYTHONPATH shouldn't apply here, as it informs the Python executable, and
probably the executable will start before invoking the application (at least
with mod_wsgi it does, and there are many other use cases where it could).
 You could have a setting in app.ini (or whatever equivalent config file)
with the paths to add, but I personally find that kind of messy feeling
compared to existing conventions like .pth files.  Ultimately they are
equivalent -- a file with a path name that is added to sys.path.
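
To make the .pth side of that concrete, here is a small sketch using the
standard library's site.addsitedir, which adds a directory to sys.path and
processes any .pth files it finds there.  The lib/ and vendor/ directory
names, the app.pth file name and the `load_app_paths` helper are made up
for illustration.

```python
import os
import site
import sys
import tempfile


def load_app_paths(lib_dir):
    """Add lib_dir to sys.path and process any .pth files inside it."""
    site.addsitedir(lib_dir)


# Demonstration with a throwaway directory layout.
app_dir = os.path.abspath(tempfile.mkdtemp())
lib_dir = os.path.join(app_dir, "lib")
vendor_dir = os.path.join(lib_dir, "vendor")
os.makedirs(vendor_dir)

# Each line of a .pth file names one directory, relative to the
# directory that contains the .pth file (or absolute).
with open(os.path.join(lib_dir, "app.pth"), "w") as f:
    f.write("vendor\n")

load_app_paths(lib_dir)
```

After the call, both lib_dir and vendor_dir are on sys.path, and this works
after the interpreter has started, which is where PYTHONPATH falls short.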

 How do you build a release and upload it to PyPI?  Upload docs to
 packages.python.org?  setup.py commands.  It's a convenient hook with
 access to metadata in a convenient way that would make an excellent
 let's make a release! type of command.

 setup.py should go away. The distutils2 talk from pycon 2011 explains.
 http://blip.tv/file/4880990


That's kind of a red herring -- even if setup.py goes away it would be
replaced with something (pysetup I think?) which is conceptually equivalent.

  Ian


Re: [Web-SIG] A Python Web Application Package and Format

2011-04-10 Thread Ian Bicking
On Sun, Apr 10, 2011 at 6:40 PM, Alice Bevan–McGregor
al...@gothcandy.com wrote:

 On 2011-04-10 16:25:21 -0700, James Mills said:

  +1 too. I would however like to see this idea developed in a generic
 and useable way. ie: No zope/twisted deps or making it fit around
 Django :)
 Ideally it should be useable by the most basic (plain old WSGI).


 The following are the collected ideas of myself and a few other users in
 the WebCore chat room:

https://gist.github.com/911991

 Being generic (i.e. using WSGI under-the-hood) and allowing generic port
 assignments for other (non-web) networked applications is a design goal.


There's a significant danger that you'll be creating a configuration
management tool at that point, not simply a web application description.
 The escape valve in Silver Lining for these sorts of things is services,
which can kind of implement anything, and presumably ad hoc services could
be allowed for.


 The aversion to packaged zips is not entirely understandable to us; in this
 case, a packaged copy of the application is produced via a setup.py command,
 though in theory one could develop with that model and just zip everything
 up in the end by hand.


You create a build process as part of the deployment (and development and
everything else), which I think is a bad idea.  My model does not use
setup.py as the basis for the process (you could build a tool that uses
setup.py, but it would be more a development methodology than a part of the
packaging).

Also lots of libraries don't work when zipped, and an application is
typically an aggregate of many libraries, so zipping everything just adds a
step that probably has to be undone later.  If a deploy process uses zip
files that's fine, but adding zipping to deployment processes that don't care
for zip files is needless overhead.  A directory of files is the most
general case.  It's also something a developer can manipulate, so you don't
get a mismatch between developers of applications and people deploying
applications -- they can use the exact same system and format.

Silver Lining seems to require too much in the way of hacking (modifying
 .pth files, etc) to be reasonable.


The pattern that it implements is fairly simple, and in several models you
have to lay things out somewhat manually.  I think some more convention and
tool support (e.g., in pip) would be helpful.

Though there are quite a few details, the result is more reliable, stable,
and easier to audit than anything based on a build process (which any use of
dependencies would require -- there are *no* dependencies in a Silver
Lining package, only the files that are *part* of the package).

Some notes from your link:

- There seems to be both the description of a format, and a program based on
that format, but it's not entirely clear where the boundary is.  I think
it's useful to think in terms of a format and a reference implementation of
particular tools that use that format (development management tools, like
installing into the format; deployment tools; testing tools; local serving
tools; etc).
- In Silver Lining I felt no need at all for shared libraries.  Some disk
space can be saved with clever management (hard links), but only when it's
entirely clear that it's just an optimization.  Adding a concept like
server-packages adds a lot of operational complexity and room for bugs
without any real advantages.
- I avoided exposing the concept of daemonization because it's not really an
application concern; or at least it certainly is not appropriate for a WSGI
application.  There are other applications that might need this, mostly
because they have no standard protocol equivalent to WSGI, but a generic
container is almost certain to be of higher quality and better situated to
its environment than a generic daemon.  (PID files, ugh)  At least
supervisord I think has a better representation of how to express daemon
configuration, but still I'm not a big fan of exposing this until it really
feels necessary.
- All dependencies are always version-sensitive; I think it's delusional
that people think otherwise.  Build the tooling to manage that process
(e.g., finding and testing newer versions), not the deployment.
- I try to avoid error conditions in the deployment, which is a big part of
not having any build process involved, as build processes are a source of
constant errors -- you can do a stage deployment, then five minutes later do
a production deployment, and if you have a build process there is a
significant chance that the two won't match.

  Ian


[Web-SIG] A Python Web Application Package and Format

2011-04-01 Thread Ian Bicking
Hi all.  I wrote a blog post.  I would be interested in reactions from this
crowd.

http://blog.ianbicking.org/2011/03/31/python-webapp-package/

Copied to allow responses:

At PyCon there was an open space about deployment, and the idea of drop-in
applications (Java-WAR-style
<http://en.wikipedia.org/wiki/WAR_file_format_%28Sun%29>).

I generally get pessimistic about 80% solutions, and dropping in a WAR file
feels like an 80% solution to me. I’ve used the Hudson/Jenkins installer
(which I think is *specifically* a project that got WARs on people’s minds),
and in a lot of ways that installer is nice, but it’s also kind of wonky, it
makes configuration unclear, it’s not always clear when it installs or
configures itself through the web, and when you have to do this at the
system level, nor is it clear where it puts files and data, etc. So a great
initial experience doesn’t feel like a great ongoing experience to me — and
*it doesn’t have to be that way*. If those were *necessary* compromises,
sure, but they aren’t. And because we *don’t have* WAR files, if we’re
proposing to make something new, then we have every opportunity to make
things better.

So the question then is what we’re trying to make. To me: we want
applications that are easy to install, that are self-describing,
self-configuring (or at least guide you through configuration), reliable
with respect to their environment (not dependent on system tweaking),
upgradable, and respectful of persistence (the data that outlives the
application install). A lot of this can be done by the container (to use
Java parlance; or environment) — if you just have the app packaged in a
nice way, the container (server environment, hosting service, etc) can
handle all the system-specific things to make the application actually work.

At which point I am of course reminded of my Silver Lining
<http://cloudsilverlining.org/> project, which defines something very much
like this. Silver Lining isn’t *just* an application format, and things
aren’t fully extracted along these
lines, but it’s pretty close and it addresses a lot of important issues in
the lifecycle of an application. To be clear: Silver Lining is an
application packaging format, a server configuration library, a cloud server
management tool, a persistence management tool, and a tool to manage the
application with respect to all these services over time. It is a bunch of
things, maybe too many things, so it is not unreasonable to pick out a
smaller subset to focus on. Maybe an easy place to start (and good for
Silver Lining itself) would be to separate at least the application format
(and tools to manage applications in that state, e.g., installing new
libraries) from the tools that make use of such applications (deploy, etc).

Some opinions I have on this format, exemplified in Silver Lining:

   - It’s not zipped or a single file, unlike WARs. Uploading zip files is
   not a great API. Geez. I know there’s this desire to just drop in a file;
   but there’s no getting around the fact that dropping a file becomes a
   *deployment protocol* *and* *it’s an incredibly impoverished protocol*.
   The format is also not subtly git-based (ala Heroku) because git push is
   not a good deployment protocol.
   - But of course there isn’t really any deployment protocol implied by a
   format anyway, so maybe I’m getting ahead of myself ;) I’m saying a tool
   that deploys should take as an argument a directory, not a single file. (If
   the tool then zips it up and uploads it, fine!)
   - Configuration comes from the outside. That is, an application
   requests services, and the *container* tells the application where those
   services are. For Silver Lining I’ve used environment variables. I think
   this one point is really important — the container *tells* the
   application. As a counter-example, an application that comes with a Puppet
   deployment recipe is essentially *telling* the server how to arrange
   itself to suit the application. This will never be reliable or simple!
   - The application indicates what services it wants; for instance, it
   may want to have access to a MySQL database. The container then provides
   this to the application. In practice this means installing the actual
   packages, but also creating a database and setting up permissions
   appropriately. The alternative is never having *any* dependencies,
   meaning you have to use SQLite databases or ad hoc structures, etc. But in
   fact installing databases really isn’t that hard these days.
   - *All* persistence has to use a service of some kind. If you want to be
   able to write to files, you need to use a file service. This means the
   container is fully aware of everything the application is leaving behind.
   All the various paths an application should use are given in different
   environment variables (many of which don’t need to be invented anew, e.g.,
   $TMPDIR).
   - It uses vendor libraries exclusively for Python 
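
The "container tells the application" point from the list above can be
sketched roughly as follows.  The CONFIG_MYSQL_* variable names and the
`mysql_settings` helper are assumptions for illustration; the container's own
documentation is the authority on which names it really sets.

```python
import os


def mysql_settings(environ=os.environ):
    """Collect database settings that the container placed in the environment."""
    prefix = "CONFIG_MYSQL_"
    settings = {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }
    if "dbname" not in settings:
        # The app asked for a service the container did not provide.
        raise RuntimeError("no MySQL service configured by the container")
    return settings
```

The application never decides where the database lives; it only reads what
it was given, so the same package can run unchanged on a laptop or a server.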

Re: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments

2011-03-17 Thread Ian Bicking
It's implied by WSGI itself that the path be unquoted; there's no fix short
of changing the specification.
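
A short illustration of why the information is already gone by the time the
application sees it (shown with Python 3's urllib.parse rather than the
Python 2 urllib.unquote discussed in this thread; the path is made up):

```python
from urllib.parse import unquote

raw_path = "/files/a%2Fb/view"   # three segments; the middle one is "a/b"
path_info = unquote(raw_path)     # what ends up in environ['PATH_INFO']

# After unquoting, the escaped slash is indistinguishable from a separator.
segments_after_unquote = path_info.split("/")[1:]

# Splitting first and unquoting each piece would have kept the segment.
segments_split_first = [unquote(seg) for seg in raw_path.split("/")[1:]]
```

Here segments_after_unquote has four entries while segments_split_first has
three, which is exactly the distinction a server loses when it unquotes
before setting PATH_INFO.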


On Thu, Mar 17, 2011 at 1:10 PM, Florian Friesdorf f...@chaoflow.net wrote:


 I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
 urllib.unquote the path [1] before setting it in the wsgi environment
 [2]. The only pre-processing performed on the path between [1] and [2]
 is concerned with slashes '/'. By urllib.unquoting it is not possible to
 have urllib.quoted slashes within one path segment.

 At least pyramid without routing fully relies on
 ``environ['PATH_INFO']`` [3]; by commenting out [1] I succeeded in having
 slashes in path segments; they are handled by pyramid in [4]f.

 However, webob.request.BaseRequest would need to be adjusted wherever
 PATH_INFO from the environment is used (e.g. [5]).

 Reasoning: The path stored in environ['PATH_INFO'] is still a path,
 therefore it must not be urllib.unquoted, the unquoting must happen
 after the path is split up in segments ([4]).

 [1]
 https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-180
 [2]
 https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-217
 [3]
 https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L594
 [4]
 https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L495
 [5]
 https://bitbucket.org/ianb/webob/src/c0bb5309cfca/webob/request.py#cl-265

 --
 Florian Friesdorf f...@chaoflow.net
  GPG FPR: 7A13 5EEE 1421 9FC2 108D  BAAF 38F8 99A3 0C45 F083
 Jabber/XMPP: f...@chaoflow.net
 IRC: chaoflow on freenode,ircnet,blafasel,OFTC



Re: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments

2011-03-17 Thread Ian Bicking
I'll just add that *if* you can design your URL space (you didn't just
inherit one), and you want to distinguish path segments from values that
contain '/', you can use URLs like:
  /item/{some/value}/view

And then use the matching {}'s to figure out that some/value is one path
segment.  This makes it possible, for instance, to use GData (where XML
namespaces can show up in the URL, and they contain /'s, but they need to be
treated as a single value).  It's not perfect, but it does work.
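
A rough sketch of that matching, working on the already-unquoted PATH_INFO.
The `split_segments` name is hypothetical and this is not code from any
existing router; nested braces are not handled in any clever way.

```python
def split_segments(path):
    """Split a URL path on '/', keeping each {...} group as one segment."""
    segments, current, depth = [], [], 0
    for char in path.strip("/"):
        if char == "{":
            depth += 1          # entering a braced value
        elif char == "}" and depth:
            depth -= 1          # leaving a braced value
        elif char == "/" and depth == 0:
            segments.append("".join(current))
            current = []
        else:
            current.append(char)
    segments.append("".join(current))
    return segments
```

With this, split_segments('/item/{some/value}/view') treats some/value as a
single segment, while a plain '/item/some/value/view' still splits into four.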


On Thu, Mar 17, 2011 at 4:02 PM, And Clover and...@doxdesk.com wrote:

 On Thu, 2011-03-17 at 19:10 +0100, Florian Friesdorf wrote:
  I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
  urllib.unquote the path before setting it in the wsgi environment

 I'm afraid it must. This is something the WSGI specification inherits
 from CGI.

 Yes, it was a terrible decision to have SCRIPT_NAME and PATH_INFO
 automatically unescaped, as it loses the distinction between ‘%2F’ and
 ‘/’, and has resulted in endless problems with non-ASCII characters that
 could otherwise have been handled perfectly well as %-sequences.

 But that decision was taken a couple of decades ago and there's not
 really much we can do about it now. CGI may be an anachronism, but it is
 still widely used and its assumptions are still felt through Apache, IIS
 and WSGI.

  By urllib.unquoting it is not possible to
  have urllib.quoted slashes within one path segment.

 Correct. And neither Apache nor IIS allows %2F to be used within a path
 segment either, so really if you want to write a portable web app you
 simply have to avoid them (along with %00 and %5C). It is not currently
 practical to include any arbitrary byte sequence in a URL path segment,
 even though by the URL specification you should be able to.

 It's annoying, it's inelegant, it's limiting. But none of our attempts
 to extend or replace it for non-CGI-based servers (see past list
 discussion on path-info-raw or standardising REQUEST_URI) have come to
 any acceptable conclusion. We are stuck with it for the foreseeable.

 --
 And Clover
 mailto:a...@doxdesk.com
 http://www.doxdesk.com
 gtalk:chat?jid=bobi...@gmail.com



Re: [Web-SIG] PEP 444 != WSGI 2.0

2011-01-01 Thread Ian Bicking
Until the PEP is approved, it's just a suggestion.  So for it to really be
WSGI 2 it will have to go through at least some approval process; which is
kind of ad hoc, but not so ad hoc as just to implicitly happen.  For WSGI 2
to happen, someone has to write something up and propose it.  Alice has
agreed to do that, working from PEP 444 which several other people have
participated in.  Calling it WSGI 2 instead of Web 3 was brought up on
this list, and the general consensus seemed to be that it made sense -- some
people felt a little funny about it, but ultimately it seemed to be
something everyone was okay with (with some people like myself feeling
strongly it should be WSGI 2).

I'm not sure why you are so stressed out about this?  If you think it's
really an issue, perhaps 2 could be replaced with 2alpha until such time
as it is approved?


On Sat, Jan 1, 2011 at 8:02 PM, Graham Dumpleton graham.dumple...@gmail.com
 wrote:

 Can we please clear up a matter.

 GothAlice (don't know offhand their real name) keeps going around
 and claiming:

 
 After some discussion on the Web-SIG mailing list, PEP 444 is now
 officially WSGI 2, and PEP  is WSGI 1.1
 

 In this instance on web.py forum on Google Groups.

 I have pointed out a couple of times to them that there is no way that
 PEP 444 has been blessed as being the official WSGI 2.0 but they are
 not listening and are still repeating this claim. They also can't get
 right that PEP  clearly says it is still WSGI 1.0 and not WSGI
 1.1.

 If the people here whose opinions matter are quite happy for GothAlice
 to hijack the WSGI 2.0 moniker for PEP 444 I will shut up. But if that
 happens, I will voice my objections by simply not having anything to
 do with WSGI 2.0 any more.

 Graham




-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

2010-12-14 Thread Ian Bicking
On Sun, Dec 12, 2010 at 9:59 PM, Alice Bevan–McGregor
al...@gothcandy.com wrote:

 Howdy!

 There's one issue I've seen repeated a lot in working with WSGI1 and that
 is the use of middleware to process incoming data, but not outgoing, and
 vice-versa; middleware which filters the output in some way, but cares not
 about the input.

 Wrapping middleware around an application is simple and effective, but
 costly in terms of stack allocation overhead; it also makes debugging a bit
 more of a nightmare as the stack trace can be quite deep.

 My updated draft PEP 444[1] includes a section describing Filters, both
 ingress (input filtering) and egress (output filtering).  The API is
 trivially simple, optional (as filters can be easily adapted as middleware
 if the host server doesn't support filters) and easy to implement in a
 server.  (The Marrow HTTP/1.1 server implements them as two for loops.)


It's not clear to me how this can be composed or abstracted.

@webob.dec.wsgify does kind of handle this with its request/response
pattern; in a simplified form it's like:

    def wsgify(func):
        def replacement(environ):
            req = Request(environ)
            resp = func(req)
            return resp(environ)
        return replacement

This allows you to do an output filter like:

    @wsgify
    def output_filter(req):
        resp = some_app(req.environ)
        fiddle_with_resp(resp)
        return resp

(Most output filters also need the request.)  And an input filter like:

    @wsgify
    def input_filter(req):
        fiddle_with_req(req)
        return some_app


But while it handles the input filter case, it doesn't try to generalize
this or move application composition into the server.  An application is an
application and servers are imagined but not actually concrete.  If you
handle filters at the server level you have to have some way of registering
these filters, and it's unclear what order they should be applied.  At
import?  Does the server have to poke around in the app it is running?  How
can it traverse down if you have dispatching apps (like paste.urlmap or
Routes)?

You can still implement this locally of course, as a class that takes an app
and input and output filters.
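
A minimal sketch of such a class for plain WSGI 1, using only the standard
calling convention.  The `FilteredApp` name and both filter signatures are
assumptions made up for this example (they are not the API from the PEP 444
draft), and the response is buffered for simplicity.

```python
class FilteredApp:
    """Wrap a WSGI app with ingress (environ) and egress (response) filters."""

    def __init__(self, app, ingress=(), egress=()):
        self.app = app
        self.ingress = list(ingress)
        self.egress = list(egress)

    def __call__(self, environ, start_response):
        for filt in self.ingress:
            filt(environ)  # ingress filters modify environ in place
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, list(headers)
            return lambda data: None  # legacy write() is ignored in this sketch

        result = self.app(environ, capture)
        try:
            body = list(result)  # buffers the whole response
        finally:
            if hasattr(result, "close"):
                result.close()
        status, headers = captured["status"], captured["headers"]
        for filt in self.egress:
            # egress filters return a possibly modified response triple
            status, headers, body = filt(environ, status, headers, body)
        start_response(status, headers)
        return body
```

The result is still a single WSGI application object, so it can be mounted
under a dispatcher or wrapped again like any other app, and the order of
filters is simply the order of the lists.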


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

2010-12-14 Thread Ian Bicking
On Tue, Dec 14, 2010 at 12:54 PM, Alice Bevan–McGregor
al...@gothcandy.com wrote:


  An application is an application and servers are imagined but not actually
 concrete.


 Could you elaborate?  (Define concrete in this context.)


WSGI applications never directly touch the server.  They are called by the
server, but have no reference to the server.  Servers in turn take an app
and parameters specific to their serveryness (which may or may not even
involve HTTP), but it's good we've gotten them out of the realm of
application composition (early on WSGI servers frequently handled mounting
apps at locations in the path, but that's been replaced with dispatching
middleware).  An application wrapped with middleware is also a single object
you can hand around; we don't have an object that represents all of
application, list of pre-filters, list of post-filters.




  If you handle filters at the server level you have to have some way of
 registering these filters, and it's unclear what order they should be
 applied.  At import?  Does the server have to poke around in the app it is
 running?  How can it traverse down if you have dispatching apps (like
 paste.urlmap or Routes)?


 Filters are unaffected by, and unaware of, dispatch.  They are defined at
 the same time your application middleware stack is constructed, and passed
 (in the current implementation) to the HTTPServer protocol as a list at the
 same time as your wrapped application stack.


  You can still implement this locally of course, as a class that takes an
 app and input and output filters.


 If you -do- need region specific filtering, you can ostensibly wrap
 multiple final applications in filter management middleware, as you say.
  That's a fairly advanced use-case regardless of filtering.

 I would love to see examples of what people might implement as filters
 (i.e. middleware that does ONE of ingress or egress processing, not both).
  From CherryPy I see things like:

 * BaseURLFilter (ingress Apache base path adjustments)
 * DecodingFilter (ingress request parameter decoding)
 * EncodingFilter (egress response header and body encoding)
 * GzipFilter (already mentioned)
 * LogDebugInfoFilter (egress insertion of page generation time into HTML
 stream)
 * TidyFilter (egress piping of response body to Tidy)
 * VirtualHostFilter (similar to BaseURLFilter)

 None of these (with the possible exception of LogDebugInfoFilter) I could
 imagine needing to be path-specific.


GzipFilter is wonky at best (it interacts oddly with range requests and
etags).  Prefix handling is useful (e.g.,
paste.deploy.config.PrefixMiddleware), and usually global and unconfigured.
Debugging and logging stuff often needs per-path configuration, which can
mean multiple instances applied after dispatch.  Encoding and Decoding don't
apply to WSGI.  Tidy is intrusive and I think questionable on a global
level.  I don't think the use cases are there.  Tightly bound pre-filters
and post-filters are particularly problematic.  This all seems like a lot of
work to avoid a few stack frames in a traceback.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)

2010-09-23 Thread Ian Bicking
On Thu, Sep 23, 2010 at 11:06 AM, P.J. Eby p...@telecommunity.com wrote:

 At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote:

 On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby p...@telecommunity.com wrote:
 The Python 3 specific changes are to use:

 * ``bytes`` for I/O streams in both directions
 * ``str`` for environ keys and values
 * ``bytes`` for arguments to start_response() and write()


 This is the only thing that seems odd to me -- it seems like the response
 should be symmetric with the request, and the request in this case uses str
 for headers (status being header-like), and bytes for the body.


 So, I've given some thought to your suggestion, and, while it's true that
 most of the output headers are far less prone to ending up with unintended
 unicode content, there are at least two output headers that can include some
 sort of application content (and can therefore have random failures):
 Location and Set-Cookie.

 If these headers accidentally contain non-Latin1 characters, the error
 isn't detectable until the header reaches the origin server doing the
 transmission encoding, and it'll likely be a dynamic (and therefore
 hard-to-debug) error.


I don't see any reason why Location shouldn't be ASCII.  Any header could
have any character put in it, of course, there's just no valid case where
Location shouldn't be a URL, and URLs are ASCII.  Cookie can contain
weirdness, yes.  I would expect any library that abstracts cookies to handle
this (it's certainly doable)... otherwise, this seems like one among many
ways a person can do the wrong thing.

This can also be detected with the validator, which doesn't avoid runtime
errors, but bytes allow runtime errors too -- they will just happen
somewhere else (e.g., when a value is converted to bytes in an application
or library).

If servers print the invalid value on error (instead of just some generic
error) I don't think it would be that hard to track down problems.  This
requires some explicit effort on the part of the server (most servers handle
app_iter==None ungracefully, which is a similar problem).


However, if the output is always bytes (and this can be
 relatively-statically verified), then any error can't occur except *inside*
 the application, where the app's developer can find it more easily.

 So I guess the question boils down to: would we rather make sure that
 coding errors happen *inside* applications, or would we rather make porting
 WSGI apps trivial (or nearly so)?

 But I think that it's possible here to have one's cake and eat it too: if
 we require bytes for all outputs, but provide a pair of decorators in
 wsgiref.util like the following:

    def encode_body(codec='utf8'):
        """Allow a WSGI app to output its response body as strings
        w/specified encoding"""
        def decorate(app):
            def encode(response):
                try:
                    for data in response:
                        yield data.encode(codec)
                finally:
                    if hasattr(response, 'close'):
                        response.close()
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    _write = start_response(status, response_headers,
                                            exc_info)
                    def write(data):
                        return _write(data.encode(codec))
                    return write
                return encode(app(environ, start))
            return decorated_app
        return decorate

    def encode_headers(codec='latin1'):
        """Allow a WSGI app to output its headers as strings, w/specified
        encoding"""
        def decorate(app):
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    status = status.encode(codec)
                    response_headers = [
                        (k.encode(codec), v.encode(codec))
                        for k, v in response_headers
                    ]
                    return start_response(status, response_headers,
                                          exc_info)
                return app(environ, start)
            return decorated_app
        return decorate

 So, this seems like a win-win to me: relatively-static verification, errors
 stay in the app (or at least in the decorator), and the API is
 clean-and-easy.  Indeed, it seems likely that at least some apps that don't
 read wsgi.input themselves could be ported *just* by adding the appropriate
 decorator(s).  And, if your app is using unicode on 2.x, you can even use
 the same decorators there, for the benefit of 2to3.  (Assuming I release an
 updated standalone wsgiref version with the decorators, of course.)


This doesn't seem that different than the validator, except that the
decorator uses a different interface internally and externally (the internal
interface using text, the external one bytes).


-- 
Ian Bicking  |  http://blog.ianbicking.org

Re: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)

2010-09-23 Thread Ian Bicking
On Thu, Sep 23, 2010 at 11:17 AM, Ian Bicking i...@colorstudy.com wrote:

  If these headers accidentally contain non-Latin1 characters, the error
 isn't detectable until the header reaches the origin server doing the
 transmission encoding, and it'll likely be a dynamic (and therefore
 hard-to-debug) error.


 I don't see any reason why Location shouldn't be ASCII.  Any header could
 have any character put in it, of course, there's just no valid case where
 Location shouldn't be a URL, and URLs are ASCII.  Cookie can contain
 weirdness, yes.  I would expect any library that abstracts cookies to handle
 this (it's certainly doable)... otherwise, this seems like one among many
 ways a person can do the wrong thing.


Minor correction, Set-Cookie, not Cookie.  Good practice is to stick to
ASCII even there (all other techniques have a high risk of mojibake), so
we're really considering legacy integration.  Note that a similar problem is
using [('Content-length', len(body))] -- which also results in a sometimes
confusing error message well away from the application itself.

Generally without validation any data errors occur away from the
application.  A type error is not any different than an encoding error.
Using bytes removes a possible encoding error, but IMHO has a greater chance
of type errors (as bytes are not as natural as text in most cases).
Validation can check all aspects, including encoding (simply by doing a test
encoding).
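
As a rough sketch of what "a test encoding" in a validator could look like (the helper name check_header_encoding is made up for illustration and is not part of wsgiref.validate; it assumes headers are native strings on Python 3):

```python
def check_header_encoding(status, response_headers, codec='latin1'):
    """Return a list of problems found by test-encoding status and headers.

    Hypothetical helper for illustration; not part of wsgiref.validate.
    """
    problems = []
    items = [('status', status)]
    for name, value in response_headers:
        items.append(('header name', name))
        items.append(('header %r' % (name,), value))
    for label, text in items:
        if not isinstance(text, str):
            # Catches the [('Content-Length', len(body))] kind of mistake.
            problems.append('%s is %s, not str: %r'
                            % (label, type(text).__name__, text))
            continue
        try:
            text.encode(codec)
        except UnicodeEncodeError as exc:
            problems.append('%s cannot be encoded as %s: %r (%s)'
                            % (label, codec, text, exc))
    return problems
```

The point is only that type errors and encoding errors can be reported by the same check, with the offending value included in the message.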

Consider this hello world:

    def app(environ, start_response):
        body = b'Hello World'
        start_response(b'200 OK', [(b'Content-Length',
                                    str(len(body)).encode('ascii'))])
        return [body]

str(len(body)).encode('ascii')?!?  Yuck.  Also no 2to3 fixup can help
there.  bytes(len(body)) does something weird.
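
For reference, the "something weird" on Python 3 is that bytes() called with an integer returns that many NUL bytes rather than the digits. A small illustration (the variable names are just for this example):

```python
body = b'Hello World'

# bytes(int) builds a zero-filled buffer of that length, not the digits.
zero_filled = bytes(len(body))

# The value a Content-Length header actually needs.
as_digits = str(len(body)).encode('ascii')
```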

-- 
Ian Bicking  |  http://blog.ianbicking.org
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)

2010-09-23 Thread Ian Bicking
On Thu, Sep 23, 2010 at 3:23 PM, P.J. Eby p...@telecommunity.com wrote:

 At 11:17 AM 9/23/2010 -0500, Ian Bicking wrote:

 I don't see any reason why Location shouldn't be ASCII.  Any header could
 have any character put in it, of course, there's just no valid case where
 Location shouldn't be a URL, and URLs are ASCII.  Cookie can contain
 weirdness, yes.  I would expect any library that abstracts cookies to
 handle this (it's certainly doable)... otherwise, this seems like one among
 many ways a person can do the wrong thing.


 This can also be detected with the validator, which doesn't avoid runtime
 errors, but bytes allow runtime errors too -- they will just happen
 somewhere else (e.g., when a value is converted to bytes in an application
 or library).


 Right: somewhere much closer to the *actual* error, where the developer can
 know the problem is, "I have garbage data or have not selected an
 appropriate codec", rather than "this WSGI stuff is giving me errors some
 place".


  If servers print the invalid value on error (instead of just some generic
 error) I don't think it would be that hard to track down problems.  This
 requires some explicit effort on the part of the server (most servers handle
 app_iter==None ungracefully, which is a similar problem).


 The difference is that if a server rejects non-bytes, you'll know *right
 away* that your app isn't compliant, instead of having to wait until some
 non-latin1 data shows up.


No, you've only pushed the encoding elsewhere, and the error elsewhere.
Somewhere someone is probably doing text_value.encode('ascii') (or latin1 or
whatever), and if they haven't tested with non-ascii or non-latin1 input
then they might encounter an error.  It will be in their code, not in the
WSGI server, but the error will be present in all the same situations.  I
don't think it will be much harder to fix if it occurs in the WSGI server,
so long as the error message is at least a little bit helpful.


 AFAICT, there are only two advantages to using text for output headers:

 1. Text is easier to work with, and
 2. It's symmetric with using text for input headers.

 Both of which can still be had, by using the @encode_headers decorator.


Sure, anything can be fixed in a library.  But @encode_headers is just
another library.  And it also can't magically appear with 2to3, instead it
requires yet more patches and weird workarounds.

Also, what you are proposing hasn't been considered for PEP 444, though
other combinations of bytes and text have (all symmetric).  So it doesn't
seem to have any clean way to translate into the next version of the
specification.


 I'm a little bit on the fence on this one, because 1) it does seem a little
 pointless (if harmless) to shuffle headers around in bytes form, and 2)
 Location and Set-Cookie are very likely the only headers where any kind of
 damage could ever happen.


Set-Cookie only, Location is clean.  The entirety of hand-wringing over
bytes is all just about freakin' cookies.  Or the theory of cookies, I don't
know that anyone has yet encountered any concrete and vexing problems.

 But, since it *can* happen, and because it is also really easy to fix the
 API issue with a decorator, I'm still leaning in favor of "output is bytes"
 over "headers are text, bodies are bytes", unless somebody can come up with
 either some actually-bad consequence of using bytes, or some extra-good
 consequence of using text (that isn't addressed by just using the
 decorator).

 (Note, by the way, that WSGI design has always leaned in the direction of
 "any convenience that can be handled by a library should be, if it keeps
 the spec simpler and more verifiable".  So, this seems like a good use of
 that principle.)


It only fixes the one case of non-Latin1 characters, there are still many
other values you can put into a header (a newline or control character for
instance), and innumerable header-specific issues.  It seems to be adding
complexity for one of the least problematic cases.
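
To make the newline/control character case concrete, here is a minimal sketch of the kind of check that neither bytes nor text headers give you for free. The function name is hypothetical, and allowing horizontal tab is an assumption of the sketch, not something the spec discussion settled:

```python
import re

# Control characters (including CR and LF) and DEL; allowing horizontal tab
# is an assumption of this sketch.
_BAD_CHARS = re.compile(r'[\x00-\x08\x0a-\x1f\x7f]')

def find_bad_header_values(response_headers):
    """Return (name, value) pairs whose value contains a control character."""
    return [(name, value) for name, value in response_headers
            if _BAD_CHARS.search(value)]
```

A value like 'a\r\nSet-Cookie: x=1' encodes to Latin1 without complaint, so an encoding check alone would not flag it.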

--
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

2010-09-21 Thread Ian Bicking
On Tue, Sep 21, 2010 at 12:47 PM, Chris McDonough chr...@plope.com wrote:

 On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote:
  While the Web-SIG is trying to hash out PEP 444, I thought it would
  be a good idea to have a backup plan that would allow the Python 3
  stdlib to move forward, without needing a major new spec to settle
  out implementation questions.

 If a WSGI-1-compatible protocol seems more sensible to folks, I'm
 personally happy to defer discussion on PEP 444 or any other
 backwards-incompatible proposal.


I think both make sense, making WSGI 1 sensible for Python 3 (as well as
other small errata like the size hint) doesn't detract from PEP 444 at all,
IMHO.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

2010-09-21 Thread Ian Bicking
On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby p...@telecommunity.com wrote:

 The Python 3 specific changes are to use:

 * ``bytes`` for I/O streams in both directions
 * ``str`` for environ keys and values
 * ``bytes`` for arguments to start_response() and write()


This is the only thing that seems odd to me -- it seems like the response
should be symmetric with the request, and the request in this case uses str
for headers (status being header-like), and bytes for the body.

Otherwise this seems good to me, the only other major errata I can think of
are all listed in the links you included.

* text stream for wsgi.errors

 In other words, strings in, bytes out for headers, bytes for bodies.

 In general, only changes that don't break Python 2 WSGI implementations are
 allowed.  The changes should also not break mod_wsgi on Python 3, but may
 make some Python 3 wsgi applications non-compliant, despite continuing to
 function on mod_wsgi.

 This is because mod_wsgi allows applications to output string headers and
 bodies, but I am ruling that option out because it forces every piece of
 middleware to have to be tested with arbitrary combinations of strings and
 bytes in order to test compliance.  If you want your application to output
 strings rather than bytes, you can always use a decorator to do that.  (And
 a sample one could be provided in wsgiref.)


I agree allowing both is not ideal.


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-19 Thread Ian Bicking
On Sun, Sep 19, 2010 at 11:32 AM, Chris McDonough chr...@plope.com wrote:

  I propose to write in the PEP that a middleware should provide an
  app attribute to get the wrapped application or middleware.
  It seems to be the most common name used out there.

 We can't really mandate this because middleware is not required to be an
 instance.  It can be a function.


We could suggest it, and suggest the attribute name.  Composites, lazy
loading middleware, or a bunch of other situations can break it... but it's
nice for introspection tools to at least be able to attempt to run down the
chain.  Middleware is almost always a closure if it's a function, I believe,
so you could still do:

    def caps(app):
        def replacement_app(environ):
            status, headers, body = app(environ)
            body = [''.join(body).upper()]
            return status, headers, body
        replacement_app.app = app
        return replacement_app
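
An introspection tool could then run down the chain with something like the following sketch. The helper name middleware_chain and the step limit are made up for illustration; nothing in the PEP defines them:

```python
def middleware_chain(app, limit=50):
    """Follow .app attributes from the outermost callable inward.

    Hypothetical introspection helper; stops at the first object without an
    ``app`` attribute, or after ``limit`` entries to guard against cycles.
    """
    chain = [app]
    while hasattr(app, 'app') and len(chain) < limit:
        app = app.app
        chain.append(app)
    return chain
```

Composites or lazy-loading middleware would still cut the walk short, which is why this can only ever be a best-effort suggestion.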

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-18 Thread Ian Bicking
On Sat, Sep 18, 2010 at 5:03 AM, Marcel Hellkamp m...@gsites.de wrote:

 With WSGI it was possible to yield empty strings as long as the
 application is waiting for data and call start_response once the headers
 are final. Not perfect, but at least non-blocking. Web3 removes this
 possibility. The headers must be returned before the body iterable
 yielded its first element, empty or not.

 Removing any support for this type of asynchronism would render web3
 useless for all but completely synchronous and trivial applications.
 Even frameworks would have no way to work around this anymore.


I'm aware of what a lot of people have done with WSGI, but I'm not aware of
anyone doing an async proxy of any sort, or implementing anything in a way
where this empty string policy served any function.  It's not implausible
that it *could* be used, but years of practice have shown it is not used.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-17 Thread Ian Bicking
On Fri, Sep 17, 2010 at 9:43 AM, And Clover and...@doxdesk.com wrote:

 On 09/17/2010 02:03 PM, Armin Ronacher wrote:

  In case we change the spec as Ian mentioned above, I am all for
 a wsgi.guessed_encoding = True flag or something like that.


 Yes, I'd like to see that. I believe going with *only* a
 raw-or-reconstructed path_info, rather than having both path_info and
 PATH_INFO, is probably best, for the middleware-duplication reasons PJE
 mentioned.

 A more in-depth possibility might be:

 wsgi.path_accuracy =

0: script_name/path_info have been crudely reconstructed from
SCRIPT_NAME/PATH_INFO from an unknown source. Beware!
If there is to be backwards compatibility with WSGI1, this
would be seen as the 'default value' given a missing path_accuracy.

1: script_name/path_info have been reconstructed, but it is known
that path_info is accurate, other than %2F and non-ASCII issues.
That is, it's known that the path doesn't come from IIS's broken
PATH_INFO, or the IIS error has been detected and compensated for.

2: script_name/path_info have been reconstructed using known-good
encodings for the env. The only way in which they may differ from
the original request path is that a slash might originally have
been a %2F. (This is good enough for the vast majority of
applications.)

3: script_name/path_info come directly from the request path
without any intervening mangling.


path_accuracy is certainly a better name than encoding; nothing here
actually relates to encoding (except insofar as attempts to encode or
reencode values corrupt the path).  Personally I wouldn't want to split it
up this much, I'd rather a simple flag to indicate something was guessed,
vs. an accurate request.  The only real value I see in it is to help people
debug problems.  Maybe.  I'm not sure it's that realistic to imagine this
will be noticed by people deploying software and encountering problems.  A
helpful application could use it to warn the deployer of potential problems.

It seems that it would be possible to create a WSGI application and client
library that together can detect and help resolve these issues.  E.g., the
application always returns the values of script_name, path_info, and
query_string, and the client fires off a bunch of different requests to see
how it gets interpreted.  It could suggest corrections until everything
passes.

I would really like to see concerns over bad gateways not be used to keep
valuable information out of the spec.  We want people to use well-configured
gateways that accurately represent requests.  There are limits, e.g., in
environments where information is lost.  The only really problematic example
is losing the distinction between %2f and /, and I think it's reasonable to
suggest that applications should avoid making that distinction in the path
if they want to be easily deployed in different environments.


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-17 Thread Ian Bicking
On Fri, Sep 17, 2010 at 1:02 PM, Ian Bicking i...@colorstudy.com wrote:

 I would really like to see concerns over bad gateways not be used to keep
 valuable information out of the spec.  We want people to use well-configured
 gateways that accurately represent requests.  There are limits, e.g., in
 environments where information is lost.  The only really problematic example
 is losing the distinction between %2f and /, and I think it's reasonable to
 suggest that applications should avoid making that distinction in the path
 if they want to be easily deployed in different environments.


Just to expand -- the reason %2f is special is because / has special meaning
in URL paths, or at least is treated as such.  ? has special meaning too,
but that's already handled by splitting off QUERY_STRING.  Technically ; is
supposed to mean something, but no one ever cared, so it doesn't really.  In
theory you could make any character special, and in doing so want an escape
mechanism to determine the difference between, e.g., , and %2c... but no
one does that, so no problem.
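
A small illustration of the %2f case, using urllib.parse from the Python 3 stdlib (the example paths are invented):

```python
from urllib.parse import unquote

raw_a = '/files/a%2Fb'   # one path segment named "a/b"
raw_b = '/files/a/b'     # two path segments, "a" and "b"

# After URL-decoding, the two requests are indistinguishable.
decoded_a = unquote(raw_a)
decoded_b = unquote(raw_b)

# Splitting on "/" before decoding each segment keeps the distinction.
segments_a = [unquote(s) for s in raw_a.split('/')]
segments_b = [unquote(s) for s in raw_b.split('/')]
```

Once a gateway has handed over only the decoded form, there is no way to get from decoded_a back to raw_a.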

All the other potential problems are problems of gateway corruption.  E.g.,
where the bytes were decoded with Latin1 and then encoded with
sys.getfilesystemencoding(), or some other mismatched combination.  I don't
believe we should expose gateway corruption to the spec.  I *do* believe
that we can build tools inside WSGI to help debug and fix those problems,
and I don't think any of these changes makes those tools particularly harder
to implement.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-17 Thread Ian Bicking
On Fri, Sep 17, 2010 at 1:37 PM, chris.d...@gmail.com wrote:

 On Fri, 17 Sep 2010, Ionel Maries Cristian wrote:

  I feel this spec puts too much burden on applications - having to process
 all those byte strings and even having to add Content-Length even for naive
 buffered-body apps.


 The Content-Length requirement is a big killer for me. I'm usually
 generating content in apps, rather deep in a stack of middleware-like
 pieces that may or may not be looking at or modifying that content.
 I don't want to a) have to unwind my generators at each level b)
 reset the content-length here there and everywhere.

 It could be I'm doing it completely wrong, but it works rather
 nicely.


I'm unclear what exactly you guys are reacting to.  This?


   - The server must not inject an additional Content-Length header by
   guessing the length from the response iterable. This must be set by the
   application itself in all situations.


I'm also not sure what motivated this particular change, but I don't have
any opinion one way or the other.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-17 Thread Ian Bicking
On Fri, Sep 17, 2010 at 2:06 PM, Armin Ronacher armin.ronac...@active-4.com
 wrote:

 Hi,


 On 9/17/10 7:43 PM, Ian Bicking wrote:

 I'm also not sure what motivated this particular change, but I don't
 have any opinion one way or the other.

 Motivation is that WSGI wants servers to do something like this:

   if len(iterable) == 1 and content_length_header_missing:
       headers.append(('Content-Length', str(len(iterable[0]))))

 However not everybody was doing that and some applications were setting a
 content length header or not.  If a content length header was not set some
 middlewares that changed content worked properly even though they did not
 check the header.  The idea is that with web3 every tool in the chain is
 supposed to look for that header and update it appropriately.

 Even the piglatin middleware from the PEP 333 did not check the content
 length if I remember correctly.


OK, so maybe it should just be clarified:

* Middleware and servers should not modify or add Content-Length, Date, or
other headers unless they have reason to do so, and they must ensure that
the response is valid (e.g., there should never be two Content-Length
headers).

It still seems reasonable that *if* there is no Content-Length, and the
server can guess it easily enough (mostly when it is handed an actual
list/tuple that we know can be introspected quickly and without side
effects), then it's fine for the server to set it -- but certainly the
server doesn't own that header (or any other, except maybe some
connection-related headers?).
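
A minimal sketch of that rule, assuming native-string headers and a bytes body; the function name add_content_length is invented for the example:

```python
def add_content_length(response_headers, body):
    """Return headers with Content-Length added when it is cheap and safe.

    Illustrative only: the header is added only if it is absent and the body
    is a real list or tuple of bytes, so nothing is consumed by measuring it.
    """
    names = [name.lower() for name, _ in response_headers]
    if 'content-length' in names:
        return list(response_headers)      # never add a second one
    if not isinstance(body, (list, tuple)):
        return list(response_headers)      # generator etc.: do not guess
    total = sum(len(chunk) for chunk in body)
    return list(response_headers) + [('Content-Length', str(total))]
```

Middleware that rewrites the body would still be responsible for updating or dropping any Content-Length it finds.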

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-16 Thread Ian Bicking
Well, reiterating some things I've said before:

* This is clearly just WSGI slightly reworked, why the new name?
* Why byte values in the environ?  No one has offered any real reason they
are better than native strings.  I keep asking people to offer a reason,
*and no one ever does*.  It's just hyperbole and distraction.  Frankly I'm
feeling annoyed.  So far my experience makes me believe using native strings
will make it easier to port and support libraries across 2 and 3.
* It makes sense to me that the error stream should accept both bytes and
unicode, and should do a best effort to handle either.  Getting encoding
errors or type errors when logging an error is very distracting.
* Instead of focusing on Response(*response_tuple), I'd rather just rely on
something like Response.from_wsgi(response_tuple).  Body first feels very
unnatural.
* Regarding long response headers, I think we should ignore the HTTP spec.
You can put 4k in a Set-Cookie header, such headers aren't easily or safely
folded... I think the line length constraint in the HTTP spec isn't a
constraint we need to pay attention to.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-16 Thread Ian Bicking
On Thu, Sep 16, 2010 at 12:35 PM, Guido van Rossum gu...@python.org wrote:

 On Thu, Sep 16, 2010 at 10:01 AM, Ian Bicking i...@colorstudy.com wrote:
  Well, reiterating some things I've said before:
 
  * This is clearly just WSGI slightly reworked, why the new name?
  * Why byte values in the environ?  No one has offered any real reason they
  are better than native strings.  I keep asking people to offer a reason,
  *and no one ever does*.  It's just hyperbole and distraction.  Frankly I'm
  feeling annoyed.  So far my experience makes me believe using native
  strings will make it easier to port and support libraries across 2 and 3.

 Hm. IIUC the proposal is to implicitly assume Latin1 when decoding the
 bytes to Unicode. I worry that this will just perpetuate mojibake and
 other atrocities committed in Python 2.


I was reading http://python.org/dev/peps/pep-0444/ -- is there another
revision under discussion?  This seems to explicitly say all environ values
will be bytes.  There have been other str-oriented proposals, including
mod_wsgi's implementation.

There is consensus that request and response bodies should be bytes.  So
really we're talking about whether headers and status are bytes or native
strings.  Most HTTP headers can only contain sensible characters in ASCII,
and while anyone can submit anything in a header I'm not aware of it being a
problem that, e.g., someone submits a Cache-Control header with non-ASCII
values.

There are a small number of headers that can reasonably contain Latin1
characters.  Latin1 is specified in HTTP, and in a few instances RFC2047
encoding is allowed, though I don't believe anyone proposes that servers
should try to handle RFC2047 (I believe CherryPy does/did do this, but I
believe Robert Brewer who is in charge of that project supports removing
that).  There are headers that can reasonably contain RFC2047, but this can
be decoded at the application level.

The Cookie header does frequently contain incorrect encodings, but to handle
this you have to decode the header as bytes or latin1 (all the meaningful
characters are the same in both cases) and then decode/transcode values
after parsing.  Latin1 imposes only a small speedbump for a header that
already has a bunch of speedbumps.
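
The "decode as latin1, then transcode after parsing" step might look like this sketch. The function name and the fallback behaviour are assumptions made for illustration:

```python
def transcode_cookie_value(value, encoding='utf-8'):
    """Re-decode a value that the server decoded as Latin1.

    Latin1 maps every byte to the code point of the same number, so
    encode('latin1') recovers the original bytes exactly. Falling back to
    the Latin1 text on failure is a choice made for this sketch.
    """
    raw = value.encode('latin1')
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return value
```

Because the Latin1 round trip is lossless, the application gives up nothing compared with receiving bytes; it just has one extra call to make.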

The other case when Latin1 is not appropriate is the URL-decoded path, WSGI
1's SCRIPT_NAME and PATH_INFO.  This proposal removes those.  The
URL-encoded values are ASCII-safe, or at least could be safely normalized to
be safe in the server level.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-16 Thread Ian Bicking
On Thu, Sep 16, 2010 at 4:58 PM, Armin Ronacher armin.ronac...@active-4.com
 wrote:

 - Bytes values in the environment:

  HTTP transmits bytes, that's a fact we can't change.  When we go
  with native strings we will go with unicode on 3.x  This has the
  following implications:

  - getting the right path info requires a decode + an encode
unless you are assuming latin1.


Not if you are working with the URL-encoded paths.


  - same as above for the script name and cookie header


Cookie is weird.  If that one header could be bytes, that'd be great... but
special-casing Cookie/Set-Cookie is too hard/weird.

Plus handling Cookie/Set-Cookie as Latin1 is just one more line of code
(well, two, one for each header).

 When going with unicode strings on 3.x for environ values, we would
  have to do the same for outgoing values which makes middlewares a lot
  harder to write:


All response headers handle encoded URLs (e.g., Location), so
SCRIPT_NAME/PATH_INFO issues don't come into play.  Set-Cookie could be an
issue, though only really when someone wants to replicate an external
system's weird cookies -- except for legacy issues it's best for application
developers to stick to ASCII cookies (URL-encoding cookie values is a
popular way of doing this).

I don't know of any other header (or the status) that would reasonably cause
a problem.  And I'm not glossing over corner cases -- I'm generally very
aware and concerned with legacy issues, and interacting with legacy
systems.  There just aren't any here except for the resolvable issues I've
listed.

- web3.errors

  I think Ian raised concern that it's specified to support unicode
  only.  I don't think changing that to accept either bytes or unicode
  is a good idea on Python 3, where there is no stream in the language
  or standard library that accepts both at the same time.
  An implementation for 2.x could support both, but I don't know if
  there is a usecase for that.  In general though I have to say that
  very few people use wsgi.errors currently, so I don't think this is
  a real issue anyways.


It's more of an issue under Python 2, it could probably be ignored with
Python 3.  Under Python 2 when you have some error condition it's really
frustrating to encounter some unicode error with the logging of that error
(often covering up the original error).
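
A best-effort error stream along those lines could be sketched as below. This is written against Python 3 for brevity; the class name TolerantErrors and the choice of UTF-8 with replacement characters are assumptions, not anything the spec proposes:

```python
class TolerantErrors:
    """Wrap a text stream so write() accepts both bytes and str.

    Hypothetical wrapper; undecodable bytes are replaced rather than raised,
    so logging an error cannot itself fail with a UnicodeDecodeError.
    """
    def __init__(self, stream, encoding='utf-8'):
        self.stream = stream
        self.encoding = encoding

    def write(self, data):
        if isinstance(data, bytes):
            data = data.decode(self.encoding, 'replace')
        self.stream.write(data)

    def writelines(self, seq):
        for item in seq:
            self.write(item)

    def flush(self):
        self.stream.flush()
```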


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-16 Thread Ian Bicking
On Thu, Sep 16, 2010 at 9:59 PM, Armin Ronacher armin.ronac...@active-4.com
 wrote:

  On 9/17/10 3:43 AM, Ian Bicking wrote:

 Not if you are working with the URL-encoded paths.


 SCRIPT_NAME / PATH_INFO will always stay unencoded and the current spec
 requires the web3.script_name thing to only be provided if the server can
 safely provide that.  So at least for the fallback, we are dealing with
 (properly latin1 decoded) non-URL encoded things.  Can be changed of course.


Yes, if we get rid of SCRIPT_NAME/PATH_INFO then the problem goes away.  For
servers without access to the unencoded value, reencoding those values
doesn't actually lose any information over what we have now, and avoids any
encoding issues.  Servers with REQUEST_URI can at least attempt to
reconstruct the encoded values.
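
For a server that only has the decoded value, the reencoding step could be sketched like this, assuming the server exposed the path as Latin1-decoded text (the function name reencode_path is invented):

```python
from urllib.parse import quote, unquote_to_bytes

def reencode_path(decoded_path):
    """URL-encode a path that the server handed over decoded as Latin1."""
    raw = decoded_path.encode('latin1')   # recover the original bytes
    return quote(raw, safe='/')           # pure-ASCII result
```

The result is ASCII whatever the original encoding was, and unquote_to_bytes() on it gives back the same bytes the server saw. What it cannot restore is whether a "/" was originally sent as %2F.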




  Cookie is weird.  If that one header could be bytes, that'd be great...
 but special-casing Cookie/Set-Cookie is too hard/weird.

 Special casing one header is indeed weird.


Cookie is also the one header that can't be safely folded.  It's just a
messed up header, and requires hacky workarounds.




  I don't know of any other header (or the status) that would reasonably
 cause a problem.  And I'm not glossing over corner cases -- I'm
 generally very aware and concerned with legacy issues, and interacting
 with legacy systems.  There just aren't any here except for the
 resolvable issues I've listed.

 Technically speaking it would affect etags too, but I doubt anyone is using
 non-ASCII quoted strings there.  A very funny header is btw the Warning
 header which actually can have any encoding:

 The warn-text SHOULD be in a natural language and character set that is
 most likely to be intelligible to the human user receiving the response.
 This decision MAY be based on any available knowledge, such as the location
 of the cache or user, the Accept-Language field in a request, the
 Content-Language field in a response, etc. The default language is English
 and the default character set is ISO-8859-1.

 If a character set other than ISO-8859-1 is used, it MUST be encoded in the
 warn-text using the method described in RFC 2047 [14].

 Doubt anyone is using that header though.


The Title header (in Atompub) also suggests 2047, but that's essentially an
ASCII conversion like URL quoting. It looks something like
=?iso-8859-1?q?p=F6stal?=
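
For anyone who does need to read such a value at the application level, the stdlib email package can already decode it. A short sketch (variable names are just for the example):

```python
from email.header import decode_header

encoded = '=?iso-8859-1?q?p=F6stal?='

# decode_header returns (bytes, charset) pairs for encoded words.
parts = decode_header(encoded)
text = ''.join(
    chunk.decode(charset or 'ascii') if isinstance(chunk, bytes) else chunk
    for chunk, charset in parts
)
```

Since the encoded form is plain ASCII, the server can pass it through untouched and leave this step to whoever cares about the header.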


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Ian Bicking
On Sat, Jul 17, 2010 at 12:38 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 On Friday, July 16, 2010, And Clover and...@doxdesk.com wrote:
  On 07/14/2010 06:43 AM, Ian Bicking wrote:
 
 
  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
  and HTTP_COOKIE.
 
 
  (And of those, PATH_INFO is the only one that really matters, in that
 no-one really uses non-ASCII script filenames,

 FWIW, I had to go to a lot of trouble to allow non ASCII in final
 SCRIPT_NAME in mod_wsgi. Specifically using AddHandler directive in
 Apache means a file system path can make up part of SCRIPT_NAME. I had
 someone who was specifically using Russian in a WSGI script file name
 and because with AddHandler that becomes part of SCRIPT_NAME you had
 to cater for it. Anyway this was more of a Windows issue in having to
 use special file system functions to deal with fact that on Windows
 filesystem paths aren't UTF-8 but something else.

 What this does highlight though is that although one can talk about
 passing raw script name through to application, that isn't necessarily
 right as it isn't the application that dictates what encoding may be
 used but the web server which is performing the mapping of that part
 of the original URL path to a potential filesystem resource, or
 alternatively where file based configuration for mount point, the
 encoding of the web server configuration file.


This is an Apache-specific issue.  It definitely doesn't apply to
paste.httpserver, and I doubt it applies to CherryPy or wsgiref.  I don't
really know how Nginx or other servers work.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby p...@telecommunity.com wrote:

 At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:

 And this doesn't help with Python 3: either we have byte values of
 SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think
 bytes will be more awkward to port to than text, and inconsistent with other
 WSGI values.


 OTOH, it has the tremendous advantage of pushing the encoding question onto
 the app (or framework) developer...  who's really the only one who can make
 the right decision for their particular application.  And personally, I'd
 rather have clear boundaries between text and bytes, such that porting (even
 if tedious or awkward) is *consistent*, and clear as to when you're
 finished, not, oh, did I check to make sure I converted SCRIPT_NAME and
 PATH_INFO...  not just in my app code, but in all the library code I call
 *from* my app?

 IOW, the bytes/string discussion on Python-dev has kind of led me to
 realize that we might just as well make the *entire* stack bytes (incoming
 and outgoing headers *and* streams), and rewrite that bit in PEP 333 about
 using str on Python 3000 to say we go with bytes on Python 3+ for
 everything that's a str in today's WSGI.


This was my first intuition too, until I started thinking in more detail
about the particular values involved.  Some obviously are textish, like
environ['SERVER_NAME'].  Not a very useful value, but definitely text.

Basically all the internal strings are textish, so we're left with:

wsgi.url_scheme
SCRIPT_NAME/PATH_INFO
QUERY_STRING
HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
response status
response headers (name and value)

And there's a few things like REMOTE_USER that are kind of in the middle.
Everyone is in agreement that bodies should be bytes.

One initial problem is that the Python 3 stdlib handles bytes poorly, so for
instance there's no good way to reconstruct the URL using the stdlib.  That
explains certain tensions, but I think we should ignore that, and in fact
that's what Python-Dev seemed to say pretty clearly.
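
For comparison, reconstructing the URL from text values is short.  A minimal
sketch following the URL reconstruction recipe in PEP 333, under the assumption
(not something settled in this thread) that SCRIPT_NAME and PATH_INFO are text
holding Latin1-decoded bytes.  The function name reconstruct_url is only
illustrative:

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the request URL from a WSGI environ holding text values."""
    scheme = environ['wsgi.url_scheme']
    url = scheme + '://'
    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        default_port = '443' if scheme == 'https' else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    # Assumption: path values are Latin1-decoded bytes, so encoding them
    # back as Latin1 recovers the original bytes before quoting.
    url += quote(environ.get('SCRIPT_NAME', ''), encoding='latin-1')
    url += quote(environ.get('PATH_INFO', ''), encoding='latin-1')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url
```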

Now, the other keys:

wsgi.url_scheme: clearly ASCII

SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old
legacy encoding.
raw request path: should be ASCII (non-ASCII should be URL-encoded).  URL
encoding happens at the byte layer, so a server could reasonably URL encode
any non-ASCII characters without imposing any encoding.

QUERY_STRING: should be ASCII, same as raw request path

headers: Most are ASCII.  Latin1 is a reasonable fallback and suggested by
the specification.  The spec also implies you have to use the RFC2047 inline
encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and
supporting it would probably be a bad idea for security reasons.  The
Atompub spec (reasonably modern) specifically says Title headers should be
encoded with RFC2047 (if they are not ISO-8859-1):
http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 --
decoding this kind of encoding at the application layer seems reasonable to
me.

cookie header: this specific header can easily have multiple encodings, as
the browser encodes data then treats it as opaque bytes, so a cookie can be
set via UTF-8 one place, Latin1 another, and those coexist in one header.
That is, there is no real encoding and this should be treated as bytes.
(Latin1 is an approximation of bytes... a spotty way to treat bytes, but
entirely workable.)

response status: I believe the spec says this must be Latin1/ISO-8859-1.  In
practice it is almost always ASCII, and since it is not user-visible it's
not something that really needs localization.

response headers: the spec implies Latin1, in practice the Set-Cookie header
is bytes (since interoperation with wonky legacy systems is not uncommon).
I'm not sure of any other exceptions?
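
As a rough illustration of decoding RFC2047 at the application layer, here is a
minimal sketch using the stdlib email.header module.  The helper name
decode_rfc2047 is made up for this example, and whether an application should
accept such headers at all is a separate question:

```python
from email.header import decode_header, make_header


def decode_rfc2047(value):
    """Decode any RFC 2047 encoded words in a header value into text."""
    return str(make_header(decode_header(value)))


title = decode_rfc2047('=?iso-8859-1?q?some=20text?=')
```

A server or gateway would pass the header through untouched, and only an
application that expects this encoding (such as an Atompub implementation)
would call something like this.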


So... to me it seems pretty reasonable for HTTP specifically that text can
work.  And it feels weird that, say, environ['SERVER_NAME'] be text and
environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR']
should be in that mode.  And it would also be weird if
environ['SERVER_NAME'] was bytes.

In the past when we've gotten down to specifics, the only holdup has been
SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough chr...@plope.com wrote:

 On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote:

   In the past when we've gotten down to specifics, the only holdup has
 been
   SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.
 
  I think I favor PJE's suggestion:  let WSGI deal only in bytes.

 I'd prefer that WSGI 2 was defined in terms of a bytes with benefits
 type (Python 2's ``str`` with an optional encoding attribute as a hint
 for cast to unicode str) instead of Python 3-style bytes.

 But if I had to make the Hobson's choice between Python 3 style bytes
 and Python 3 style str, I'd choose bytes.  If I then needed to write
 middleware or applications, I'd use WebOb or an equivalent library to
 enable a policy which converted those bytes to strings on my behalf.
 Making it easy to write raw middleware or applications without using
 such a library doesn't seem as compelling a goal as being able to easily
 write one which allowed me direct control at the raw level.


What are the concrete problems you envision with text request headers, text
(URL-quoted) path, and text response status and headers?

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 5:06 PM, Ian Bicking i...@colorstudy.com wrote:

 On Fri, Jul 16, 2010 at 4:47 PM, Tres Seaver tsea...@palladion.com wrote:

   Basically all the internal strings are textish, so we're left with:

 What do you mean by internal?  Anything in the headers or the CGI
 environment is intrinsically bytes-ish to me.  Do you mean that you
 want application programmers to have them transparently decoded?  If so,
 we can make that the responsibility of the non-middleware framework /
 application.


 By internal I mean all the CGI variables that aren't representing HTTP,
 like SERVER_NAME.


Actually I was thinking SERVER_SOFTWARE, though SERVER_NAME is somewhat
similar as it doesn't come from HTTP, it comes from server configuration.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 8:46 PM, Ian Bicking i...@colorstudy.com wrote:

 So... before jumping to conclusions, what's the hard part with using text?


Oh, the one thing that will be silly is cookies, but they are totally nuts
already.  They can be parsed equally well as bytes or latin1, and best only
transcoded after parsing.  Doing cookie_value.decode(app_encoding) or
cookie_value.encode('ISO-8859-1').decode(app_encoding) isn't terribly
different.  And cookies aren't fair because they are just stupid; like the
standard library I don't think we should design anything around their
idiosyncrasies.
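
A small sketch of what "parse first, transcode after" could look like when the
Cookie header arrives as Latin1 text.  The helper names and the sample header
are invented for illustration, and the parsing is deliberately naive (no
quoting rules):

```python
def parse_cookie_pairs(header):
    """Split a Cookie header into name/value pairs without touching encodings."""
    pairs = {}
    for item in header.split(';'):
        name, sep, value = item.strip().partition('=')
        if sep:
            pairs[name] = value
    return pairs


def transcode(value, app_encoding):
    """Recover the original bytes from Latin1 text, then decode them properly."""
    return value.encode('ISO-8859-1').decode(app_encoding)


# One header carrying two cookies that were set with different encodings.
header_bytes = b'a=caf\xc3\xa9; b=caf\xe9'
header_text = header_bytes.decode('ISO-8859-1')  # what a text environ would hold

cookies = parse_cookie_pairs(header_text)
a = transcode(cookies['a'], 'UTF-8')
b = transcode(cookies['b'], 'ISO-8859-1')
```

Each value is transcoded on its own, so two encodings coexisting in one header
is not a problem as long as nothing decodes the whole header up front.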

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

  Nah, not nearly that hard:
 
  path_info =
 urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
 
  I don't see the problem?  If you want to distinguish %2f from /, then
 you'll do it slightly differently, like:
 
  path_parts = [
  urllib.parse.unquote_to_bytes(p).decode('UTF-8')
  for p in environ['wsgi.raw_path_info'].split('/')]
 
  This second recipe is impossible to do currently with WSGI.
  So... before jumping to conclusions, what's the hard part with using

 Sorry, it is not that simple. The thing that everyone is ignoring is
 that SCRIPT_NAME and PATH_INFO are also normalized by the web server
 normally. That is, .. instances are removed. By passing the raw URL
 through to the application, you are now forcing every application to
 have to deal with that as well with the possibility of directory
 traversal attacks when people get it wrong and the URL is mapping
 somehow to file system resources. It is a huge can of worms which at
 the moment the web server deals with.


Well... at least to me raw only means not URL decoded, so it doesn't
necessarily mean you can't clean up the request path.  I guess an attacker
could encode . to make things harder.

Nevertheless, WSGI servers don't currently guarantee this cleaning.  I added
it to paste.httpserver, but I don't know one way or the other about any
other servers.  A quick test shows wsgiref does not clean paths.  So apps
shouldn't rely on a clean path.
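
For an app that does receive a raw (still URL-quoted) path, cleaning it up is
not much code.  A minimal sketch, assuming a text value like the proposed
wsgi.raw_path_info; clean_path_segments is a hypothetical helper, not part of
any server:

```python
from urllib.parse import unquote_to_bytes


def clean_path_segments(raw_path):
    """Split a raw path on literal '/', unquote each segment, resolve dots.

    An encoded slash (%2f) stays inside its segment, and '..' is resolved
    after unquoting so that %2e%2e gets no special treatment.
    """
    segments = []
    for part in raw_path.split('/'):
        segment = unquote_to_bytes(part).decode('utf-8', 'surrogateescape')
        if segment in ('', '.'):
            continue
        if segment == '..':
            # Never climb above the root.
            if segments:
                segments.pop()
            continue
        segments.append(segment)
    return segments
```

An app mapping segments to the filesystem would still want to reject segments
containing a slash or a null byte before using them.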


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-14 Thread Ian Bicking
On Wed, Jul 14, 2010 at 12:19 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

   * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
  exclusively with encoded versions (that represent the original request
  URI).  We use Latin1 encoding, but it should be ASCII anyway, like most
 of
  the headers.

 BTW, it should be highlighted whether this change is relevant to
 Python 3 but like some of the other things you relegated as out of
 scope, purely a wish list item.


Certainly; most headers or metadata is pretty much constrained to ASCII, and
any use of non-ASCII is... at least peculiar, and presumably
application-specific.  For instance, there's no reason you'd have anything
but ASCII in Cache-Control.  The one place encoded information happens
regularly in headers (that I know of) is Cookie.  The request URI path is
generally ASCII, but SCRIPT_NAME and PATH_INFO *aren't* the request URI
path, they are URL decoded versions of the request URI path.  And they are
usually encoded in UTF8... but decoding as UTF8 is lossy for bytes that
aren't valid UTF8, so decoding them is problematic (though we could define
that they must be decoded with surrogateescape).  And while they are usually
UTF8, they are sometimes no valid encoding at all, because anyone can assemble
any set of characters they want and web browsers will accept it.

By avoiding URL-unquoting of these values, we can also stick to Latin1 and
get something reasonable.  It's not very attractive to me that we take
something that is probably *not* Latin1, and may reasonably not be ASCII,
and decode it as Latin1.
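
To make the trade-off concrete, here is a small sketch of the three decoding
options applied to a path that is not valid UTF8.  The sample bytes are made up:

```python
raw = b'/caf\xe9'  # Latin1-encoded path bytes, not valid UTF8

# Strict UTF8 decoding simply fails on these bytes.
try:
    raw.decode('utf-8')
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

# surrogateescape smuggles the bad byte through as a lone surrogate,
# so the original bytes can be recovered later.
escaped = raw.decode('utf-8', 'surrogateescape')
roundtrip_escaped = escaped.encode('utf-8', 'surrogateescape')

# Latin1 maps every byte to a code point, so it always round-trips,
# but the text is only meaningful if the bytes really were Latin1.
as_latin1 = raw.decode('latin-1')
roundtrip_latin1 = as_latin1.encode('latin-1')
```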

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] http://wiki.python.org/moin/WebFrameworks

2009-11-28 Thread Ian Bicking
Personally, if the author/maintainer of any library claims it is
maintained/up-to-date, I say trust them.  Most people are pretty honest
about the status of their projects.  But it does require a positive response
to really make this claim.

On Sat, Nov 28, 2009 at 12:03 PM, Aaron Watters arw1...@yahoo.com wrote:


 
  On Thu, Nov 26, 2009 at 1:02 PM, Chris McDonough chr...@plope.com
  wrote:
   http://wiki.python.org/moin/WebFrameworks
  seems to be the place where folks
   are registering their respective web frameworks.
  
   I'd like to move some of the frameworks which are
  currently in the various
   categories which haven't been active in a few years.
   In particular, I'd
   like to move any framework which hasn't had a release
  since the beginning of
   2008 (arbitrary) into the Discontinued / Inactive
  framework category.  I'd
   be willing to do the work to make sure I wasn't moving
  one that actually
   *did* have releases past that but just hadn't updated
  the page.
  
   Any dissent?
  
   - C

 Why not call them apparently stable
 versus under active development?  Is the
 cgi module discontinued?

 I'm a little sensitive on this topic
 because people tell me that Gadfly is inactive
 or discontinued
 but it still does what it does
 as documented very well.

 Frequent releases may actually be a sign of
 bugginess and bad design.
 If you suspect a project is really dead, maybe you
 could try to contact the authors and ask about
 what they think.

  -- Aaron Watters

 ===
 BTW, I think Release early, release often is nonsense
 because it means you are probably releasing
 something buggy and unstable which will just alienate
 your users, who will never come back to see the better
 version.





-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-27 Thread Ian Bicking
On Fri, Nov 27, 2009 at 12:20 PM, P.J. Eby p...@telecommunity.com wrote:


  1. The 'readline()' function of 'wsgi.input' may optionally take a size
 hint.

 Already de-facto required. Leaving it out helps no-one. KEEP.


 Fair enough, since it's a MAY.  On the other hand, because it's a MAY, it
 actually *helps* no-one, from a spec compatibility POV.  (That is, you have
 to test whether it's available, so it's no different than it not being in
 the spec to begin with.)

 So, putting it in doesn't *hurt*, but neither does it *help*...  so I lean
 towards leaving it to 2.x, where it can actually help.


I think it was meant to be a must.  The *caller* MAY pass in a size hint,
the implementor MUST implement this optional argument.  This is the de-facto
requirement.
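
A sketch of what that looks like from the implementor's side: a wrapper that
accepts the optional size argument on readline() (and read()), and never reads
past Content-Length.  The class name LimitedInput is invented here, and real
servers will differ in detail:

```python
import io


class LimitedInput:
    """File-like wrapper that stops at Content-Length and then returns b''."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def _clamp(self, size):
        # A missing or negative size means "as much as is left".
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        # The caller MAY pass a size hint; we MUST accept it.
        data = self._stream.readline(self._clamp(size))
        self._remaining -= len(data)
        return data


body = LimitedInput(io.BytesIO(b'line one\nline two\nEXTRA'), 18)
first = body.readline()
partial = body.readline(4)
rest = body.read()
at_end = body.read()
```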


   2. The 'wsgi.input' must provide an empty string as end of input stream
 marker.

 I don't think this will be a problem. What would WSGI middleware do to
 break this requirement?


 It could be reading the original input stream, and replacing it with
 another one.  Not very common I would guess, but it's still possible for a
 piece of perfectly valid 1.0 middleware to fail this requirement for 1.1,
 leading to the condition where you really can't tell if you're running valid
 1.1 or not.


Middleware sometimes does this, but any time it does this it always replaces
the input stream with something truly file-like, e.g., StringIO or a temp
file.  Nothing but servers really hands sockets around, and sockets are the
only objects I'm aware of that don't act quite like a file.
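
For illustration, the usual pattern looks something like this.  It is a sketch
only; buffer_input is a made-up name, and real middleware would cap the size or
spool large bodies to a temp file:

```python
import io


def buffer_input(app):
    """Middleware that reads the request body and replaces wsgi.input."""
    def middleware(environ, start_response):
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = environ['wsgi.input'].read(length)
        # BytesIO is truly file-like: read() returns b'' once exhausted.
        environ['wsgi.input'] = io.BytesIO(body)
        environ['CONTENT_LENGTH'] = str(len(body))
        return app(environ, start_response)
    return middleware
```

Because the replacement is an ordinary in-memory file, the downstream
application gets the empty-string end marker for free.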


 It was only put in in the first place so that CGI adapters could pass
 through their input stream (which may not ever provide an EOF) without
 having to wrap it. I agree that was a mistake, and should be corrected.


 I agree...  but only in 2.x.



   3. The size argument to 'read()' function of 'wsgi.input' would be
 optional and if not supplied the function would return all available request
 content. Thus would make 'wsgi.input' more file like as the WSGI
 specification suggests it is, but isn't really per original definition.

 This one could be a problem with middleware, and that feature shouldn't
 ever be used, in any case: reading into memory an arbitrary amount of data
 from a client is not a good thing to encourage. OMIT.


 Agreed -- even in 2.x it's questionable if not harmful.


Well, we need a way to handle content of unknown length, but if the file
terminates with '' then this isn't that important.

  4. The 'wsgi.file_wrapper' supplied by the WSGI adapter must honour the
 Content-Length response header and must only return from the file that
 amount of content. This would guarantee that using wsgi.file_wrapper to
 return part of a file for byte range requests would work.

 Given item #6, I suppose this is actually just a matter of efficiency, in
 case the file wrapper is sent to a middleware rather than directly to the
 wsgi gateway? If it goes directly to the gateway, that can of course stop
 reading by itself. ?undecided?


 I don't really see how this one helps anything in 1.x, and so lean towards
 leaving it out.


I don't really understand this either, unless it was handling range
responses as well.  Content-Length alone isn't very interesting in this
case.

  5. Any WSGI application or middleware should not return more data than
 specified by the Content-Length response header if defined.

 As long as this is meant as SHOULD, that's fine. It's not actually a
 requirement, but rather a suggestion of best practices. KEEP.

  6. The WSGI adapter must not pass on to the server any data above what
 the Content-Length response header defines if supplied.

 This is already required by HTTP. If the WSGI gateway doesn't make this
 happen somehow, it's generating invalid HTTP and that's a bug. Okay to
 clarify in the spec to ensure people don't miss the requirement when
 implementing. KEEP.


 Good points - I agree with these two, and they can be considered 1.0
 clarifications as well.  After the first four (which I see no reason to
 include) I was probably a little over-inclined to throw these two out
 (especially since I was reading the should above as a must, per your
 proposal).


In this context, maybe 4 is just an extension of these?  Put 4 after 6 and
maybe it'll seem more obvious...?
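
Whichever layer ends up enforcing it, the rule in 5 and 6 amounts to truncating
the response iterable.  A minimal sketch, with limit_body as a hypothetical
helper name:

```python
def limit_body(app_iter, content_length):
    """Yield at most content_length bytes from a WSGI response iterable."""
    remaining = content_length
    try:
        for chunk in app_iter:
            if remaining <= 0:
                break
            chunk = chunk[:remaining]
            remaining -= len(chunk)
            yield chunk
    finally:
        # WSGI requires close() to be called if the iterable provides it.
        close = getattr(app_iter, 'close', None)
        if close is not None:
            close()


truncated = b''.join(limit_body([b'hello ', b'world', b'!!!'], 8))
```

A file wrapper that honours Content-Length would be the same idea applied to
reads from the file instead of chunks from an iterable.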

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Future of WSGI

2009-11-25 Thread Ian Bicking
On Wed, Nov 25, 2009 at 2:03 PM, Tres Seaver tsea...@palladion.com wrote:

 Aaron Watters wrote:
 
  --- On Wed, 11/25/09, Chris Dent chris.d...@gmail.com wrote:
 
  From: Chris Dent chris.d...@gmail.com
  I can (barely) relate to some of the complaints that
  start_response is a pain in the ass, but environ, to me, is
  not broken.
 
  I agree.  It maps nicely onto the underlying protocol
  and WSGI is supposed to be low level right?
 
  The biggest problem with start_response is that after
  you evaluate
 
  iterable = application(env, start_response)
 
  Sometimes the start_response has been called and sometimes
  it hasn't, and this can break middlewares when they haven't
  been tested both ways (repoze.who for example seems to
  assume it has been called).

 Since version 1.0.13 (2009-04-24), repoze.who's middleware is very
 careful to dance around the fact that an application is not required to
 have called 'start_response' on return, but *must* call it before
 returning the first chunk from its iterator.  That bit of flexibility in
  PEP 333 is likely there to support *some* use case, but it makes
 'start_response' a *big* pain to work with in middleware which needs to
 do egress processing of headers.


Just in terms of history, I think I'm to blame on this one, as I argued
quite vigorously for start_response.  The reason being that at the time
frameworks that had a concept of streaming usually did it by writing to
the response.  While the names were different depending on the framework,
this was the common way to do streaming:

def file_app(req):
    filename = ...
    req.response.setHeader('Content-Type',
                           mimetypes.guess_type(filename)[0])
    # I believe most did not stream by default...
    req.response.stream()
    fp = open(filename, 'rb')
    while 1:
        chunk = fp.read(4096)
        if not chunk:
            break
        req.response.write(chunk)

To support that style of streaming start_response was added.  I think PJE
also had some notion of Comet-style interactions, and maybe something
related to async, leading to the specific restrictions on how written
content should be handled.  I still don't entirely understand the use case
underlying that.  But anyway, that's some of the motivation.  start_response
is still useful for retrofitting support for frameworks from time to time,
but all the modern frameworks work differently these days making
start_response seem less necessary.
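
To show the difference in WSGI terms, here is a sketch of the same response
written three ways: with the write() callable that start_response returns, as a
plain list, and as a generator.  The run() helper is a toy stand-in for a
server, invented for this example:

```python
def write_style_app(environ, start_response):
    # Retrofit style: push chunks through the write() callable.
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    for chunk in (b'one\n', b'two\n'):
        write(chunk)
    return []


def iterator_style_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'one\n', b'two\n']


def generator_app(environ, start_response):
    # Nothing here runs until the server starts iterating.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'one\n'
    yield b'two\n'


def run(app):
    """Toy server: returns (status, body, started_before_iteration)."""
    state = {'status': None}
    written = []

    def start_response(status, headers, exc_info=None):
        state['status'] = status
        return written.append

    result = app({}, start_response)
    started_early = state['status'] is not None
    body = b''.join(result)
    return state['status'], b''.join(written) + body, started_early
```

The generator case is the one that trips up middleware: when the application
callable returns, start_response has not been called yet.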

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Future of WSGI

2009-11-24 Thread Ian Bicking
On Tue, Nov 24, 2009 at 3:28 PM, Malthe Borch mbo...@gmail.com wrote:


 The proposal that seemed to work best was to keep the environ as str
 (i.e., unicode in Python 3), and eliminate the problematic SCRIPT_NAME
 and PATH_INFO, replacing them with url-encoded values.  Also I think
 everyone is okay with removing start_response.  All text would be
 decoded as latin1 on Python 3 (which allows for transcoding; also most
 text is not unicode).  The request and response body would remain bytes.


 I assume with all text you mean all header text, e.g. all header values.


All the things that are specified to be str, would stay str in Python 3.
 This includes all keys, headers, and stuff like wsgi.url_scheme.


 Can we talk briefly then about wsgi.*? I think we should eliminate them and
 in their place put a real request object, something very basic that has only
 what's absolutely necessary to communicate the essential data from the
 low-level HTTP request.

 There is no way that the environment can express an HTTP request. This was
 a mistake in my view and we should rectify it either in 1.1 or 2.0.


I'm not aware of any problems with representing the request with a
dictionary.  Can you give examples?


-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Future of WSGI

2009-11-24 Thread Ian Bicking
On Tue, Nov 24, 2009 at 3:40 PM, Malthe Borch mbo...@gmail.com wrote:

  I'm not aware of any problems with representing the request with a
  dictionary.  Can you give examples?

 The body stream is not part of the HTTP environment. It's an abuse and
 it has the very negative effect of luring developers into further
 abuse.


You mean specifically environ['wsgi.input'] ?  While the file-like interface
is difficult, other possible interfaces aren't so great either.  As to
putting the request body in the environment, I don't know what the problem
is?  Or are you just concerned that people put arbitrary things in the
environ?  There are far too many important use cases that are satisfied by the
extensible nature of the environ to give it up just because some people
believe it is overused.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Future of WSGI

2009-11-24 Thread Ian Bicking
On Tue, Nov 24, 2009 at 4:16 PM, Sylvain Hellegouarch s...@defuze.org wrote:

  I'm not aware of any problems with representing the request with a
 dictionary.  Can you give examples?


 Though it shouldn't be considered as a problem, the fact that probably no
 existing framework actually use the raw dictionary (there is, in almost all
 cases, a wrapping into a friendlier object), one might wonder why keeping
 such a low level interface rather than directly provide a higher level
 interface is a good idea. After all creating those dictionaries for no good
 reason aside from sending them to the next layer which will map them into a
 WebOb, a yaro, a cherrypy request, or zope request, etc. seems slightly
 pointless (I'm not versed into Python internals, but doesn't it have also a
 cost of creating rather useless objects repeatedly like that?) I know WSGI
 tries hard not to force into one implementation but still...


Well, that's hardly a trivial requirement, nor a trivial accomplishment.
 Also the dictionary is a complete and inspectable representation of the
environment, divorced from any possible trickery on the part of frameworks.
 It's a common gateway between servers and frameworks, and can be used as a
gateway between middleware and applications.  And it's really fairly common
for middleware to use the raw dictionary without any object involved.
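
As an example of middleware working directly on the dictionary, here is a
sketch of a tiny dispatcher that mounts an app under a prefix by shifting the
prefix from PATH_INFO to SCRIPT_NAME.  The name mount and its signature are
invented for this illustration:

```python
def mount(prefix, app, fallback):
    """Send requests under `prefix` to `app`, everything else to `fallback`."""
    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '')
        if path == prefix or path.startswith(prefix + '/'):
            # Copy so the fallback never sees a half-modified environ.
            environ = dict(environ)
            environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
            environ['PATH_INFO'] = path[len(prefix):]
            return app(environ, start_response)
        return fallback(environ, start_response)
    return dispatcher
```

No request object is needed for this, and any framework sitting behind it sees
a perfectly ordinary environ.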

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


[Web-SIG] WebOb API

2009-10-29 Thread Ian Bicking
Hi all.

So, it's about time that WebOb came to 1.0.  For 1.0 I'd like to
settle the API as much as possible.  But I'd also like to move further
to getting WebOb used for more frameworks.  I don't expect that to
happen before 1.0, but if there are API changes that will make that
easier later, then maybe we can get those in.

While I haven't tracked ongoing changes to frameworks, I did put
together the differences I am aware of in APIs here:

  http://pythonpaste.org/webob/differences.html

Some of them are fairly trivial, and could be managed through
subclassing (e.g., req.raw_post_data vs. req.body -- semantically
identical, just different names).

Are there API changes that would help people consider WebOb for other
frameworks?  The main one I can think of is req.FILES, separating out
file uploads from other POST fields.  Also then there's the issue of
what kind of object represents files.  The finer details of individual
objects are also important, things like the API of req.GET/req.POST
(which are views on ordered dictionaries, and are represented somewhat
differently in different frameworks).

Also I'm planning on introducing a BaseRequest (and *maybe*
BaseResponse) class, that removes some functionality.  Specifically
for Repoze they'd like to remove __getattr__ and __setattr__ (which
has some performance implications), and maybe other things are
possible (though removing writers is infeasible, IMHO, as read and
write access are not easily separated, and it would require too much
code duplication).

(Incidentally WebOb is now on bitbucket: http://bitbucket.org/ianb/webob/)

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] Converting REQUEST_URI to wsgi.script_name/wsgi.path_info

2009-09-28 Thread Ian Bicking
Thanks for the test case; fixed in tip now.  If anything goes wrong what
should happen is a return value of (quote(script_name), quote(path_info)) --
there's no combination of request_uri/script_name/path_info that should
cause an exception (except bugs).  As you say, there's no promise that those
values are in any way related, and when that is the case it is appropriate
to fix it up at the WSGI stage (not necessarily in the WSGI adapter itself).
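
To illustrate just the fallback behaviour (this is not the algorithm in
request_uri.py, only a deliberately naive sketch with an invented function
name), the shape is: try to split the raw path, check the result against the
CGI values, and return the quoted CGI values whenever anything doesn't line up:

```python
from urllib.parse import quote, unquote


def raw_paths(request_uri, script_name, path_info):
    """Return (raw_script_name, raw_path_info), never raising on odd input."""
    fallback = (quote(script_name), quote(path_info))
    raw_path = request_uri.split('?', 1)[0]
    if not path_info:
        if unquote(raw_path) == script_name:
            return raw_path, ''
        return fallback
    # Naive split: assume PATH_INFO has as many segments raw as decoded.
    count = path_info.count('/')
    parts = raw_path.split('/')
    if count >= len(parts):
        return fallback
    raw_script = '/'.join(parts[:-count])
    raw_info = '/' + '/'.join(parts[-count:])
    if unquote(raw_script) == script_name and unquote(raw_info) == path_info:
        return raw_script, raw_info
    return fallback
```

With the rewritten request from the report below, the values don't line up, so
this returns the quoted SCRIPT_NAME and PATH_INFO instead of failing.  Unlike
the real code, it makes no attempt to preserve %2f in that case.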


On Mon, Sep 28, 2009 at 2:34 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 2009/9/28 Ian Bicking i...@colorstudy.com:
  I tried implementing some code to convert REQUEST_URI (the raw request
 URL)
  and CGI-style SCRIPT_NAME/PATH_INFO into a raw script_name/path_info.
http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri.py (python 2)
http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri3.py (python 3)
  Admittedly the tests are not very complete, I just wasn't feeling
 creative
  about test cases.  In terms of performance this avoids being entirely
 brute
  force, but feels kind of complex.  I'm betting there's an entirely
 different
  approach which is faster.  But whatever.

 Got an error:

  mod_wsgi (pid=4301): Exception occurred processing WSGI script
 '/Users/grahamd/Testing/tests/wsgi20.wsgi'.
  Traceback (most recent call last):
   File /Users/grahamd/Testing/tests/wsgi20.wsgi, line 80, in application
 environ['PATH_INFO'])
   File /Users/grahamd/Testing/tests/wsgi20.wsgi, line 64, in
 request_uri_to_path
 remove_segments = remove_segments - 1 -
 qscript_name_parts[-1].lower().count('%2f')
  IndexError: list index out of range

 This was an extreme corner case where Apache mod_rewrite was being
 used to do stuff:

 RewriteEngine On
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteRule ^(.*)$ /wsgi20.wsgi/$1 [QSA,PT,L]

 and Apache was configured to allow encoded slashes. The input would have
 been:

 REQUEST_URI: '/a%2fb/c/d'
 SCRIPT_NAME: '/wsgi20.wsgi'
 PATH_INFO: '/a/b/c/d'

 That style of rewrite rule is quite often used with Apache, although
 allowing encoded slashes isn't.

 That SCRIPT_NAME needs to be adjusted is a known consideration with
 this rewrite rule. Usually you would use wrapper around WSGI
 application which does:

 def _application(environ, start_response):
     # The original application.
     ...

 import posixpath

 def application(environ, start_response):
     # Wrapper to set SCRIPT_NAME to actual mount point.
     environ['SCRIPT_NAME'] = posixpath.dirname(environ['SCRIPT_NAME'])
     if environ['SCRIPT_NAME'] == '/':
         environ['SCRIPT_NAME'] = ''
     return _application(environ, start_response)

 If that algorithm is used in WSGI adapter however, would never get the
 opportunity to do that though as would already have failed before it
 got called.

 Graham




-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


[Web-SIG] Converting REQUEST_URI to wsgi.script_name/wsgi.path_info

2009-09-27 Thread Ian Bicking
I tried implementing some code to convert REQUEST_URI (the raw request URL)
and CGI-style SCRIPT_NAME/PATH_INFO into a raw script_name/path_info.
  http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri.py (python 2)
  http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri3.py (python 3)

Admittedly the tests are not very complete, I just wasn't feeling creative
about test cases.  In terms of performance this avoids being entirely brute
force, but feels kind of complex.  I'm betting there's an entirely different
approach which is faster.  But whatever.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Ian Bicking
It's not a specific proposal, but here's my opinions on what a proposal
should be:

On Tue, Sep 22, 2009 at 1:06 AM, Mark Nottingham m...@mnot.net wrote:

 OK, that's quite exhaustive.

 For the benefit of those of us jumping in, could you summarise your
 proposal in something like the following manner:

 1. How the request method is made available to WSGI applications


Graham talked about it as bytes/unicode/native, where native is unicode on
Python 3 and str on Python 2.  For instance, I think there's general
consensus (though not really specifically discussed) that environ keys
should be native.

I think method should be native.


 2. How the request-uri is made available to WSGI applications -- in
 particular, whether any decoding of punycode and/or %-escapes happens


Hah, didn't even think about de-punycoding HTTP_HOST.  That'd be a blast.

I think:
* scheme as native
* HTTP_HOST as native (no decoding of punycode)
* path as native (no URL decoding) - big break with WSGI 1 and CGI, but what
the hell.  I could easily waffle on this.
* query string as native - *should* be ASCII-safe currently.

Wow, that was easy!

Request headers, which you didn't split out... those I'm not sure.  I'd
*like* them to be native.  But damn, I'm just not sure quite how.
surrogateescape?  Latin1?  Latin1 as a kind of poor man's surrogateescape
isn't so bad.  And the headers *should* be ASCII for sane requests, so it's
not a horrible compromise.  I guess libraries could lazily transcode, just
like they currently lazily decode.  But it'd be a bit obnoxious at the
library level.  Transcoding middleware would be easier, but it adds the
question of how to record that the transcoding has taken place.
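
A small Python 3 sketch of why Latin1 works as a poor man's surrogateescape: it maps bytes 0-255 one-to-one onto code points, so a library can get the original bytes back and transcode lazily. The helper name transcode_header is hypothetical.

```python
def transcode_header(value, encoding='utf8', errors='replace'):
    """Reinterpret a Latin1-decoded header value as text in another encoding."""
    return value.encode('latin1').decode(encoding, errors)


# What a server would do under this scheme: decode raw header bytes as Latin1.
raw = 'name=Fran\u00e7ois'.encode('utf8')
native = raw.decode('latin1')

# Nothing is lost: every possible byte survives the round trip.
assert native.encode('latin1') == raw
assert bytes(range(256)).decode('latin1').encode('latin1') == bytes(range(256))

# A library that suspects UTF-8 can transcode when (and only when) it needs to.
assert transcode_header(native) == 'name=Fran\u00e7ois'
```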


 3. How request headers are made available to WSGI apps


Request handlers?  I don't understand your terminology.


 4. How the request body is made available to WSGI apps


Ugh.  wsgi.input could remain.  I think at least it should become a
file-like interface (i.e., giving an empty string when the content is
exhausted) and I might even ask that it implement .tell() (.seek() would be
nice of course, but optional).  If there was some other idea, I think
there's room for improvement on wsgi.input and the file interface.

wsgi.input should definitely work with bytes only.  I believe this is
consensus.
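
As a sketch of what the file-like behaviour buys the consumer (assuming the stream returns b'' at end of body, which WSGI 1 servers do not all guarantee), reading the body becomes an ordinary loop. io.BytesIO stands in for wsgi.input here, and read_body is a made-up helper name.

```python
import io


def read_body(wsgi_input, chunk_size=8192):
    """Read a bytes-only, file-like request body until it is exhausted."""
    chunks = []
    while True:
        chunk = wsgi_input.read(chunk_size)
        if not chunk:  # b'' signals the end of the body
            break
        chunks.append(chunk)
    return b''.join(chunks)


body = read_body(io.BytesIO(b'a' * 20000))
```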


 5. Likewise for how apps should expose the response status message, headers
 and body to WSGI implementations.


I believe there is consensus that the response body should remain an
iterator that yields bytes.

In one way, it'd be nice if we'd just say that status/headers should be
ASCII, because that's the reasonable choice.  But for proxying or
representing HTTP as it is, it's not always the case.  And I'm committed
to keeping WSGI fully capable of representing arbitrary requests and
responses so long as they aren't entirely diabolical.

But, an ASCII status is not unreasonable, especially since there's zero
semantic meaning to the reason.  Which makes native strings perfectly fine.

So, headers...

Well, Latin1 is easy enough.  In theory, or at least particular theories,
headers can be Latin1.  And you can represent arbitrary bytes that way.  So
if you want to send crazy stuff to the browser, you can do it that way.  And
if you want to stick to plain ASCII then that's easy enough as well.  So...
native?  str or unicode?  I'm not sure specifically for this one.


-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Ian Bicking
On Tue, Sep 22, 2009 at 3:16 AM, Armin Ronacher armin.ronac...@active-4.com
 wrote:

 Hi,

 Ian Bicking schrieb:
  Request headers, which you didn't split out... those I'm not sure.  I'd
  *like* them to be native.  But damn, I'm just not sure quite how.
  surrogateescape?  Latin1?  Latin1 as a kind of poor man's surrogateescape
  isn't so bad.  And the headers *should* be ASCII for sane requests, so
 it's
  not a horrible compromise.
 Except for cookie headers.  Thanks to advertising and all the other
 system putting headers on your page you can't even properly control that
 one.


Yes, but it'd be relatively easy to handle this, especially since the raw
header isn't very useful.  So you just do
environ['HTTP_COOKIE'].encode('latin1').decode('utf8', 'replace') before
parsing.

 Another thing to consider: in Python 3.1, the HTTP server internally
 decodes to latin1 and there is no simple way to change that, unless you
 replace the implementation.

  Ugh.  wsgi.input could remain.  I think at least it should become a
  file-like interface (i.e., giving an empty string when the content is
  exhausted) and I might even ask that it implement .tell() (.seek() would
 be
  nice of course, but optional).  If there was some other idea, I think
  there's room for improvement on wsgi.input and the file interface.
 -1 on seek and tell.  This could be impossible to implement and what we
 really want to do is to not have the data in memory but on disk or
 wherever you put big-ass uploads.  Also it will be hard to test for an
 available seek or not, because even if it's a noop, the method could be
 there.


Tell doesn't have particular overhead except to keep track of how many bytes
have been read.  That would allow libraries to at least detect contention
for wsgi.input.  I wish seek were detectable, though I agree it shouldn't be
required at all.
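
A rough sketch of the bookkeeping involved, assuming a server (or middleware) wraps the raw stream; the class name CountingInput and the helper body_untouched are hypothetical.

```python
import io


class CountingInput:
    """Wrap a byte stream and track how many bytes have been consumed."""

    def __init__(self, stream):
        self._stream = stream
        self._pos = 0

    def read(self, size=-1):
        data = self._stream.read(size)
        self._pos += len(data)
        return data

    def readline(self, size=-1):
        data = self._stream.readline(size)
        self._pos += len(data)
        return data

    def tell(self):
        # No seeking is implied; this is only a running byte count.
        return self._pos


def body_untouched(environ):
    """True if nobody has read from wsgi.input yet."""
    return environ['wsgi.input'].tell() == 0


environ = {'wsgi.input': CountingInput(io.BytesIO(b'a=1&b=2'))}
```

A form-parsing library could call body_untouched(environ) before reading and complain loudly if some other component already consumed part of the body.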

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


[Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Ian Bicking
OK, I mentioned this in the last thread, but... I can't keep up with all
this discussion, and I bet you can't either.

So, here's a rough proposal for WSGI and unicode:

I propose we switch primarily to native strings: str on both Python 2 and
3.

Specifically:

environ keys: native
environ CGI values: native
wsgi.* (that is text): native
response status: native
response headers: native

wsgi.input remains byte-oriented, as does the response app_iter.

I then propose that we eliminate SCRIPT_NAME and PATH_INFO.  Instead we
have:

wsgi.script_name
wsgi.path_info (I'm not entirely set on these names)

These both form the original path.  It is not URL decoded, so it should be
ASCII.  (I believe non-ASCII could be rejected by the server, with Bad
Request?  A server could also choose to treat it as UTF8 or Latin1 and
encode unsafe characters to make it ASCII)  Thus to re-form the URL, you do:

environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST'] +
environ['wsgi.script_name'] + environ['wsgi.path_info'] + '?' +
environ['QUERY_STRING']
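
As a function, this could look like the sketch below. It uses the proposed (not final) key names, assumes HTTP_HOST is present, and departs from the expression above in one detail: it leaves off the '?' when the query string is empty.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from raw, not-URL-decoded environ values."""
    url = (environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST']
           + environ['wsgi.script_name'] + environ['wsgi.path_info'])
    query = environ.get('QUERY_STRING')
    if query:
        url += '?' + query
    return url


example = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'example.com',
    'wsgi.script_name': '/app',
    'wsgi.path_info': '/fran%e7ois',
    'QUERY_STRING': 'q=1',
}
```

Because nothing was decoded on the way in, nothing needs to be re-quoted on the way out.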

All incoming headers will be treated as Latin1.  If an application suspects
another encoding, it is up to the application to transcode the header into
another encoding.  The transcoded value should not be put into the environ.
In most cases headers should be ASCII, and Latin1 is simply a fallback that
allows all bytes to be represented in both Python 2 and 3.

Similarly all outgoing headers will be Latin1.  Thus if you (against good
sense) decide to put UTF8 into a cookie, you can do:

headers.append(('Set-Cookie', unicode_text.encode('UTF8').decode('latin1')))

The server will then encode the text as latin1, sending the UTF8 bytes.
This is lame, but non-ASCII in headers is lame.  It would be preferable to
do:

headers.append(('Set-Cookie', urllib.quote(unicode_text.encode('UTF8'))))

This sends different text, but is highly preferable.  If you wanted to parse
a cookie that was set as UTF8, you'd do:

parse_cookie(environ['HTTP_COOKIE'].encode('latin1').decode('utf8'))

Again, it would be better to do:

parse_cookie(urllib.unquote(environ['HTTP_COOKIE']).decode('utf8'))
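
A Python 3 rendering of the preferred percent-encoding approach, as a sketch only: it uses urllib.parse.quote/unquote in place of the Python 2 calls above, and the cookie name "note" and the hand-rolled parsing are purely illustrative.

```python
from urllib.parse import quote, unquote

text = 'caf\u00e9 \u2615'

# quote() encodes the text as UTF-8 and percent-escapes it, so the header
# value is plain ASCII and no Latin1 tricks are needed.
header_value = 'note=' + quote(text)
header_value.encode('ascii')  # would raise if anything non-ASCII slipped in

# Reading it back: split off the name, then undo the percent-encoding.
name, _, raw_value = header_value.partition('=')
decoded = unquote(raw_value)
```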

Other variables like environ['wsgi.url_scheme'], environ['CONTENT_TYPE'],
etc, will be native strings.  A Python 3 hello world app will then look like:

def hello_world(environ):
    return ('200 OK', [('Content-type', 'text/html; charset=utf8')],
            ['Hello World!'.encode('utf8')])
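
To show how an app with this signature could still run on an existing server, here is a hedged sketch of an adapter back to the WSGI 1 calling convention. The name wsgi2to1 is made up, and error handling (exc_info) is left out.

```python
def wsgi2to1(app):
    """Wrap an app returning (status, headers, body) as a WSGI 1 callable."""
    def wsgi1_app(environ, start_response):
        status, headers, app_iter = app(environ)
        start_response(status, headers)
        return app_iter
    return wsgi1_app


def hello_world(environ):
    return ('200 OK', [('Content-type', 'text/html; charset=utf8')],
            ['Hello World!'.encode('utf8')])


# Drive it the way a WSGI 1 server would, with a stand-in start_response.
seen = []
body = wsgi2to1(hello_world)({}, lambda status, headers: seen.append((status, headers)))
```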

start_response and changes to wsgi.input are incidental to what I'm
proposing here (except that wsgi.input will be bytes); we can decide about
them separately.



Outstanding issues:

Well, the biggie: is it right to use native strings for the environ values,
and response status/headers?  Specifically, tricks like the latin1
transcoding won't work in Python 2, but will in Python 3.  Is this weird?
Or just something you have to think about when using the two Python
versions?

What happens if you give unicode text in the response headers that cannot be
encoded as Latin1?

Should some things specifically be ASCII?  E.g., status.

Should some things be unicode on Python 2?

Is there a common case here that would be inefficient?



-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Ian Bicking
On Sun, Sep 20, 2009 at 8:06 AM, Armin Ronacher
armin.ronac...@active-4.com wrote:
 Thanks to Graham Dumpleton and Robert Brewer there is some serious
 progress on WSGI currently.  I proposed a roadmap with some PEP changes
 now that need some input.

 Summary:

  WSGI 1.0       stays the same as PEP 0333 currently is
  WSGI 1.1       becomes what Ian and I added to PEP 0333
  WSGI 2.0       becomes a unicode powered version of WSGI 1.1
  WSGI 3.0       becomes WSGI 2.0 just without start_response

  WSGI 1.0 and 1.1 are byte based and nearly impossible to use on Python
  3 because of changes in the standard library that no longer work with
  a byte-only approach.

1.1 I think of as an errata on 1.0, so... simple enough.

I was skeptical about a unicode version of WSGI, but I think I'm okay
with it now.  For people who use UTF-8-only it should be fairly simple
and easy; for people who want to deal with other encodings, backward
compatible URLs, or other weirdness I think surrogateescape can
resolve the small handful of problems.  Maybe an option to use latin1
(at the server level) would do the same for Python 2, as a deployment
option for people who are dealing with these tricky issues.  Which is
kind of lame, but it means everything is still *possible*, and the use
cases are somewhat obscure.  Especially because QUERY_STRING and
wsgi.input remain bytes.  (Well, I guess the other case would be
someone reading a cookie set by an application they do not control,
and set in a crazy way... but anyway, there's a handful of use cases
where things get tricky, but we can kind of punt, or try to implement
the necessary transcoding routines before the spec is final.)  I'm
very much opposed to a second raw version of the request, as I do
not like redundancy.

With respect to 3.0/start_response, I'd rather we just do both at
once, so there's not so many versions of WSGI to worry about.  Also it
doesn't feel like a very difficult change to make.

The only other major issue is wsgi.input, which is a quite awkward
interface to the request body.  But I think resolving that is harder
than start_response, in particular because there's no clear solution.
Maybe at least switching to a file interface would be better.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Ian Bicking
On Mon, Sep 21, 2009 at 6:16 PM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

  Of course you can directly use `environ['some_key']` if you know you'll
  get the 'right' encoding all the time. But when the encoding changes,
  you'll have to fix all your middlewares.
 
 
  I am missing something?

 For one, we aren't talking about arbitrary keys needing this treatment.

 We are only talking about SCRIPT_NAME and PATH_INFO.


OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO, and
introduce two equivalent variables that hold the NOT url-decoded values.  So
if you request /fran%e7cois then environ['PATH_INFO_RAW'] is '/fran%e7cois'.

This will be quite disruptive, as these are variables that are frequently
accessed directly (libraries that expose them as attributes can just turn
them into properties that do URL decoding, using UTF8).  But it's an easy
fix at least.  I would actually want to specify that if we added this key,
we should disallow the old keys -- terrible confusion could ensue from both
in the environ.  This also fixes the problem with not being able to
distinguish %2F from /, which isn't a big problem but is annoying, and is
hiding meaningful information.  (I believe the relevant spec does
distinguish between these two values -- i.e., ideally decoding should happen
on path segments, each segment separated by a real /.)
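
A short Python 3 sketch of segment-wise decoding, assuming the raw (not URL-decoded) path is available; path_segments is a hypothetical helper, and unquote() decodes as UTF-8 by default.

```python
from urllib.parse import unquote


def path_segments(raw_path):
    """Split on real slashes first, then URL-decode each segment."""
    return [unquote(segment) for segment in raw_path.split('/')]


raw = '/files/a%2Fb/c'
segments = path_segments(raw)      # the escaped slash stays inside one segment
flattened = unquote(raw).split('/')  # decoding first loses the distinction
```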

If we do that, then the only really tricky thing left is HTTP_COOKIE, and
since the Cookie header is a mess then HTTP_COOKIE will be a mess and we
just have to figure out a hacky way to deal with that.  Maybe
surrogateescape, but probably just Latin1 would be fine (and easy to do in
Python 2).

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Ian Bicking
On Tue, Sep 22, 2009 at 12:21 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 That may be fine for pure Python web servers where you control the
 split of REQUEST_URI into SCRIPT_NAME and PATH_INFO in the first place
 but don't have that luxury in Apache or via FASTCGI/SCGI/CGI etc as
 that is done by the web server. Also, as pointed out in my blog,
 because of rewrites in web server, it may be difficult to try and map
 SCRIPT_NAME and PATH_INFO back into REQUEST_URI provided to try and
 reclaim original characters. There is also the problem that often
 FASTCGI totally stuffs up SCRIPT_NAME/PATH_INFO split anyway and
 manual overrides needed to tweak them.


When things get messed up I recommend people use a middleware
(paste.deploy.config.PrefixMiddleware, though I don't really care what they
use) to fix up the request to be correct.  Pulling it from REQUEST_URI would
be fine.
Also, at worst, you can do environ['SCRIPT_NAME_RAW'] =
urllib.quote(environ.pop('SCRIPT_NAME')).  It sucks, but if that's all the
information you have, then that's all the information you have.  Or try to
get the information from REQUEST_URI the hard way, once at the gateway
level.
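
In Python 3 terms the worst-case fallback might look like the sketch below. It assumes environ values are Latin1-decoded native strings; the *_RAW key names are the placeholders from this thread, and add_raw_keys is a made-up helper. Note that an original %2F cannot be recovered this way.

```python
from urllib.parse import quote


def add_raw_keys(environ):
    """Replace decoded SCRIPT_NAME/PATH_INFO with re-quoted *_RAW keys."""
    for key in ('SCRIPT_NAME', 'PATH_INFO'):
        # encoding='latin1' turns each character back into its original byte
        # before percent-escaping; '/' is left alone by default.
        environ[key + '_RAW'] = quote(environ.pop(key, ''), encoding='latin1')
    return environ


env = add_raw_keys({'SCRIPT_NAME': '/app dir', 'PATH_INFO': '/fran\xe7ois'})
```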

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Ian Bicking
On Tue, Sep 22, 2009 at 12:38 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 If doing something like you suggest, would prefer them as 'wsgi.'
 prefixed variables and not put in all upper case namespace to be
 confused with CGI variables etc.


I just had to make up a name, but I agree with your suggestion for wsgi.X
(we already have wsgi.url_scheme, after all).

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Unicode in Python 3

2009-09-19 Thread Ian Bicking
I can't read all this thread carefully, too much stuff.

I will note however that people are STILL ignoring surrogateescape
(http://www.python.org/dev/peps/pep-0383/).  This is like the third or
fourth time I've brought it up.  It was added to Python 3.1 for some
of the exact issues we are encountering.

Particularly, imagine someone requests /foo%efbar (which is not valid UTF-8).

>>> SCRIPT_NAME = b'/foo\xefbar'  # after url unquoting
>>> # (urllib.request.unquote doesn't work for this currently)
>>> s = SCRIPT_NAME.decode('utf8', 'surrogateescape')
>>> s
'/foo\udcefbar'
>>> s.encode('utf8', 'surrogateescape')
b'/foo\xefbar'

So we can have unicode values that can be safely and correctly
transcoded to other encodings (or handled in their raw form).

The constraints on surrogateescape are:

* You have to use 'surrogateescape' during decoding and encoding (I
think for decoding it should be part of the spec)
* You have to know the encoding; doing s.encode('latin1',
'surrogateescape') wouldn't necessarily preserve the correct bytes (it
does for this example, but wouldn't if there was a mix of valid UTF-8
and invalid bytes)
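
The second constraint can be checked directly in Python 3.1 or later. This sketch uses a made-up path that mixes valid UTF-8 with one stray byte:

```python
# Valid UTF-8 for 'é' followed by a byte that is not valid UTF-8 on its own.
raw = '/caf\u00e9'.encode('utf8') + b'\xef'
s = raw.decode('utf8', 'surrogateescape')

# Encoding with the same codec restores the original bytes exactly.
assert s.encode('utf8', 'surrogateescape') == raw

# Encoding with a different codec does not: 'é' becomes one Latin1 byte
# instead of the two UTF-8 bytes that were actually sent.
assert s.encode('latin1', 'surrogateescape') != raw

# With no valid multi-byte UTF-8 in the input, Latin1 happens to work.
simple = b'/foo\xefbar'.decode('utf8', 'surrogateescape')
assert simple.encode('latin1', 'surrogateescape') == b'/foo\xefbar'
```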

And there's a bit of an annoyance to the fact that
SCRIPT_NAME/PATH_INFO should always be treated as UTF-8 (which might
sometimes be wrong, but for any modern app/browser will be right), but
maybe other parts (HTTP_COOKIE?) are in native encoding.  Well,
besides HTTP_COOKIE, I don't know what else would be in a different
encoding.  Atompub adds Slug, but it's a URL/IRI, so it should be
ASCII.  I have seen proposals for a Title header (e.g., when PUTting
an image and giving it a title), and that could be unicode.  But in
all those cases it'll be a modern app and modern clients, and in those
cases people just use UTF-8.

Frankly I'm open to UTF-8-everywhere.  People mentioned Jack and Rack,
and to what degree that works, it probably works because everyone uses
UTF-8.  With surrogateescape we allow transcoding when needed (e.g.,
if you wanted to handle redirects from old/weird non-UTF-8 URLs) but
keep things reasonably simple otherwise.

  Ian


Re: [Web-SIG] String Types in WSGI [Graham's WSGI for py3]

2009-09-18 Thread Ian Bicking
On Fri, Sep 18, 2009 at 2:56 AM, Graham Dumpleton
graham.dumple...@gmail.com wrote:
 As others have pointed out, the likes of rack and jack, not sure about
 the new Perl variant, don't seem to have an issue with using unicode.

I looked up Jack and Rack: http://jackjs.org/jsgi-spec.html and
http://rack.rubyforge.org/doc/files/SPEC.html

They don't have an issue with unicode because they don't mention it
and don't specify anything at all.  Basically they punt on the issue.

In the specific case, most things in Javascript have to be unicode.
The response body iterator must have items that respond to
toByteString, which includes String and Binary.  I'm assuming Strings
always use UTF8 in Javascript, as JSON acts that way.  jsgi.input is
only specified as an input stream, which is very unspecified.
Especially since jsgi.errors is an output stream, though presumably
one should be binary and the other text.

Ruby's unicode is kind of funny (as I understand it), in a way that
might help them.  Strings are stored as binary with an attached
encoding.  So there's no unicode, only binary strings with
encodings; so you can change the encoding, or transcoding happens
implicitly when you combine strings from different encodings.  So
basically there's no mention of unicode because they've dodged that
whole bullet.  But it also seems to be unspecified what encoding might
be attached to strings, if any at all.

Another example, neither spec even indicates if SCRIPT_NAME/PATH_INFO
are url-decoded (or that they aren't decoded).  So, in summary: I
don't see anything we can learn from these specs, and there's no
reason we should feel like we've somehow been leapfrogged, instead
these other specifications are underspecified.  I also think on
Web-SIG we are approaching this with more robust and general
applications in mind than for Jack and Rack -- for instance, I would
like WSGI to be a reasonable basis for an HTTP proxy, where you can't
enforce UTF8-everywhere.  If all we wanted for WSGI was to be a layer
for serving monolithic applications then these issues wouldn't be so
important.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] Sketching a WSGI 2-to-1 adapter with greenlets

2009-09-18 Thread Ian Bicking
On Fri, Sep 18, 2009 at 5:07 PM, P.J. Eby p...@telecommunity.com wrote:
 On his blog, Graham mentioned some skepticism about skipping WSGI 1.1 and
 going straight to 2.0, due to concern that people using write() would need
 to make major code changes to go to WSGI 2.0.

I'm not entirely clear why this is such a big deal.  Here's how I'd
implement a WSGI 2 wrapper around a WSGI 1 app:

import itertools

def wsgi1to2(app):
    def new_app(environ):
        written = []
        status_headers = []
        def start_response(status, headers, exc_info=None):
            if exc_info is not None:
                raise exc_info[0], exc_info[1], exc_info[2]
            status_headers[:] = [status, headers]
            # write() calls are simply collected and sent first
            return written.append
        app_iter = app(environ, start_response)
        if not status_headers:
            # a generator app won't call start_response until it is iterated
            app_iter = iter(app_iter)
            written.append(app_iter.next())
            assert status_headers
        if written:
            app_iter = itertools.chain(written, app_iter)
        return status_headers[0], status_headers[1], app_iter
    return new_app


What's wrong with this simpler approach to the conversion?


-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] Sketching a WSGI 2-to-1 adapter with greenlets

2009-09-18 Thread Ian Bicking
On Fri, Sep 18, 2009 at 7:09 PM, Armin Ronacher
armin.ronac...@active-4.com wrote:
 Ian Bicking schrieb:
 What's wrong with this simpler approach to the conversion?
 It buffers, you can no longer do this:

   request.write('processing data')
   request.flush()
   ...
   request.write('data processed')
   request.flush()

 But that's not too common and people should rather rewrite their
 applications to use generators for these cases.

Yes -- I don't think many (any?) people use this particular technique,
though many people use the start_response writer simply because it was
there and it seemed like a good idea.  I even used it a few times
because it was easier to code for some circumstances (e.g.,
paste.cgiapp) but not because I expected it would immediately be
pushed to the client.  (appengine's webapp framework uses it a lot,
not entirely sure why; not for streaming though -- maybe because it
pushes the bytes out of the Python interpreter and into the parent
process faster)

So, I'm just saying we need to handle the start_response writer,
because people have used it, but I'm not aware of people using it for
its intended purpose.
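
For the rare app that really does want to push output as it goes, the generator form suggested above might look like this sketch (WSGI 1 signature, Python 3 bytes; progress_app is a made-up name). PEP 333 asks servers not to delay transmission of a yielded block, which is what makes this work as streaming.

```python
def progress_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'processing data\n'
    # ... long-running work would happen here ...
    yield b'data processed\n'


# Drive it with a stand-in start_response, the way a server would.
statuses = []
chunks = list(progress_app({}, lambda status, headers: statuses.append(status)))
```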

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] WSGI and async Servers

2009-09-17 Thread Ian Bicking
On Thu, Sep 17, 2009 at 11:40 AM, Armin Ronacher
armin.ronac...@active-4.com wrote:
 Why would it be good to encourage async applications on top of WSGI?
 Because people would otherwise come up with their own implementations
 that are incompatible to each other.  Maybe that should not go into WSGI
 but a AWSGI or whatever, but I'm pretty sure we should at least consider
 it and ask people that use asynchronous applications/servers what the
 issues with WSGI are.

I think AWSGI would be most appropriate.  There's too much going on,
and trying to keep WSGI sane while allowing async is just too hard.
If we fork, then people can get something that really works well, they
can try it out with real applications, and then maybe we can look at
something we know works and see if AWSGI/WSGI differences can be
resolved to bring it back into one spec.  And indeed it's quite
possible at the library level that AWSGI could be supported by other
libraries; I'm guessing for instance that WebOb would just require a
few checks around the request body, and probably the response would
work relatively fine (but for many patterns a normal response object
would not be sufficient in an async context -- but that's fine too).

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] WSGI 1 Changes [ianb's and my changes]

2009-09-17 Thread Ian Bicking
On Thu, Sep 17, 2009 at 10:57 AM, Armin Ronacher
armin.ronac...@active-4.com wrote:
 I just want to point out that these are in no way final and are further
 intended to only clarify some of the wrong wordings for Python 2, give
 us a real readline() function on the input stream and get rid of useless
 old cruft such as Python 2.2 support and Jython compatibility which no
 longer appears to be a problem.

To reiterate: people have complained that we've discussed
non-controversial changes to WSGI, but the spec hasn't been updated.
This was in large part, I think, because no one took the step going
from discussion to actual proposed PEP changes.  So these are some
proposed changes.  They are meant to be conservative, more like
errata than a real revision, and to reflect
current WSGI practice.  If someone thinks one of the changes goes too
far, then we can discuss -- I think we'll just be more constructive if
we stick to concrete changes to the PEP so we can easily implement
what we all agree on.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] WSGI 2

2009-08-11 Thread Ian Bicking
On Tue, Aug 11, 2009 at 11:19 PM, Robert Brewer fuman...@aminus.org wrote:

   5. When running under Python 3, servers MUST provide CGI HTTP and
  server variables as strings. Where such values are sourced from a byte
  string, be that a Python byte string or C string, they should be
  converted as 'UTF-8'. If a specific web server infrastructure is able
  to support different encodings, then the WSGI adapter MAY provide a
  way for a user of the WSGI adapter to customise on a global basis, or
  on a per value basis what encoding is used, but this is entirely
  optional. Note that there is no requirement to deal with RFC 2047.

 We're passing unicode for almost everything.

 REQUEST_METHOD and wsgi.url_scheme are parsed from the Request-Line, and
 must be ascii-decodable. So are SERVER_PROTOCOL and our custom
 ACTUAL_SERVER_PROTOCOL entries.

 The original bytes of the Request-URI are stored in REQUEST_URI. However,
 PATH_INFO and QUERY_STRING are parsed from it, and decoded via a
 configurable charset, defaulting to UTF-8. If the path cannot be decoded
 with that charset, ISO-8859-1 is tried. Whichever is successful is stored at
 environ['REQUEST_URI_ENCODING'] so middleware and apps can transcode if
 needed. Our origin server always sets SCRIPT_NAME to '', but if we populated
 it, we would make it decoded by the same charset.


My understanding is that PATH_INFO *should* be UTF-8 regardless of what
encoding a page might be in.  At least that's what I got when testing
Firefox.  It might not be valid UTF-8 if it was manually constructed, but
then there's little reason to think it is valid anything; only the bytes or
REQUEST_URI are likely to be an accurate representation.  (Frankly I wish
PATH_INFO was not url-decoded, which would remove this issue entirely --
REQUEST_URI, or any url-encoded value, should really be ASCII, and I don't
know of reasonable cases where this wouldn't be true.)

I suppose ISO-8859-1 is a reasonable fallback in this case, as it can be
used to kind of reconstruct the original request path (the surrogateescape
or whatever it is called would serve the same purpose, but is only available
in Python 3).
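
A minimal sketch of the decode-with-fallback scheme described in the quoted text, in Python 3. The function name decode_path is made up, and this is not the actual server code being discussed.

```python
def decode_path(raw_path, charset='utf-8'):
    """Decode URL-unquoted path bytes, falling back to ISO-8859-1.

    Returns (text, encoding_used) so the caller can record the encoding
    (e.g. in the environ) for later transcoding.
    """
    try:
        return raw_path.decode(charset), charset
    except UnicodeDecodeError:
        # ISO-8859-1 accepts every byte, so this step cannot fail.
        return raw_path.decode('iso-8859-1'), 'iso-8859-1'


text, used = decode_path(b'/fran\xe7ois')
original = text.encode(used)  # the recorded encoding recovers the bytes
```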

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] PEP 333 and gzipping of responses

2009-08-10 Thread Ian Bicking
On Mon, Aug 10, 2009 at 9:11 PM, James Bennett ubernost...@gmail.com wrote:

 Earlier today I posted an article on my blog following up on some
 discussions of WSGI; one criticism presented was of language in PEP
 333 regarding gzipping of responses by WSGI applications. Ian posted a
 comment which stated that the criticism was not correct, but I'm at a
 loss to figure out what *is* correct, so I'll bring up the question
 here.

 In a parenthetical at the end of the section entitled Handling the
 Content-Length Header, PEP 333 states:

  Note: applications and middleware must not apply any kind of
  Transfer-Encoding to their output, such as chunking or gzipping; as
  hop-by-hop operations, these encodings are the province of the
  actual web server/gateway. See Other HTTP Features below, for more
  details.

 In the section Other HTTP Features, PEP 333 states, in part:

  However, because WSGI servers and applications do not communicate
  via HTTP, what RFC 2616 calls hop-by-hop headers do not apply to
  WSGI internal communications. WSGI applications must not generate
  any hop-by-hop headers [4], attempt to use HTTP features that
  would require them to generate such headers, or rely on the content
  of any incoming hop-by-hop headers in the environ dictionary.

 My criticism of this is that this is at best ambiguous, and quite
 possibly openly misleading to readers of the PEP.

 The ambiguity here is that gzip is a valid value for the
 Transfer-Encoding header in HTTP (RFC 2616, Sections 3.6 and 14.41),
 but is also a valid value for the Content-Encoding header (RFC 2616,
 Sections 3.5 and 14.11).


I just don't get the confusion.  Transfer-Encoding is not allowed in WSGI (a
hop-by-hop header, like several other Transfer-* headers).  Content-Encoding
is allowed, because everything not specifically mentioned is allowed.
Clearly Content-Encoding and Transfer-Encoding are different strings.
And, as you mention, the normal thing that people currently do is use
Content-Encoding anyway, so since people aren't using Transfer-Encoding, why
is this controversial?

There are some weird implications to using Content-Encoding, specifically
ETags and range requests, but eh... those exist in mod_deflate and just
about everywhere, and are mostly outside the scope of WSGI.
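
To make the distinction concrete, here is a rough sketch of gzip middleware that only ever touches Content-Encoding. It is written for Python 3.2+ (gzip.compress), buffers the whole body, ignores exc_info and the ETag/range caveats, and the name gzip_middleware is made up.

```python
import gzip


def gzip_middleware(app):
    def wrapped(environ, start_response):
        if 'gzip' not in environ.get('HTTP_ACCEPT_ENCODING', ''):
            return app(environ, start_response)

        captured = []
        chunks = []

        def capture(status, headers, exc_info=None):
            captured[:] = [status, list(headers)]
            return chunks.append  # collect anything sent through write()

        app_iter = app(environ, capture)
        try:
            chunks.extend(app_iter)
        finally:
            if hasattr(app_iter, 'close'):
                app_iter.close()

        status, headers = captured
        body = b''.join(chunks)
        # Leave responses alone if the app already encoded them itself.
        if not any(name.lower() == 'content-encoding' for name, _ in headers):
            body = gzip.compress(body)
            headers = [(n, v) for n, v in headers
                       if n.lower() != 'content-length']
            headers += [('Content-Encoding', 'gzip'),
                        ('Content-Length', str(len(body))),
                        ('Vary', 'Accept-Encoding')]
        start_response(status, headers)
        return [body]
    return wrapped
```

No hop-by-hop header is generated anywhere; chunking or any Transfer-Encoding is still left to the server.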

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


[Web-SIG] WSGI 2

2009-08-03 Thread Ian Bicking
So... what about WSGI 2?  Let's not completely drop the ball on this.
I *think* we were largely in agreement; debate got distracted by some
async stuff, but I don't think we particularly have to deal with that
for WSGI 2.  I think we do more than enough if we figure out: WSGI in
Python 3, i.e., with unicode; some basic errata kind of stuff, like
readline signature; change the callable signature to remove
start_response.

Would this be a new PEP or a revision?  I think it should be a new
PEP, as WSGI 1 remains valid and the same as it always was, and PEP
333 describes that.  Is there anyone willing to make the revisions?

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-05-05 Thread Ian Bicking
On Tue, May 5, 2009 at 10:14 PM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 2009/5/6 Ian Bicking i...@colorstudy.com:
  Philip Jenvey brought this to my attention:
 
http://www.python.org/dev/peps/pep-0383/
 
  It's a UTF8 encoding and decoding scheme that encodes illegal bytes in such
  a way that you can decode to get the original bytes object, and thus
  transcode to another encoding.  It's intended for cases exactly like WSGI.

 Care to explain then how that would in practice be used while I try
 and reread it a few times to try and understand it myself? :-)


I don't particularly know, except I think you'd do things like:

environ['PATH_INFO'] = urllib.unquote(http_byte_path).decode('utf8',
'python-escape')

Then if the encoding was wrong, you could transcode like:

environ['PATH_INFO'] = environ['PATH_INFO'].encode('utf8',
'python-escape').decode('latin1', 'python-escape')

Note that you need to know the encoding that was used (utf8 in this case)
and that python-escape was used.  It has been suggested that the server
should put the encoding it used into the environment.  When transcoding this
should also be updated.

It's not clear what python-escape is going to do, I don't think that's been
determined.  Probably it'll put \x00 or something in the unicode string to
mark raw bytes.
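
For readers coming to this later: the error handler in the final version of PEP 383 is named surrogateescape rather than python-escape, and it marks each undecodable byte with a lone surrogate code point instead of \x00. A minimal sketch of the round trip described above, assuming Python 3 and a made-up byte path:

```python
# Bytes as they might arrive from the client: "/café" encoded as
# latin-1, which is not valid UTF-8.
raw = b"/caf\xe9"

# Decoding never fails; the stray byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
recovered = raw_again = text.encode("utf-8", "surrogateescape")

# Now the application can transcode using the encoding it believes is right.
path_info = recovered.decode("latin-1")
```

As noted above, this only works if the application knows which encoding and error handler the server used for the first decode.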

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] FW: Closing #63: RFC2047 encoded words

2009-04-08 Thread Ian Bicking
On Wed, Apr 8, 2009 at 1:14 PM, James Y Knight f...@fuhm.net wrote:

 If you want to start a discussion about having a standard parsed-header
 object in WSGI, that's another thing,


Off topic to this discussion, but that's what WebOb is.  It also largely
handles the encoding issues, abstracts away the awkwardness of the WSGI call
signature, and also does header parsing.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] Reverse Proxy HTTPS

2009-04-06 Thread Ian Bicking
A last note: paste.deploy.config.PrefixMiddleware does some fixup for cases
like this, including looking at X-Forwarded-Scheme and X-Forwarded-Proto for
the protocol (both names, because there's nothing approaching consensus on
what to name these headers).
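
The general idea can be sketched as a small piece of WSGI middleware. This is not the PrefixMiddleware source, only an illustration of the same kind of fixup; the function name and the check against "http"/"https" are assumptions made for the example.

```python
def forwarded_scheme_middleware(app):
    """Set wsgi.url_scheme from headers added by a trusted front-end proxy."""
    def wrapped(environ, start_response):
        # Both header names are checked because there is no agreed standard.
        scheme = (environ.get("HTTP_X_FORWARDED_SCHEME")
                  or environ.get("HTTP_X_FORWARDED_PROTO"))
        if scheme in ("http", "https"):
            environ["wsgi.url_scheme"] = scheme
        return app(environ, start_response)
    return wrapped


def demo_app(environ, start_response):
    # Hypothetical application that just reports the scheme it sees.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.url_scheme"].encode("ascii")]
```

This should only be enabled when the backend is reachable solely through the proxy, since a client could otherwise send these headers itself.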


2009/4/6 Randy Syring ra...@rcs-comp.com

  Graham,

 Excellent, thank you!  That confirms for me the concept is correct, now all
 I have to do is work on an IIS implementation.  FUN!

 --
 Randy Syring
 RCS Computers  Web Solutions
 502-644-4776  http://www.rcs-comp.com

 Whether, then, you eat or drink or
 whatever you do, do all to the glory
 of God. 1 Cor 10:31



 Graham Dumpleton wrote:

 Using nginx as front end to Apache/mod_wsgi as an example:

 On nginx side you would use:

   proxy_set_header X-Url-Scheme $scheme;

 and on Apache/mod_wsgi side, with Django 1.0 as an example, in WSGI
 script file we would have:

   import os, sys
   sys.path.append('/usr/local/django')

   os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

   import django.core.handlers.wsgi

   _application = django.core.handlers.wsgi.WSGIHandler()

   def application(environ, start_response):
       environ['wsgi.url_scheme'] = environ.get('HTTP_X_URL_SCHEME', 'http')
       return _application(environ, start_response)

 The equivalent of this on the IIS side is, as others have mentioned, what you need.

 Graham

 2009/4/7 Paweł Stradomski pstradom...@gmail.com:


  In a message dated Monday, 6 April 2009, Randy Syring wrote:



  I would like my application to have control over the HTTPS-HTTP
 redirects and would rather not force that logic into the forward facing
 web server if at all possible.  That just seems like an extra
 configuration step that wouldn't necessarily be needed if I could figure
 out how to pass SSL status from the forward facing web server to the
 backend proxy (i.e. CherryPy and my app).

 So, do you (or anyone else) know of a good way to do this?  Or, does
 everyone just assume that it is all or nothing for SSL when you are
 proxying to a backend?



  Check the IIS manual; it should be possible to set some nonstandard header
 when the connection goes through SSL, and then check this header in your
 application. Maybe that header is already there - write a simple controller
 that prints all the headers from the request and check how it looks with and
 without SSL (but verify with the IIS manual anyway).

 --
 Paweł Stradomski




-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] thoughts on an iterator

2009-03-30 Thread Ian Bicking
2009/3/28 Robert Brewer fuman...@aminus.org:
 H. Graham brought up chunked requests which I don't think have much
 bearing on this issue--the server/app can't rely on the client-specified
 chunk sizes either way (or you enable a Denial of Service attack). I
 don't see much difference between the file approach and the iterator
 approach, other than moving the read chunk size from the app (or more
 likely, the cgi module) to the server. That may be what kills this
 proposal: cgi.FieldStorage expects a file pointer and I doubt we want to
 either rewrite the entire cgi module to support iterators, or re-package
 the iterator up as a file.

There are some alternate implementations of the cgi POST-parsing
functionality, some of which might be more amenable to using an
iterator.  Or for that matter, none of us have probably read the cgi
module with this in mind.  With a quick look, it'll be slightly tricky
because it uses .readline a lot, but there's just not that much code
involved so it can't be too hard.

For clarity, I think everyone has been discussing an *iterator*, not
an iterable; an iterable would have a lot of unnecessary overhead, but
I've seen both terms used.

I don't agree with Graham's objection, as I think the reason to read
specific-sized chunks is that you don't want to read too much data
into memory at one time.  But the server is free to chunk the iterator
to avoid too much data, and once the strings are in memory the
consumer really isn't any better off reading a smaller chunk than what
is available.

This also means I can stop making up entirely random chunk sizes in
applications.  Applications have no real information to inform this
chunking.  If the string is already in memory, the chunking actually
is counterproductive.
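
As a rough illustration of why readline-style parsing on top of an iterator need not be hard, here is a minimal sketch that turns an iterator of byte chunks into lines. The function name is made up for the example, and it assumes the server has already chosen the chunk sizes:

```python
def iter_lines(chunks):
    """Yield newline-terminated lines from an iterator of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while True:
            newline = buffer.find(b"\n")
            if newline == -1:
                break
            yield buffer[:newline + 1]
            buffer = buffer[newline + 1:]
    # Whatever is left has no trailing newline.
    if buffer:
        yield buffer
```

The consumer never picks a read size; it simply takes whatever the server hands over, which is the point made above about made-up chunk sizes.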

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] how to test hunging socket ?

2009-01-30 Thread Ian Bicking
If you use the Paste HTTP server and Python 2.5 with ctypes installed, you
can install the watchthreads app:
http://svn.pythonpaste.org/Paste/trunk/paste/debug/watchthreads.py

that will let you see the hung threads, and get a traceback of their current
position.

On Fri, Jan 30, 2009 at 1:52 PM, William Dode w...@flibuste.net wrote:

 Hi,

 I have a problem with a web app which freezes periodically. I monitored my
 app and the hang doesn't seem to occur in it. So I think the problem is
 before, or after, a socket problem I imagine... It happens with
 wsgiref.simple_server and mod_wsgi. My app is not totally thread safe so
 I didn't try a lot of servers...
 When it freezes, I have to restart the app manually. With mod_wsgi it
 freezes the whole server. It doesn't happen very often so it's difficult
 for me to reproduce the problem.

 So my question is, how can I simulate a hung socket? Or how can I see
 where the app freezes exactly?

 In the python-paste server I read that Ian tried to handle some cases of
 hung sockets...

 thx, and sorry for my English...

 --
 William Dodé - http://flibuste.net
 Informaticien Indépendant





-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] how to test hunging socket ?

2009-01-30 Thread Ian Bicking
On Fri, Jan 30, 2009 at 3:48 PM, William Dode w...@flibuste.net wrote:

 Fine, i should definitely give it a try.

 If my app is not thread safe but responds in a decent time, can I benefit
 from a multithreaded server (for a socket problem) if I use a lock for
 every page like that:

 I use webob...


If your app isn't threadsafe, you should use a multiprocess server.
mod_wsgi has options for this, and flup has forking options (you'd use flup
behind Apache or another server).


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] wsgiref.validate allows wsgi.input.read() with no argument

2008-12-13 Thread Ian Bicking

Graham Dumpleton wrote:

An application that relies on the server to simulate end-of-file will be a
broken application on some servers.  This is not an uncommon problem.
 Therefore the validator tests for this case; if you want an application
that actually works consistently, you shouldn't do
environ['wsgi.input'].read().


The validator does not test for that case, that is what I am pointing
out. The validator allows read() to be called with no argument.


Ah, sorry, I wasn't paying attention... okay, then yes, I agree -- the 
validator should be more restrictive.


--
Ian Bicking : i...@colorstudy.com : http://blog.ianbicking.org


Re: [Web-SIG] wsgiref.validate allows wsgi.input.read() with no argument

2008-12-12 Thread Ian Bicking

Graham Dumpleton wrote:

Just noticed that although WSGI PEP doesn't specifically mention that
argument to read() on wsgi.input is optional, wsgiref.validate allows
calling read() with no argument.


From wsgiref.validate:



* That wsgi.input is used properly:

  - .read() is called with zero or one argument

class InputWrapper:

    def read(self, *args):
        assert_(len(args) <= 1)
        v = self.input.read(*args)
        assert_(type(v) is type(""))
        return v


Of course, the issue is still that WSGI PEP says:

The server is not required to read past the client's specified
Content-Length, and ***is allowed to simulate an end-of-file condition
if the application attempts to read past that point***.


An application that relies on the server to simulate end-of-file will be 
a broken application on some servers.  This is not an uncommon problem. 
 Therefore the validator tests for this case; if you want an 
application that actually works consistently, you shouldn't do 
environ['wsgi.input'].read().
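
A portable alternative is to read exactly as many bytes as CONTENT_LENGTH announces. The following is a minimal sketch; the helper name is made up, and it treats a missing or malformed CONTENT_LENGTH as an empty body:

```python
import io


def read_body(environ):
    """Read the request body without relying on a simulated end-of-file."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return b""
    # Never ask for more than the client said it would send.
    return environ["wsgi.input"].read(length)
```

For example, with `environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"helloEXTRA")}` the helper stops after the first five bytes rather than reading to the end of the stream.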



--
Ian Bicking : i...@colorstudy.com : http://blog.ianbicking.org


Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification

2008-11-17 Thread Ian Bicking

Mark Ramm wrote:

On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover [EMAIL PROTECTED] wrote:

Ian Bicking wrote:


To resolve this, let's just not pass it over this time?


Totally agreed.

What exactly needs to happen next?


We need to propose a change to the WSGI specification.  I propose, in 
Input and Error Streams 
(http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
change it to have readline(hint) and expand Note 3 to include readline 
as well as readlines, removing Note 2.  Also I suppose some sort of 
change note in the specification?


Does this sound like a sufficient change to the spec, and are there any 
objections to the change?


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification

2008-11-17 Thread Ian Bicking

Manlio Perillo wrote:

Ian Bicking wrote:

[...]
We need to propose a change to the WSGI specification.  I propose, in 
Input and Error Streams 
(http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
change it to have readline(hint) and expand Note 3 to include 
readline as well as readlines, removing Note 2.  Also I suppose some 
sort of change note in the specification?


Does this sound like a sufficient change to the spec, and are there 
any objections to the change?




Fine for me, but of course we need to do this as:
1) Errata to WSGI 1.0
or
2) WSGI 1.1
or
3) WSGI 2.0

You can't just modify the current WSGI 1.0 spec.

I'm for 2), with the other clarifications about WSGI we have discussed 
in the past.


I'm for 1.  What other clarifications were you thinking of?


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification

2008-11-16 Thread Ian Bicking

Graham Dumpleton wrote:

2008/11/16 Ian Bicking [EMAIL PROTECTED]:

We need to make a revision to the WSGI spec to say that
environ['wsgi.input'].readline takes an optional size argument.  It always
does in practice (except in wsgiref.validate.validator, rendering that
validator useless), and is required to in practice, because everyone uses
cgi.FieldStorage, and it passes in that argument.


This has been brought up numerous times before. There are other things
about wsgi.input that really need to be changed as well to make it
more useful. When I have pushed for revised specification before I
could never get enough interest in it from the people that most would
perceive are the ones who oversee the PEP.


Yes, this has been passed over before.  To resolve this, let's just not 
pass it over this time?  This is a relatively small change to the WSGI 
spec, because it represents standard practice -- this change is simply 
getting the spec in line with implementations.
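
To make the proposed signature concrete, here is a minimal sketch of an input wrapper whose readline accepts an optional size, which is what callers such as cgi.FieldStorage rely on. The class name is hypothetical and the wrapper simply delegates to an underlying file-like object:

```python
import io


class SizedReadlineInput:
    """Wrap a stream so that readline() takes an optional size limit."""

    def __init__(self, stream):
        self.stream = stream

    def read(self, size):
        return self.stream.read(size)

    def readline(self, size=-1):
        # A negative size means "read up to the next newline".
        return self.stream.readline(size)
```

With `SizedReadlineInput(io.BytesIO(b"abcdef\nxyz\n"))`, calling `readline(3)` returns at most three bytes, and a following `readline()` returns the rest of that line.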


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-14 Thread Ian Bicking

Andrew Clover wrote:

Ian Bicking wrote:

As it is (in Python 2), you should do something like 
environ['PATH_INFO'].decode('utf8') and it should work.


See the test cases in my original post: this doesn't work universally. 
On WinNT platforms PATH_INFO has already gone through a decode/encode 
cycle which almost always irretrievably mangles the value.


This is something messed up with CGI on NT, and whatever server you are 
using, and perhaps the CGI adapter (maybe there's a way to get the raw 
environment without any encoding, for example?) -- it's mostly 
irrelevant to WSGI itself.


My understanding of this suggestion is that latin-1 is a way of 
representing bytes as unicode. In other words, the values will be 
unicode, but that will simply be a lie.


Yes, that would be a sensible approach, but it is not what is actually 
happening in any WSGI environment I have tested. For example 
wsgiref.simple_server decodes using UTF-8 not 8859-1 — or would do, if 
it were working. (It is currently broken in 3.0rc2; I put a hack in to 
get it running but I'm not really sure what the current status of 
simple_server in 3.0 is.)


As far as I know, PJE just made the suggestion about Latin-1, I don't 
know if anything has actually been done in wsgiref or elsewhere to 
implement that.  Honestly I don't know if anyone is doing anything with 
WSGI and Python 3.


A lot of what you write about has to do with CGI, which is the only 
place WSGI interacts with os.environ.  CGI is really an aspect of the 
CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the 
WSGI spec itself.


Indeed, but we naturally have to take into account implementability on 
CGI. If a WSGI spec *requires* PATH_INFO to have been obtained using 
8859-1 decoding — or UTF-8, which is the other sensible option given 
that most URIs today are UTF-8 — then there cannot be a fully-compliant 
CGI-to-WSGI wrapper. Perhaps it's not the big issue it was when WSGI was 
first getting off the ground, but IMO it's still important.


This will presumably require hacks that might be system-dependent. 
Probably the current CGI adapter will just have to be a bit more 
complicated.  Also, if Python is utf8-decoding the environment, we'll 
just have to shortcut that entirely, as you can't just undo utf8.  I 
assume there is some way to get at the bytes in the environment, if not 
then that is a Python 3 bug.


Personally I'm more inclined to set up a policy on the WSGI server 
itself with respect to the encoding, and then use real unicode 
characters.


I think we are stuck with Unicode environ at this point, given the CGI 
issue. But applications do need to know about the encoding in use, 
because they will (typically) be generating their own links. So an 
optional way to get that information to the application would be 
advantageous.


The encoding of the operating system (which presumably informs the 
encoding of os.environ) has nothing to do with the encoding of the web 
application.  For the CGI adapter we simply need to find a way to ignore 
the system encoding.


I'm now of the opinion that the best way to do this is to standardise 
Apache's ‘REQUEST_URI’ as an optional environ item. This header is 
pre-URI-decoding, containing only %-sequences and not real high bytes, 
so it can be decoded to Unicode using any old charset without worry.


Unfortunately REQUEST_URI doesn't map directly to SCRIPT_NAME/PATH_INFO. 
I think it might be feasible to support an encoded version of 
SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names, 
and I don't know of any particular standard to base those names on), 
but moving from the two keys to a single REQUEST_URI is not feasible.


It's not that trivial to figure out where in REQUEST_URI the 
SCRIPT_NAME/PATH_INFO boundary really is, as there are many ways the 
unencoded values could be encoded.  I guess you'd probably count 
segments, try to catch %2f (where the segments won't match up), and then 
double check that the decoded REQUEST_URI matches SCRIPT_NAME+PATH_INFO.
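
The segment-counting idea could look roughly like the sketch below. The function name is made up, it is written for Python 3, and it deliberately gives up (returns None) whenever the decoded pieces do not match, which is what happens when an encoded slash shifts the segments:

```python
from urllib.parse import unquote


def split_request_uri(request_uri, script_name, path_info):
    """Find the still-encoded SCRIPT_NAME and PATH_INFO inside REQUEST_URI."""
    raw_path = request_uri.split("?", 1)[0]   # drop the query string
    segments = script_name.count("/")         # segments owned by SCRIPT_NAME
    parts = raw_path.split("/")
    raw_script = "/".join(parts[:segments + 1])
    raw_info = raw_path[len(raw_script):]
    # Double check that decoding gives back the values the server supplied.
    if unquote(raw_script) != script_name or unquote(raw_info) != path_info:
        return None
    return raw_script, raw_info
```

For instance, `split_request_uri("/app/caf%C3%A9/a%2Fb?x=1", "/app", "/café/a/b")` keeps the %2F visible in the second part of the result, while a SCRIPT_NAME that itself contained an encoded slash would fail the final check.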


An application wanting to support Unicode URIs (or encoded slashes in 
URIs*) could then sniff for REQUEST_URI and use it in preference to 
PATH_INFO where available. This is a bit more work for the application, 
but it should generally be handled transparently by a library/framework 
and supporting PATH_INFO in a portable fashion already has warts thanks 
to IIS's bugs, so the situation is not much worse than it already is.


I use the distinction between SCRIPT_NAME and PATH_INFO extensively. 
And frankly IIS is probably less relevant to most developers than CGI. 
Anyway, any of these bugs are things that need to be fixed in the WSGI 
adapter, we must not let them propagate into the specification or 
applications.  So if IIS has problems with PATH_INFO, the WSGI adapter 
(be it CGI or otherwise) should be configured to fix those problems up 
front.


And of course we get support through mod_cgi and mod_wsgi automatically, 
so

Re: [Web-SIG] WSGI Amendments thoughts: the horror of charsets

2008-11-12 Thread Ian Bicking

Andrew Clover wrote:
If we could reliably read the bytes the browser sends to us in the GET 
request that would be great, we could just decode those and be done with 
it. Unfortunately, that's not reliable, because:


1. thanks to an old wart in the CGI specification, %XX hex escapes are 
decoded before the character is put into the PATH_INFO environment 
variable;


I don't see a problem with this?  At least not a problem with respect to 
encoding.  As it is (in Python 2), you should do something like 
environ['PATH_INFO'].decode('utf8') and it should work.  It doesn't seem 
like there's any distinction between %-encoded characters and plain 
characters in this situation.



2. the environment variables may be stored as Unicode.

(1) on its own gives us the problem of not being able to distinguish a 
path-separator slash from an encoded %2F; a long-known problem but not 
one that greatly affects most people.


But combined with (2) that means some other component must choose how to 
decode the bytes into Unicode characters. No standard currently 
specifies what encoding to use, it is not typically configurable, and 
it's certainly not within reach of the WSGI application. My assumption 
is that most applications will want to end up with UTF-8-encoded URLs; 
other choices are certainly possible but as we move towards IRI they 
become less likely.



This situation previously affected only Windows users, because NT 
environment variables are native Unicode. However, Python 3.0 specifies 
all environment variable access is through a Unicode wrapper, and gives 
no way to control how that automatic decoding is done, leaving everyone 
in the same boat.


WSGI Amendments_1.0 includes a suggestion for Python 3.0 that environ 
should be decoded from the headers using HTTP standard encodings (i.e. 
latin-1 + RFC 2047), but unfortunately this doesn't quite work:


My understanding of this suggestion is that latin-1 is a way of 
representing bytes as unicode.  In other words, the values will be 
unicode, but that will simply be a lie.  So if you know you have UTF8 
paths, you'd do:


path_info = environ['PATH_INFO'].encode('latin-1').decode('utf8')

As far as I can tell this is simply to avoid having bytes in the 
environment, even though bytes are an accurate representation and 
unicode is not.
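
A short worked example of that round trip, assuming Python 3 and a made-up path; the variable names are only for illustration:

```python
# The bytes the client actually sent for the path "/café", as UTF-8.
wire_bytes = "/café".encode("utf-8")

# What a server following the latin-1 suggestion would put in environ:
# a unicode string in which every character stands for exactly one byte.
environ = {"PATH_INFO": wire_bytes.decode("latin-1")}

# The application undoes the latin-1 step and applies the real encoding.
path_info = environ["PATH_INFO"].encode("latin-1").decode("utf8")
```

Because latin-1 maps all 256 byte values to code points one-to-one, the first step can never fail and never loses information, which is why it can serve as a carrier for raw bytes.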


A lot of what you write about has to do with CGI, which is the only 
place WSGI interacts with os.environ.  CGI is really an aspect of the 
CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the WSGI 
spec itself.


Personally I'm more inclined to set up a policy on the WSGI server 
itself with respect to the encoding, and then use real unicode 
characters.  Unfortunately that's not as flexible as bytes, as it 
doesn't make it very easy to sniff out the encoding in 
application-specific ways, or support different encodings in different 
parts of the server (which would be useful if, for instance, you were to 
proxy applications with unknown encodings).  So... maybe that's not the 
most feasible option.  But if it's not, then I'd rather stick with bytes.



--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-28 Thread Ian Bicking

Manlio Perillo wrote:

Hi.

In my WSGI framework:
http://hg.mperillo.ath.cx/wsgix

I have, in the `http` module, the functions `parse_query_string` and
`parse_simple_post_data`.

The first parses the query string and returns a dictionary of strings; the
latter parses the application/x-www-form-urlencoded client body and
returns a dictionary of strings and the charset used by the client for
the unicode encoding.


Now, I'm wondering whether these two functions should return Unicode
strings instead of plain strings.

I think that Unicode strings should be returned, but I would like to
know what other web frameworks do.

Django seems to convert to Unicode, but the Python standard library does 
not (and I would like to know if changes are planned for Python 3.x).


WebOb decodes the request data to str, then lazily decodes to unicode 
based on the request encoding.  The request encoding is a bit fuzzy to 
calculate, which is part of why the decoding is lazy, so that the 
request encoding can be set or changed at any time.
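
The lazy approach can be illustrated with a small sketch. This is not WebOb's implementation; the class name and attributes are hypothetical, and it uses Python 3's urllib.parse for the actual decoding:

```python
from urllib.parse import parse_qsl


class LazyQuery:
    """Keep the query string encoded until the values are actually needed."""

    def __init__(self, query_string, charset="utf-8"):
        self.query_string = query_string  # still %-encoded
        self.charset = charset            # may be changed before first use

    def items(self):
        # Decoding happens here, using whatever charset is set right now.
        return parse_qsl(self.query_string, encoding=self.charset)
```

Because nothing is decoded in the constructor, code that later discovers the client used latin-1 can set `charset = "latin-1"` and still get correct values from `items()`.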


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Using decorators to add objects in a thread-local store..

2008-07-15 Thread Ian Bicking

Etienne Robillard wrote:


Hi all,

I'd like to have your input and comments on using decorator
functions for adding extra options to the request.environ object.

For instance, here's a decorator which adds a scoped session
object into request.environ:

def with_session(engine=None):
    """
    Decorator function for attaching a `Session` instance
    as a keyword argument in `request.environ`.
    """
    def decorator(view_func):
        def _wrapper(request, *args, **kwargs):
            scoped_session.set_session(engine)
            request.environ['_scoped_session'] = getattr(scoped_session, 'sessio


You should always use a namespace, e.g., 
request.environ['something._scoped_session'] = ...


In the context of a Pylons controller you could do it this way.  Of 
course with just WSGI it would be better to wrap it via WSGI, which is 
almost equivalent to a decorator:


def with_session(engine=None):
    def decorator(app):
        def engine_wsgi_app(environ, start_response):
            environ['...'] = ...
            return app(environ, start_response)
        return engine_wsgi_app
    return decorator
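
A filled-in, runnable version of the same pattern might look like the sketch below. The key name `myapp.scoped_session`, the placeholder dictionary standing in for a real session object, and the demo application are all assumptions made for the example:

```python
def with_session(engine=None):
    """Middleware factory that stores a session object under a namespaced key."""
    def decorator(app):
        def engine_wsgi_app(environ, start_response):
            # A real application would create or look up a session here;
            # a plain dict stands in for it in this sketch.
            environ["myapp.scoped_session"] = {"engine": engine}
            return app(environ, start_response)
        return engine_wsgi_app
    return decorator


@with_session(engine="sqlite://")
def blog_app(environ, start_response):
    session = environ["myapp.scoped_session"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("engine=%s" % session["engine"]).encode("utf-8")]
```

Since the session travels in environ, each request carries its own copy and nothing is shared between threads.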


Pylons controllers aren't *quite* WSGI applications, but instances of 
those controller classes are.  So wrapping an individual controller with 
middleware requires a bit more work.



            return view_func(request, *args, **kwargs)
        return wraps(view_func)(_wrapper)
    return decorator

Then it can be used as follows:

@with_session(engine=engine)
def view_blog_list(request, *args, **kwargs):
    # get the local session object for this
    # request (thread-local)
    sess = request.environ['_scoped_session']
    # do stuff with the Session object here...
    ...

Is this a good approach, and can it be adapted to work
in multithreaded environments?


Since you are passing around arguments to functions it should be fine in 
threaded environments.




--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-08 Thread Ian Bicking

Phillip J. Eby wrote:

 Obviously plenty of
people have a desire to have a place to store request-local data
without passing the environment everywhere. Using threading.local is a
good way to do that, unless the server is not using one thread per
request. Giving people an interface to write to that doesn't
specifically mention threads and is customizable by the wsgi server is
what I am suggesting.


Er, and how do you propose people *access* that interface rather than a 
specific implementation of it?  Wouldn't we need to pass it in the 
environ, thereby rendering the whole thing even more obviously moot?  :)


I can't decide what the question is here.  You mean, how can a greenlet 
request-local provider indicate that they are providing a way of getting 
the current request?  Or, how can a consumer get access, given that it 
can live in any module, and the consumer presumably doesn't have an environ?


I imagine from what Donovan says that there would actually be one 
module, requestlocal, and one implementation, and that implementation 
would be awesome and support greenlets and threads, and whatever else 
comes along (which luckily is not much else), and I guess maybe has a 
middleware that would register the request on entry and deregister it on 
exit, and consumers would do:


  import requestlocal

  def whatever():
      environ = requestlocal.get_request()

and we'd just all agree on this singular implementation, because I don't 
see any way around that.
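
One way such a module could be sketched today is with the standard library's contextvars, which did not exist when this thread was written and is swapped in here purely for illustration. The `requestlocal` module is hypothetical; only `get_request` comes from the discussion above, and the middleware name is made up:

```python
import contextvars

# Holds the environ of the request currently being handled in this context.
_current_environ = contextvars.ContextVar("current_environ")


def get_request():
    """Return the environ of the current request (LookupError if none)."""
    return _current_environ.get()


class RequestLocalMiddleware:
    """Register the request on entry and deregister it on exit."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        token = _current_environ.set(environ)
        try:
            return self.app(environ, start_response)
        finally:
            # Note: this runs when the app callable returns, which is
            # before the server iterates over the response body.
            _current_environ.reset(token)
```

Code deep in the call stack can then call `get_request()` without being handed the environ, as long as the application is wrapped in the middleware.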


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-07 Thread Ian Bicking

Manlio Perillo wrote:

Ian Bicking wrote:

Manlio Perillo wrote:


I'm adding web-sig in Cc.


 [...]

I'm developing a WSGI framework with all these (and other) ideas:
http://hg.mperillo.ath.cx/wsgix

It's still not documented, so I have not yet made an official 
announcement.


The main design goal is to keep the level of the interface as low 
level as possible.


I don't like additional interfaces (like Request and Response) 
objects around the WSGI dictionary, and I don't like frameworks like 
Django that completely hides the WSGI interface.


Have you tried webob?  My first attempt, Paste, avoided wrappers around 
those objects, but an object interface has been very helpful.




I have not tried it, but I have read the code (as I have read the code 
of Paste).


In principle I'm against using additional interfaces, and one of the 
reasons I wrote wsgix is to have a proof of concept, to try to 
understand whether it is feasible to write a WSGI application using an 
alternative framework.


wsgix (+ mod_wsgi for Nginx) has the same role as Paste, but I have 
decided to use a rather different approach.


As an example, in Paste you have chosen to use a config dictionary for 
middleware configuration, that is, you have middleware factories.


I think this is a red herring.  WebOb specifically doesn't do anything 
related to configuration or the setup of the stack.  What it does do is 
stuff like:


expires = http.format_time(0)
http.generate_cookie(
environ, headers, name, '', expires=expires,
domain=cookie_domain(environ), path=path,
max_age=0)

which would be resp.delete_cookie(name) (well, cookie_domain seems to be 
derived from a setting, but that's mostly unrelated).  This isn't a 
particularly substantial difference, but these small conveniences add up.


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-07 Thread Ian Bicking

Manlio Perillo wrote:

Ian Bicking wrote:

Manlio Perillo wrote:
[...]


As an example, in Paste you have chosen to use a config dictionary 
for middleware configuration, that is, you have middleware factories.


I think this is a red herring.  WebOb specifically doesn't do anything 
related to configuration or the setup of the stack.  What it does do 
is stuff like:


expires = http.format_time(0)
http.generate_cookie(
    environ, headers, name, '', expires=expires,
    domain=cookie_domain(environ), path=path,
    max_age=0)

which would be resp.delete_cookie(name) (well, cookie_domain seems to 
be derived from a setting, but that's mostly unrelated).  This isn't a 
particularly substantial difference, but these small conveniences add up.




As I have said, this is a matter of personal taste: I don't like the 
architecture used by WebOb and prefer to use the environ dictionary 
directly, without introducing other abstractions.

This is possible; I'm writing a non-trivial application using wsgix.


I'm still evaluating whether I can reuse WebOb's parsing functions (and 
this would be a great thing: I think that we *really* need a package 
with *only* low-level parsing functions for the HTTP protocol).


From what I can see, WebOb does *not* offer a low-level interface for 
the parsers: you *have* to use the Request object.


I really like multilevel architectures, instead.


This was the deliberate approach of Paste, and it does have several 
functions for doing things similar to how you describe.  As I said, I 
went down exactly this path, but I think WebOb solves the problem 
better.  You can think of WebOb as a way of currying functions.  All the 
request functions take an environ argument, curried through 
instantiation of webob.Request.  All response functions take 
status/headers/app_iter, curried through webob.Response.  State is never 
held outside the environment or the status/headers/app_iter of the response.


So think of webob.Request as the module of request-parsing routines, and 
webob.Response as the module of response-parsing routines.  (There are 
underlying functions for things like parsing dates, but they are only 
exposed through those classes.)
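
The currying idea can be sketched in a few lines. The `Request` class below is a hypothetical stand-in written for illustration, not WebOb's implementation; the only point it makes is that the wrapper holds no state of its own beyond the environ.

```python
from urllib.parse import parse_qs

def get_query(environ):
    """Plain function style: every helper takes the environ explicitly."""
    return parse_qs(environ.get("QUERY_STRING", ""))

class Request:
    """Hypothetical stand-in for the idea, not WebOb's implementation.

    The object stores nothing but the environ, so each attribute is just
    a helper function with its environ argument already supplied.
    """
    def __init__(self, environ):
        self.environ = environ

    @property
    def method(self):
        return self.environ["REQUEST_METHOD"]

    @property
    def query(self):
        return get_query(self.environ)

environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "page=2&tag=wsgi"}
req = Request(environ)
# Two wrappers around one environ agree, because the environ holds the state.
environ["QUERY_STRING"] = "page=3"
other = Request(environ)
```

Because nothing is cached on the wrapper, a change to the environ is visible through every `Request` built around it.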


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-07 Thread Ian Bicking

Donovan Preston wrote:
To throw another wrench in things, with the Paste/WebError 
evalexception interactive exception handler, it restores this 
thread-local context so you can later execute expressions in the same 
context.


It seems to me that what is really needed here is an extension of wsgi 
that specifies how to get, set, and list request local storage, and for 
people to use that instead of the threadlocal module. Of course, for 
threaded servers, they will just use the threadlocal module, but for 
Spawning running in single-threaded cooperative mode it would use a 
greenlet-local implementation, and for a hypothetical Twisted server 
running a hypothetical asynchronous wsgi application it would just use a 
random request id.


Well, it's really call-local, i.e., dynamic scoping.  Another option 
would be something like attaching this dynamic scoping to the frame 
objects themselves, in a way that evalexception could be aware 
(restoring them when trying to execute code in the context of some 
frame) and potentially greenlets could do the same thing.


It could be done in a WSGI-specific way, and that might be useful, but 
the general issue is applicable to more than WSGI.


Generally the problems we are talking about only occur when some kind of 
(semi-)transparent concurrency other than threads is used.  This 
includes greenlets, restoring a frame like in evalexception, and 
potentially generators with the app_iter.
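
As a sketch of what call-local storage can look like, the example below uses `contextvars`, a standard-library module added in Python 3.7 and therefore not available when this thread was written. The names `current_request`, `handle`, `lookup`, and `run_isolated` are made up for the illustration.

```python
import contextvars

# Hypothetical name; holds whatever the current request wants to expose.
current_request = contextvars.ContextVar("current_request", default=None)

def lookup():
    # Code deep in the call stack reads the value without it being passed in.
    return current_request.get()

def handle(request_id):
    """Bind a value for the duration of one call, then restore the old one."""
    token = current_request.set(request_id)
    try:
        return lookup()
    finally:
        current_request.reset(token)

def run_isolated(request_id):
    # copy_context() gives each call its own snapshot of the variables.
    return contextvars.copy_context().run(handle, request_id)

results = [run_isolated("req-1"), run_isolated("req-2")]
```

The binding follows the call rather than the thread, which is the property the thread-local approach lacks.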


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] help with the implementation of a WSGI middleware

2008-07-07 Thread Ian Bicking

Phillip J. Eby wrote:

At 09:58 PM 7/7/2008 +0200, Manlio Perillo wrote:
In this case the first solution is to use this middleware as a 
decorator, instead of a full middleware.


This is the correct way to implement non-transparent middleware; i.e., 
so-called middleware which is in fact an application API.  See:


http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html

for more about this.

Basically, if a piece of middleware has to be there for the application 
to run, it's not really middleware; it's a misnamed decorator.


In the original WSGI spec, I overestimated the usefulness of adding 
extension APIs to the environ... or more likely, I went along with some 
of Ian's overenthusiasm for the idea.  ;-)  Extension APIs in the 
environ just mean you have to write your code to handle the case where 
the API isn't there -- in which case you might as well have used a library.


Eh, personally I remain unconvinced.  Or, at least, while the 
possibility of abuse exists, the extensibility still has many valid 
uses, and we're better off with it than with a more object-based system 
(e.g., CherryPy hooks, Django middleware, Zope's Acquisition, and 
arguably even Zope 3's giant-ball-of-context).


Also, using *just* a library presupposes robust and transparent 
request-local storage in a manner that works comfortably with the WSGI 
call stack, which like any call stack can be recursive and complex. 
Lacking such storage, stuffing objects in the environment is better than 
the alternatives.


Extension APIs really only make sense if they are true *server* 
features, not application features; otherwise, you are better off using 
a library rather than middleware per se.


What server features?  Servers are dull.

Often middleware is used to implement policy separate from the 
application.  Libraries require another kind of abstraction, and 
implementing policy in libraries is, IMHO, messier than the middleware 
alternative for many important use cases.  Also there exists no neutral 
ground for libraries in Python.  Maybe egg entry points, but they aren't 
all that neutral, and aren't all that applicable either.  zope.interface 
would like to be neutral ground, but of course is not.  So multiple 
implementations can at least possibly congeal around a WSGI request.


Also of course server is a vague term.  Request in, response out, 
that's the minimal abstraction for HTTP, and there is no server in 
there.  If we're talking about things that call WSGI applications, 
well I have a ton of those that never use sockets and you'd be hard 
pressed to classify them as servers.
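
A minimal example of such a caller, assuming nothing beyond the standard library's `wsgiref.util`: it builds an environ, invokes the application in-process, and collects the response. `call_app` and `app` are names invented for this sketch.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    body = ("You asked for %s" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app, path="/"):
    """Invoke a WSGI app in-process and collect (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)   # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body

status, headers, body = call_app(app, "/hello")
```

No socket is opened at any point, yet from the application's side the call is indistinguishable from one made by a network server.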


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] help with the implementation of a WSGI middleware

2008-07-07 Thread Ian Bicking

Phillip J. Eby wrote:

I don't object to stuffing things in the environment; I object to:

1. Putting APIs in there (the API should be regular functions or 
objects, thanks)
2. Wrapping middleware around an app to put in APIs that it's going to 
have to know about anyway.


Well, sometimes this occurs because you want the middleware at a 
different level.  E.g., something like the transaction handler in 
repoze.tm (http://svn.repoze.org/repoze.tm/trunk/) -- you expect it to 
be there, and for it to put an object with a certain API in the 
environment, and it implements an outer transaction boundary.  It's 
something you can put in fairly speculatively, so that some consumer can 
make use of it.  It's also a case where objects seemingly well outside 
the scope of the controller/web need access to some transaction manager, 
and that manager's most obvious scope is for the request, and so some 
common means to get the current transaction manager would be nice. 
Anyway, arguably a good example of both an API in the environment, and 
an API that would be nice if you could easily access without being bound 
to any particular framework's convention for how to get the current request.
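
A toy version of that arrangement is sketched below. It is not repoze.tm's code: the `Transaction` class and the `example.transaction` environ key are made up, and a real manager would coordinate with a database.

```python
class Transaction:
    """Toy transaction object; a real manager would talk to a database."""
    def __init__(self):
        self.state = "active"
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

class TransactionMiddleware:
    """Outer transaction boundary; 'example.transaction' is a made-up key."""
    def __init__(self, app):
        self.app = app
        self.last = None   # kept only so the example can inspect the outcome

    def __call__(self, environ, start_response):
        txn = environ["example.transaction"] = self.last = Transaction()
        try:
            result = list(self.app(environ, start_response))
        except Exception:
            txn.abort()
            raise
        txn.commit()
        return result

def app(environ, start_response):
    # The consumer expects *something* upstream to have supplied the object.
    assert environ["example.transaction"].state == "active"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"saved"]

wrapped = TransactionMiddleware(app)
body = wrapped({}, lambda status, headers, exc_info=None: None)
```

The application never imports the middleware; the only contract between them is the environ key and the object's API.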


Often middleware is used to implement policy separate from the 
application.


And that kind of middleware is therefore (one hopes) transparent to the 
application.


Often *some* implementation must be present.  E.g., if you check 
REMOTE_USER you implicitly expect *something* to set REMOTE_USER.


  Libraries require another kind of abstraction, and implementing 
policy in libraries is, IMHO, messier than the middleware alternative 
for many important use cases.  Also there exists no neutral ground for 
libraries in Python.  Maybe egg entry points, but they aren't all that 
neutral, and aren't all that applicable either.  zope.interface would 
like to be neutral ground, but of course is not.  So multiple 
implementations can at least possibly congeal around a WSGI request.


Standards for data in the environ may be a good idea.  But APIs in the 
environ are generally *not* a good idea.


Yes, generally I agree.

Also of course server is a vague term.  Request in, response out, 
that's the minimal abstraction for HTTP, and there is no server in 
there.  If we're talking about things that call WSGI applications,


Nope, I mean actual servers.


Well, as I was implying, anything that calls an app is in some sense a 
server.


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-04 Thread Ian Bicking

Iwan Vosloo wrote:

Many web frameworks and ORM tools have the need to propagate data
depending on some or other context within which a request is dealt with.
Passing it all via parameters to every nook of your code is cumbersome.

A lot of the frameworks use a thread local context to solve this
problem. I'm assuming these are based on threading.local.  


(See, for example:
http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )

Such usage assumes that one request is served per thread.

This is not necessarily the case.  (Twisted would perhaps be an example,
but I have not checked how the twisted people deal with the issue.)


The Spawning server 
(http://ulaluma.com/pyx/archives/2008/06/spawning_01_rel.html) would 
indeed get things mixed up this way, as it uses greenlets to make (at least 
some) blocking calls async.  So it would encounter this problem full-force.
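
The one-request-per-thread assumption is easy to see with `threading.local` itself. In this small sketch (the names are illustrative), each thread sees only its own value; if two requests were multiplexed onto one thread, as with greenlets, the second assignment would overwrite the first.

```python
import threading

context = threading.local()   # one independent namespace per thread
seen = {}

def serve(request_id, barrier):
    context.request_id = request_id
    barrier.wait()             # make sure both threads have set their value
    seen[request_id] = context.request_id

barrier = threading.Barrier(2)
threads = [threading.Thread(target=serve, args=(rid, barrier))
           for rid in ("req-1", "req-2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```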


To throw another wrench in things, with the Paste/WebError evalexception 
interactive exception handler, it restores this thread-local context so 
you can later execute expressions in the same context.


--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-04-10 Thread Ian Bicking
John Millikin wrote:
 On Wed, Apr 9, 2008 at 10:15 PM, Bob Ippolito [EMAIL PROTECTED] wrote:
  That sounds like a really bad idea, if there is an option to change
  the behavior it shouldn't live in module state.

 Would you rather have strictness controls as parameters? demjson
 currently has seventeen of those. Maybe we could have loads(bytes) and
 loads_broken(bytes, allow_trailing_comma, allow_all_whitespace,
 allow_comments, ...) functions, one for parsing JSON, the other for
 parsing garbage.  There's no real way to hide or remove the complexity
 in parsing invalid data, so both warnings and parameters will cause
 the implementation to be much larger, but at least having to call
 warnings.filter (ignore, JSONWarning) might serve to make some users
 think twice.

What reason is there for all the different flags?  Why not just strict 
and loose?


-- 
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-04-09 Thread Ian Bicking
   parse (bytes_or_string)
   generate (obj, indent = None, ascii_only = True, encoding = 'utf-8')

I strongly prefer we stick to the conventional names of 
dump/dumps/load/loads, for consistency with other serialization 
libraries already in Python.
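
The consistency argument is easy to demonstrate with the modules as they exist in current Python, where the `json` module that eventually landed does use these names alongside `pickle` and `marshal`:

```python
import json
import marshal
import pickle

data = {"name": "web-sig", "tags": ["wsgi", "json"]}

# The same verb pair works across the stdlib serializers.
roundtrips = {
    module.__name__: module.loads(module.dumps(data))
    for module in (json, pickle, marshal)
}
```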

   Ian


Re: [Web-SIG] [proposal] merging jsonrpc into xmlrpc

2008-04-08 Thread Ian Bicking
Alan Kennedy wrote:
 [1] But it's a shame they didn't write it on WSGI: then their services
 could have run on the Google compute cloud ;-)

Indeed.  After seeing a BaseHTTPServer JSON-RPC server go up on the 
Python Cookbook I wrote a WSGI server and made it into a tutorial: 
http://pythonpaste.org/webob/jsonrpc-example.html (but it's not a 
maintained library -- at least I won't be maintaining it).

 [2] Perhaps some pythonista from Web-SIG is most appropriate to advise
 how JSON-RPC should move forward? After all, we're more accustomed to
 server-side stuff than those javascript folks ;-)

Let it die?  It is more complicated than necessary, when instead you 
could just make each function a URL of its own, and POST the arguments 
and get back the response, with 500 Server Error for errors.  It's hard 
to spec that up because it's too simple.
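
A bare-bones sketch of that model follows. The choice of JSON for the argument and result encoding, the `FUNCTIONS` table, and the `post` helper are all assumptions made for the example; the message above does not prescribe any of them.

```python
import io
import json

FUNCTIONS = {"add": lambda a, b: a + b}   # hypothetical exposed functions

def service(environ, start_response):
    """POST /<name> with a JSON object of arguments; reply with JSON."""
    name = environ.get("PATH_INFO", "").strip("/")
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        args = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body = json.dumps(FUNCTIONS[name](**args)).encode("utf-8")
        status = "200 OK"
    except Exception as exc:
        # Any failure, including an unknown function, is a 500 here.
        body = json.dumps(str(exc)).encode("utf-8")
        status = "500 Server Error"
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]

def post(path, payload):
    """Tiny in-process client used only to exercise the sketch."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST", "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)}
    seen = []
    chunks = service(environ,
                     lambda status, headers, exc_info=None: seen.append(status))
    return seen[0], json.loads(b"".join(chunks))

ok = post("/add", {"a": 2, "b": 3})
bad = post("/add", {"a": 2})
```

There is no envelope, method name field, or request id to specify: the URL names the function and the HTTP status carries success or failure.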

OHM (http://pythonpaste.org/ohm/) follows this model of exposing a service.

   Ian


[Web-SIG] Proposed specification: developer authentication

2008-03-31 Thread Ian Bicking
I'm having some technical problems with wsgi.org, but once those are 
figured out this will be posted to 
http://wsgi.org/wsgi/Specifications/developer_auth

I'll let the spec speak for itself, I guess.  I already have a half 
dozen tools that could make use of this spec, and I know several other 
tools that could also use it (TG toolbox, repoze.profile).

Comments encouraged.  Or just agreement.  Silence indicates lack of 
interest.


:Title: Developer Auth
:Author: Ian Bicking [EMAIL PROTECTED]
:Discussions-To: Python Web-SIG web-sig@python.org
:Status: Proposed
:Created: 31-Mar-2008

.. contents::

Abstract
--------

Many tools can be written for a WSGI stack which should only be accessible 
to developers.  For example, an interactive debugger in response to 
sessions.  Or a template system might display the underlying filenames 
that created a page.  Or profiling data.  In some cases there are 
security implications to exposing this data, in other cases it is 
harmless but undesirable to show this information to normal users.  This 
specification offers a single, simple way to detect if a user should be 
presented with this information.

Rationale
---------

So far these tools have been controlled by configuration, e.g., ``debug 
= True``.  This works but can be dangerous, as a deployer or developer 
can forget to turn off tools.  Or, if it is controlled through Python 
code, it can be difficult to enable on a site that wasn't intended to 
have the tool on, e.g., if you want to debug a live site because you 
can't reproduce a problem in development.  Also, configuration doesn't 
allow some people to see these development tools while hiding them from 
other people.  A per-request and secure authentication method is more 
desirable.

This could be implemented using application-specific authentication 
methods and permission levels.  This is undesirable because often 
debugging is orthogonal to users -- you may want to debug a problem only 
present when a low-permission or anonymous user is visiting the site. 
Also it is difficult to keep application and debugging permissions 
coherent, which is probably why this technique is not used by any tools.

Specification
-------------

Debugging tools should look for a key 
``environ['x-wsgiorg.developer_user']``.  This will contain some kind of 
user name.  If it is empty or not present, then debugging tools should 
not activate themselves, or should not expose any information in the 
browser.

The user name can be used in logging, but all users are considered to 
have the same permission level (total access).  The username must be a 
``str``, but its contents are not constrained (an IP address, for 
example, would be acceptable, or a name and email, with an embedded space).

If a URL is protected except for developers, applications should simply 
return ``403 Forbidden``.  Seamless login is not part of this 
specification or its goals.  Some systems may be IP-controlled, for 
example, and no login is possible.
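
As one possible reading of the paragraph above, a developer-only URL could be guarded as in the sketch below. The names `developer_only`, `profile_page`, and `status_of` are hypothetical and not part of the proposal; the sketch is written for Python 3.

```python
def developer_only(app):
    """Wrap a WSGI app so it answers 403 unless a developer is identified."""
    def wrapper(environ, start_response):
        if not environ.get("x-wsgiorg.developer_user"):
            body = b"Forbidden"
            start_response("403 Forbidden",
                           [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)
    return wrapper

def profile_page(environ, start_response):
    # Hypothetical developer tool.
    user = environ["x-wsgiorg.developer_user"]
    body = ("profile for %s" % user).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

def status_of(app, environ):
    """Call the app and report only the status line."""
    seen = []
    app(environ, lambda status, headers, exc_info=None: seen.append(status))
    return seen[0]

protected = developer_only(profile_page)
```

An empty or missing key is treated the same way, matching the rule that tools should not activate in either case.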


Example
-------

This is a simple exception catcher that uses the key::

    import sys, traceback

    class CatchExceptions(object):
        def __init__(self, app):
            self.app = app
        def __call__(self, environ, start_response):
            if not environ.get('x-wsgiorg.developer_user'):
                return self.app(environ, start_response)
            try:
                return self.app(environ, start_response)
            except:
                start_response('500 Server Error',
                               [('content-type', 'text/plain')],
                               sys.exc_info())
                return [traceback.format_exc()]

Here is an IP-restricted middleware that sets the key::

    class IPDeveloper(object):
        def __init__(self, app, ips=('127.0.0.1',)):
            self.app = app
            self.ips = ips
        def __call__(self, environ, start_response):
            if environ.get('REMOTE_ADDR') in self.ips:
                environ['x-wsgiorg.developer_user'] = environ['REMOTE_ADDR']
            return self.app(environ, start_response)

Problems
--------

* With security by obscurity in mind, it might be best if login methods
  weren't clear.  With ease of use in mind, easy logins are best.
* There are no levels of access.  Everyone is assumed to have complete
  access.  (You could add another custom key if you want to share extra
  information between the authentication and application layer.)
* This encourages people to do production deployments with debugging
  tools enabled.

Other Possibilities
-------------------

* Configuration
* Conditional middleware composition
* Application login systems
* Some other generalized authentication system (AuthKit, etc).

Open Issues
-----------

* Should ``401 Authorization Required`` be returned?  Potentially with 
``WWW-Authenticate: x-wsgiorg.developer_user``.  This would signal to 
the middleware that a login should occur, which it may or may not ignore 
(it could

Re: [Web-SIG] Clarifications on Python 3.0 and WSGI.

2008-03-25 Thread Ian Bicking
Phillip J. Eby wrote:
 At 11:04 AM 3/25/2008 -0500, Ian Bicking wrote:
 Phillip J. Eby wrote:
 It says that in versions of Python where 'str is unicode' (i.e. 
 Jython, IronPython, and Python 3000), then the specification should 
 be read to define string as a unicode string whose characters can 
 be expressed in latin-1.
 Really, adding support for bytes is the stretch here.  In fact, I'd 
 almost go so far as to say the heck with bytes support except for the 
 response body.  I could easily consider headers to be text, instead.

 Latin-1?  How is this supposed to work at all?
 
 Latin-1 is the encoding that can allow a unicode string to losslessly 
 encode arbitrary bytes.  And that's how these things are handled (or 
 should be handled, per the spec) in Jython and IronPython today.
 
 In any case I only said I'd *almost* go so far as to say headers are 
 text.  :)

Are you proposing that we use a Latin-1 encoded string to hold bytes?

Isn't that kind of a step backwards in keeping unicode and text straight?
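
The property being debated can be checked directly in Python 3: every byte value maps to the code point with the same number under Latin-1, so the round trip is lossless, but the resulting string is only "text" in name.

```python
raw = bytes(range(256))            # every possible byte value

as_text = raw.decode("latin-1")    # never fails: byte N becomes U+00NN
back = as_text.encode("latin-1")   # and maps straight back

# The catch raised above: the text is only meaningful if you know it is
# really bytes in disguise. UTF-8 data comes out garbled until re-decoded.
utf8_bytes = "café".encode("utf-8")
smuggled = utf8_bytes.decode("latin-1")
recovered = smuggled.encode("latin-1").decode("utf-8")
```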

   Ian


Re: [Web-SIG] Are you going to convert Pylons code into Python 3000?

2008-03-04 Thread Ian Bicking
Graham Dumpleton wrote:
 Personally I believe that WSGI 1.0 should die along with Python 2.X. I
 believe that WSGI 2.0 should be developed to replace it and the
 introduction of Python 3.0 would be a great time to do that given that
 people are going to have to change their code anyway and that code
 isn't then likely to be backward compatible with Python 2.X.

I don't believe it should just *die*.  But I agree that this is a good 
time to revisit the specification.  Especially since I have no idea how 
the change to unicode text would affect the WSGI environment.  Having 
the environment hold bytes seems weird, but having it hold unicode is a 
substantial change.

I don't think it will be as bad as Martijn thinks, because the libraries 
people use will probably have relatively few interface changes.  Pylons 
and WebOb for instance should maintain largely the same interface (and 
they already expose unicode when possible).  None of the changes 
proposed for WSGI 2 would change this.

If I'm maintaining two versions of a library (one for Python 2, one for 
Python 3), then at least I'd like to get a little benefit out of it, and 
a revised WSGI would give some benefit.

I think we might still need some kind of WSGI 1.1 to clarify what WSGI 1 
(-like semantics) means in a Python 3.0 environment.  Creating adapters 
from WSGI 1 to WSGI 2 should be easy enough that we could still offer 
some support for minimally-translated WSGI code.
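
To give a sense of how small such an adapter might be, here is a sketch that assumes a WSGI 2 style in which the application takes only the environ and returns a `(status, headers, body)` tuple. That signature was one of the ideas floated at the time; it is an assumption here, not a finalized specification.

```python
def wsgi1_to_wsgi2(app):
    """Adapt a WSGI 1 app to a hypothetical call-and-return signature."""
    def wsgi2_app(environ):
        captured = {}
        written = []

        def start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return written.append   # stands in for the legacy write() callable

        result = app(environ, start_response)
        try:
            # Anything sent through write() precedes the iterated body.
            body = written + list(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        return captured["status"], captured["headers"], body
    return wsgi2_app

def old_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

status, headers, body = wsgi1_to_wsgi2(old_app)({})
```

The sketch buffers the whole body, so it gives up streaming; a production adapter would need to handle that more carefully.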

   Ian


Re: [Web-SIG] Dealing with urllib, urllib2, and urlparse

2008-02-22 Thread Ian Bicking
Joe Gregorio wrote:
  * Thorough unit tests using unittest or doctest.
 
 Done.
 
 http://httplib2.googlecode.com/svn/trunk/httplib2test.py
 
 Many unit tests done in unittest. They fall into
 two categories, those that run locally and those
 that run against a set of URIs on the web. Is there
 a stdlib way of segregating those tests? All the
 code for the resources is also checked into
 subversion:
 
 http://httplib2.googlecode.com/svn/trunk/test/

I guess this is a test-related feature request: something that would be 
nice, and that I don't believe httplib2 specifically allows (though 
maybe I am unaware of it) is a clear/documented way to mock http calls. 
  wsgi_intercept provides this in a kind of general way, and includes 
some httplib2 support, but direct support in httplib2 (and the stdlib) 
would be very nice, and I think encourage people to do better testing.

   Ian


Re: [Web-SIG] Dealing with urllib, urllib2, and urlparse

2008-02-22 Thread Ian Bicking
Joe Gregorio wrote:
 On Fri, Feb 22, 2008 at 2:09 PM, Ian Bicking [EMAIL PROTECTED] wrote:
  I guess this is a test-related feature request: something that would be
  nice, and that I don't believe httplib2 specifically allows (though
  maybe I am unaware of it) is a clear/documented way to mock http calls.
   wsgi_intercept provides this in a kind of general way, and includes
  some httplib2 support, but direct support in httplib2 (and the stdlib)
  would be very nice, and I think encourage people to do better testing.
 
 I have a MockHttp in another project that I use for testing code
 that uses httplib2, is this what you'd like to see included in
 httplib2 itself?
 
   
 http://code.google.com/p/feedvalidator/source/browse/trunk/apptestsuite/client/atompubbase/tests/mockhttp.py

Yes, more or less.  Only taking from files on disk is less flexible than 
a WSGI application, so that more general interface would also be nice 
(though having both would be good too).
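
A sketch of what the WSGI-backed variant could look like is below. `WSGIMockHttp` is a hypothetical test double; its `request` method returns a `(response, content)` pair only loosely in the spirit of an HTTP client call, and it does not claim to reproduce httplib2's or wsgi_intercept's actual interfaces.

```python
import io
from urllib.parse import urlsplit

class WSGIMockHttp:
    """Hypothetical test double: routes requests to a WSGI app in-process."""
    def __init__(self, app):
        self.app = app

    def request(self, uri, method="GET", body=b"", headers=None):
        parts = urlsplit(uri)
        environ = {
            "REQUEST_METHOD": method,
            "SCRIPT_NAME": "",
            "PATH_INFO": parts.path or "/",
            "QUERY_STRING": parts.query,
            "SERVER_NAME": parts.hostname or "localhost",
            "SERVER_PORT": str(parts.port or 80),
            "SERVER_PROTOCOL": "HTTP/1.1",
            "CONTENT_LENGTH": str(len(body)),
            "wsgi.version": (1, 0),
            "wsgi.url_scheme": parts.scheme or "http",
            "wsgi.input": io.BytesIO(body),
            "wsgi.errors": io.StringIO(),
            "wsgi.multithread": False,
            "wsgi.multiprocess": False,
            "wsgi.run_once": False,
        }
        for key, value in (headers or {}).items():
            environ["HTTP_" + key.upper().replace("-", "_")] = value
        response = {}

        def start_response(status, response_headers, exc_info=None):
            response["status"] = status.split(" ", 1)[0]
            response.update((k.lower(), v) for k, v in response_headers)

        content = b"".join(self.app(environ, start_response))
        return response, content

def fake_site(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode("utf-8")]

response, content = WSGIMockHttp(fake_site).request("http://example.test/feed?x=1")
```

A files-on-disk mock is then just one particular WSGI application plugged into the same double.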

   Ian


Re: [Web-SIG] Dealing with urllib, urllib2, and urlparse

2008-02-20 Thread Ian Bicking
Brett Cannon wrote:
 which I am liking. But I figured I would ask if there is any remote
 chance that this SIG has plans to either merge urllib and urllib2 or
 come up with a new module, or something before 3.0 comes out.

httplib2 is basically a replacement for urllib.  I personally prefer it 
to urllib.  I don't know how other people feel, or Joe's thoughts (the 
author).

Somewhat ironically httplib2 has a scope that is closer to urllib than 
httplib.  It would be nice if this naming style (x and x2) didn't persist.

   Ian


Re: [Web-SIG] Removal of Cookie in Python 3.0 OK?

2008-02-04 Thread Ian Bicking
Brett Cannon wrote:
 On Feb 3, 2008 3:41 PM, Ian Bicking [EMAIL PROTECTED] wrote:
 Brett Cannon wrote:
 As part of the standard library cleanup for Python 3.0, it has been
 suggested to me that the Cookie module be removed. The rationale for
 this is that most of the module is already deprecated and cookielib
 does a better job for cookie support anyway.

 I just wanted to see if anyone here had strong objections (along with
 reasons) as to why the module should be kept around in some form or
 another.
 I think most frameworks still use the Cookie module.  The cookielib
 module is more oriented to the client side.  It doesn't seem to have the
 same parsing functions that you'd use on the server side (though maybe
 they are there and just not documented because they also exist in the
 Cookie module).
 
 I honestly don't know. This was just something that someone proposed
 and I figured I would quickly look into, especially since I am trying
 to create a single http.cookies module. But if both modules stick
 around that might not work out very well having BaseCookie,
 SimpleCookie, and Cookie all in the same module but doing very
 different things.

I'd actually prefer simple parsing functions instead of the 
objects of the Cookie module.  And the only thing I really like in the 
cookie module is BaseCookie; the other classes try to be clever and just 
manage to be distracting or annoying.
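
For illustration, a "simple parsing function" of the kind meant here might look like the sketch below. `parse_cookies` is a hypothetical helper; it is written against Python 3, where the Cookie module ended up as `http.cookies`, and it still leans on `SimpleCookie` internally.

```python
from http.cookies import SimpleCookie

def parse_cookies(header):
    """Return {name: value} for a request's Cookie header.

    Hypothetical helper; it leans on SimpleCookie for the actual parsing
    and hides the Morsel objects from the caller.
    """
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}

cookies = parse_cookies("session=abc123; theme=dark")
```

The caller gets a plain dictionary and never has to learn the object model.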

If as Jim suggests the existing Cookie module was made into an 
installable package we could have backward compatibility in addition to 
a cleaner stdlib going forward.  (Or we could leave cookies out of the 
stdlib, but this particular functionality doesn't bother me since it's 
fairly clear, at least now, how it should be implemented.)

   Ian


Re: [Web-SIG] reorg of web-related modules for Python 3K

2008-02-04 Thread Ian Bicking
Bill Janssen wrote:
 I think WSGI is a better interface than any of these.  BaseHTTPServer is 
 a reasonable basis for building a server (wsgiref.simple_server and 
 others use it), but the subclasses are a little funky IMHO.  Giving 
 them the name http.server makes them seem like the Right Solution, and I 
 don't think they are.  They're more like server-building tools.
 
 Yes, these classes are quite old, and have been updated only patchily
 over the years.  I don't use them, either.  But I guess the question
 is whether wsgiref.* is a better _implementation_ than any of these.
 We don't really have interfaces in Python.

wsgiref.simple_server actually uses BaseHTTPServer, so the 
implementations are tied.  wsgiref.simple_server is a much better API 
than BaseHTTPServer.  Even then, wsgiref.simple_server isn't the only 
server based on BaseHTTPServer, so it's not without some use as an 
abstract base class for servers.  It's just not a useful base class for 
applications.
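
The tie between the two implementations can be checked directly. The snippet uses the Python 3 module names, where BaseHTTPServer did end up as `http.server`:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer

# wsgiref's server and handler are built on the http.server base classes.
server_is_subclass = issubclass(WSGIServer, HTTPServer)
handler_is_subclass = issubclass(WSGIRequestHandler, BaseHTTPRequestHandler)
```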

   Ian


Re: [Web-SIG] Removal of Cookie in Python 3.0 OK?

2008-02-03 Thread Ian Bicking
Brett Cannon wrote:
 As part of the standard library cleanup for Python 3.0, it has been
 suggested to me that the Cookie module be removed. The rationale for
 this is that most of the module is already deprecated and cookielib
 does a better job for cookie support anyway.
 
 I just wanted to see if anyone here had strong objections (along with
 reasons) as to why the module should be kept around in some form or
 another.

I think most frameworks still use the Cookie module.  The cookielib 
module is more oriented to the client side.  It doesn't seem to have the 
same parsing functions that you'd use on the server side (though maybe 
they are there and just not documented because they also exist in the 
Cookie module).

   Ian


Re: [Web-SIG] reorg of web-related modules for Python 3K

2008-02-03 Thread Ian Bicking
Bill Janssen wrote:
 Over on the stdlib-sig, Brett's proposing that we move some of the
 HTTP-related classes:
 
 OK, to keep this ball rolling, here is my suggestion for reorganizing
 HTTP modules:

   httplib -> http.tools
   BaseHTTPServer -> http.server
   SimpleHTTPServer -> http.server
   CGIHTTPServer -> http.server

I think WSGI is a better interface than any of these.  BaseHTTPServer is 
a reasonable basis for building a server (wsgiref.simple_server and 
others use it), but the subclasses are a little funky IMHO.  Giving 
them the name http.server makes them seem like the Right Solution, and I 
don't think they are.  They're more like server-building tools.

   cookielib -> http.cookies

 Since the various HTTP server modules have no name clashes we
 can consolidate them into a single module.
 
 Seems reasonable to me, but I thought it should be looked at in this
 forum.  All this is going into PEP 3108, so either join the stdlib-sig,
 or read the PEP, if you care about all this.
 
 Alexandre Vassalotti further proposes the following:
 
 xmlrpclib -> xmlrpc.tools
 SimpleXMLRPCServer -> xmlrpc.server
 DocXMLRPCServer -> xmlrpc.server

Similarly here I think there are better ways to arrange servers than 
these subclasses -- both more reusable and simpler.

   Ian


Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME

2008-01-28 Thread Ian Bicking
Manlio Perillo wrote:
 Ian Bicking wrote:
  
 [...]

 1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a
 list.

 This means that the request URI reconstruction must be changed:
 SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

 I suppose you could leave stuff on PATH_INFO.  But that doesn't seem 
 to fit with the idea of PATH_INFO.  Also, will it be strictly 
 SCRIPT_NAME/consumed_path/PATH_INFO, or could it be 
 SCRIPT_NAME/consumed_path/some_other_parsing/consumed_path/PATH_INFO 
 -- after all, there's cases where stuff gets pushed from PATH_INFO to 
 SCRIPT_NAME, and if consumed_path is in between, which one do you push 
 stuff to?

 
 What do you mean by some_other_parsing?

I have code that takes stuff from PATH_INFO and puts it on SCRIPT_NAME 
without updating routing_args.  It could update routing_args... but I 
guess the question still remains: if there's multiple places where this 
kind of transformation is done, which one does SCRIPT_NAME point to?
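
The kind of transformation in question can be reproduced with the standard library's `wsgiref.util.shift_path_info`. The paths below are invented for the example; the point is that after two independent layers have each consumed a segment, only the final SCRIPT_NAME remains visible.

```python
from wsgiref.util import shift_path_info

environ = {"SCRIPT_NAME": "/app", "PATH_INFO": "/blog/2008/"}
history = [environ["SCRIPT_NAME"]]

# Two independent layers (say, a dispatcher and then a sub-application)
# each consume one segment; afterwards only the final SCRIPT_NAME is visible.
first = shift_path_info(environ)
history.append(environ["SCRIPT_NAME"])
second = shift_path_info(environ)
history.append(environ["SCRIPT_NAME"])
```

The `history` list exists only in this sketch; the environ itself keeps no record of the intermediate values, which is why the question of what SCRIPT_NAME "points to" comes up.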

   Ian


