Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-07 Thread Malthe Borch

On 12/4/09 12:50 AM, And Clover wrote:

> So for now there is basically nothing useful WSGI can do other than
> provide direct, byte-oriented (even if wrapped in 8859-1 unicode
> strings) access to headers.


You could argue that this is perhaps a good reason to replace 
``environ`` with something that interprets the headers according to how 
HTTP is actually used in the real world.


It may be that WSGI should use bytes everywhere and the recommended 
usage would be via a decorator (which could cache computations on the 
environ dictionary):


e.g. the raw application handler versus one decorated with an imaginary 
``webob`` function.


  def app(environ, start_response):
      ...

  @webob
  def app(request):
      ...
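
A minimal sketch of how such a decorator could look, assuming a
bytes-valued environ. The ``Request`` class, its ``header`` method and
the (status, headers, body) return convention are hypothetical and are
not the real WebOb API; they only illustrate decoding on demand with a
per-request cache.

```python
from functools import wraps


class Request:
    """Hypothetical request wrapper; not the real WebOb API."""

    def __init__(self, environ):
        self.environ = environ   # raw environ, header values are bytes
        self._cache = {}         # decoded header values, filled lazily

    def header(self, name, encoding="latin-1"):
        """Decode one header on demand and cache the result."""
        key = (name, encoding)
        if key not in self._cache:
            env_key = "HTTP_" + name.upper().replace("-", "_")
            raw = self.environ.get(env_key, b"")
            self._cache[key] = raw.decode(encoding)
        return self._cache[key]


def webob(func):
    """Adapt a request-style handler to (environ, start_response)."""
    @wraps(func)
    def wsgi_app(environ, start_response):
        status, headers, body = func(Request(environ))
        start_response(status, headers)
        return [body]
    return wsgi_app


@webob
def app(request):
    lang = request.header("Accept-Language") or "unknown"
    body = ("language: " + lang).encode("ascii", "replace")
    return "200 OK", [("Content-Type", "text/plain")], body
```

The decorated ``app`` still has the plain WSGI calling convention, so a
server or middleware never sees the request object.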

It is often said that WSGI should be practical, but in actual usage, I 
think most developers use a request/response abstraction layer.


Middlewares are usually shrink-wrapped library code that could handle a 
bytes-based environ dict (they'd have to explicitly decode the headers 
of interest).


\malthe

___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-04 Thread Manlio Perillo
Henry Precheur ha scritto:
> On Fri, Dec 04, 2009 at 07:40:55PM +0100, Manlio Perillo wrote:
>> Which functions do not work with byte strings?
> 
> Just to make things clear, I was talking about Python 3.
> 

I know.

Unfortunately I don't have Python 3 installed, I'm just reading the code.

> All the functions I tried not ending with _from_bytes raise an exception
> with bytes. This includes urllib.parse.parse_qs & urllib.parse.urlparse
> which are rather critical ...
> 

Ah, ok.
Can you show me the traceback of parse_qs? Thanks.


>> First of all, HTTP never says that whole headers are of type TEXT.
>> Only specific components are of type TEXT.
> 
> If parts of a header contain latin-1 characters, that means its
> encoding is latin-1 (at least partially).
> 

This is not completely true.

> [...]

> And WSGI is not about HTTP in a distant future, it's about HTTP right
> now.
> 
>> Do you really want to define the new WSGI specification to be "against"
>> the new (possible) HTTP spec?
> 
> I don't know why it would be "against" it.

Well, I have quoted it for this reason.
What I mean is that, IMHO:

- Using Unicode strings in WSGI is an abuse of Unicode strings
- This abuse is not justified by the HTTP spec


> [...]


Regards  Manlio


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-04 Thread Henry Precheur
On Fri, Dec 04, 2009 at 07:40:55PM +0100, Manlio Perillo wrote:
> Which functions do not work with byte strings?

Just to make things clear, I was talking about Python 3.

All the functions I tried not ending with _from_bytes raise an exception
with bytes. This includes urllib.parse.parse_qs & urllib.parse.urlparse
which are rather critical ...

> First of all, HTTP never says that whole headers are of type TEXT.
> Only specific components are of type TEXT.

If parts of a header contain latin-1 characters, that means its
encoding is latin-1 (at least partially).

> Moreover, HTTPbis has finally clarified this; TEXT is no longer used,
> instead non-ASCII characters are to be considered opaque.

Yes, but the HTTPbis draft also says:

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 character encoding.

And WSGI is not about HTTP in a distant future, it's about HTTP right
now.

> Do you really want to define the new WSGI specification to be "against"
> the new (possible) HTTP spec?

I don't know why it would be "against" it. WSGI aims to handle HTTP in
the real world. Releasing the HTTPbis spec won't take all the garbage
out of the web. There will still be latin-1 strings in headers passed
around for the next 10 years.

-- 
  Henry Prêcheur


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-04 Thread Manlio Perillo
Henry Precheur ha scritto:
> On Fri, Dec 04, 2009 at 10:17:09AM +0100, Manlio Perillo wrote:
>> It is just as simple as using byte strings, IMHO.
> 
> No, it's not. There were lots of discussions regarding this on the
> mailing list. One of the main issues is that the standard library
> supports bytes poorly. urllib for example expects strings not bytes.
> 

I read last month's discussions 3 days ago!
The quote function supports byte strings, as an example.

Which functions do not work with byte strings?

>>> * WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
>>>   says. WSGI is about HTTP, but that doesn't necessarily include all
>>>   other standards extending HTTP.
>>>
>> HTTP never says to consider whole headers as latin-1 text, IMHO.
> 
> It does:
> 
>   When no explicit charset parameter is provided by the sender, media
>   subtypes of the "text" type are defined to have a default charset value
>   of "ISO-8859-1" when received via HTTP.
> 
>   http://tools.ietf.org/html/rfc2616#section-3.7.1
> 

This is not correct.

First of all, HTTP never says that whole headers are of type TEXT.
Only specific components are of type TEXT.

Moreover, HTTPbis has finally clarified this; TEXT is no longer used,
instead non-ASCII characters are to be considered opaque.

Do you really want to define the new WSGI specification to be "against"
the new (possible) HTTP spec?

Of course it will work; but since some code in the standard library
needs to be fixed (the wsgiref.util.application_uri, as an example),
maybe it is better to fix it to work with byte strings.

Just my two cents.

> [...]


Regards  Manlio


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-04 Thread Henry Precheur
On Fri, Dec 04, 2009 at 10:17:09AM +0100, Manlio Perillo wrote:
> It is just as simple as using byte strings, IMHO.

No, it's not. There were lots of discussions regarding this on the
mailing list. One of the main issues is that the standard library
supports bytes poorly. urllib for example expects strings not bytes.

> > * WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
> >   says. WSGI is about HTTP, but that doesn't necessarily include all
> >   other standards extending HTTP.
> > 
> 
> HTTP never says to consider whole headers as latin-1 text, IMHO.

It does:

  When no explicit charset parameter is provided by the sender, media
  subtypes of the "text" type are defined to have a default charset value
  of "ISO-8859-1" when received via HTTP.

  http://tools.ietf.org/html/rfc2616#section-3.7.1

> Yes, but it is quite stupid to first convert to Unicode and then convert
> again to byte string.

99% of the time latin-1 will work. And converting from Unicode to bytes
is not costly.

6 months ago I was a big fan of bytes, but bytes create more problems
than they solve.

-- 
  Henry Prêcheur


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-04 Thread Manlio Perillo
And Clover ha scritto:
> Manlio Perillo wrote:
> 
>> Words of *TEXT MAY contain characters from character sets other than
>> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
> 
> Yeah, this is, unfortunately, a lie. The rules of RFC 2047 apply only to
> RFC*822-family 'atoms' and not elsewhere; indeed, RFC2047 itself
> specifically denies that an encoded-word can go in a quoted-string.
> 
> RFC2047 encoded-words are not on-topic in an HTTP header(*); this has
> been confirmed by newer development work on HTTPbis by Reschke et al.
> (http://tools.ietf.org/wg/httpbis/).
> 

Thanks.
HTTPbis seems to fix all these problems:

"Historically, HTTP has allowed field content with text in the ISO-
8859-1 [ISO-8859-1] character encoding and supported other character
sets only through use of [RFC2047] encoding.  In practice, most HTTP
header field values use only a subset of the US-ASCII character
encoding [USASCII].  Newly defined header fields SHOULD limit their
field values to US-ASCII characters.  Recipients SHOULD treat other
(obs-text) octets in field content as opaque data."


This is the new rule for `quoted-string`:

quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
               ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
obs-text       = %x80-FF

quoted-pair    = "\" ( WSP / VCHAR / obs-text )


> The "correct" way of escaping header parameters in an RFC*822-family
> protocol would be RFC2231's complex encoding scheme, but HTTP is
> explicitly not an 822-family protocol despite sharing many of the same
> constructs. See
> http://tools.ietf.org/html/draft-reschke-rfc2231-in-http-06 for a
> strategy for how 2231 should interact with HTTP, but note that for now
> RFC2231-in-HTTP simply does not exist in any deployed tools.
> 

It seems reasonable.

> So for now there is basically nothing useful WSGI can do other than
> provide direct, byte-oriented (even if wrapped in 8859-1 unicode
> strings) access to headers.
> 

Yes, this is what I think.
I have some doubts about wrapping the headers in 8859-1 unicode strings,
but luckily there is surrogateescape.
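
As a small sketch of what surrogateescape (PEP 383) offers here: bytes
that are not valid in the chosen encoding survive a decode/encode round
trip as lone surrogates. The header value below is a made-up example.

```python
raw = b"filename=caf\xe9"   # 0xE9 on its own is not valid UTF-8

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
round_trip = text.encode("utf-8", "surrogateescape")

# A strict encode refuses the lone surrogate instead of guessing.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```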



Regards  Manlio


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-04 Thread Manlio Perillo
Henry Precheur ha scritto:
> On Thu, Dec 03, 2009 at 09:15:06PM +0100, Manlio Perillo wrote:
>> There is something that I don't understand.
>>
>> Some HTTP headers, like Accept-Language, contain data described as
>> `token`, where:
>>
>> token  = 1*<any CHAR except CTLs or separators>
>>
>> So a token, IMHO, is an opaque string, and it SHOULD NOT be decoded.
>> In Python 3.x it SHOULD be a byte string.
> 
> I think this is more an issue that frameworks should deal with. By
> decoding every header's value to latin-1:
> 
> * It keeps WSGI simple. Simple is good.
> 

It is just as simple as using byte strings, IMHO.
It is not simple, it is convenient because of (if I understand
correctly) how code is converted by 2to3.

> * WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
>   says. WSGI is about HTTP, but that doesn't necessarily include all
>   other standards extending HTTP.
> 

HTTP never says to consider whole headers as latin-1 text, IMHO.

> * It's possible to convert latin-1 strings to bytes without losing data.
> 

Yes, but it is quite stupid to first convert to Unicode and then convert
again to byte string.

It is true, however, that this does not happen often; only for:

- WSGI applications that implement an HTTP proxy
- WSGI applications that need to support HTTP Digest Authentication
- WSGI applications that store encoded data in cookies


Regards  Manlio


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread And Clover

Manlio Perillo wrote:

> Words of *TEXT MAY contain characters from character sets other than
> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047


Yeah, this is, unfortunately, a lie. The rules of RFC 2047 apply only to 
RFC*822-family 'atoms' and not elsewhere; indeed, RFC2047 itself 
specifically denies that an encoded-word can go in a quoted-string.


RFC2047 encoded-words are not on-topic in an HTTP header(*); this has 
been confirmed by newer development work on HTTPbis by Reschke et al. 
(http://tools.ietf.org/wg/httpbis/).


The "correct" way of escaping header parameters in an RFC*822-family 
protocol would be RFC2231's complex encoding scheme, but HTTP is 
explicitly not an 822-family protocol despite sharing many of the same 
constructs. See 
http://tools.ietf.org/html/draft-reschke-rfc2231-in-http-06 for a 
strategy for how 2231 should interact with HTTP, but note that for now 
RFC2231-in-HTTP simply does not exist in any deployed tools.


So for now there is basically nothing useful WSGI can do other than 
provide direct, byte-oriented (even if wrapped in 8859-1 unicode 
strings) access to headers.


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/



Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread Henry Precheur
On Thu, Dec 03, 2009 at 09:15:06PM +0100, Manlio Perillo wrote:
> There is something that I don't understand.
> 
> Some HTTP headers, like Accept-Language, contain data described as
> `token`, where:
> 
> token  = 1*<any CHAR except CTLs or separators>
> 
> So a token, IMHO, is an opaque string, and it SHOULD NOT be decoded.
> In Python 3.x it SHOULD be a byte string.

I think this is more an issue that frameworks should deal with. By
decoding every header's value to latin-1:

* It keeps WSGI simple. Simple is good.

* WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
  says. WSGI is about HTTP, but that doesn't necessarily include all
  other standards extending HTTP.

* It's possible to convert latin-1 strings to bytes without losing data.
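
The last point can be checked directly. A minimal sketch, using every
possible byte value as the test input:

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding as latin-1 never fails: each byte maps to the code point
# with the same number.
text = raw.decode("latin-1")

# Encoding back recovers the original bytes exactly.
restored = text.encode("latin-1")

# The same input is not valid UTF-8, so a strict UTF-8 decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

So a framework that needs the real bytes of a header can always get
them back with ``value.encode('latin-1')``.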

-- 
  Henry Prêcheur


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread Manlio Perillo
And Clover ha scritto:
> Manlio Perillo wrote:
> 
>> However what about URI (that is, for PATH_INFO and the like)?
>> For URI (if I remember correctly) the suggested encoding is UTF-8, so
>> URLS should be decoded using
> 
>>   url.decode('utf-8', 'surrogateescape')
> 
>> Is this correct?
> 
> The currently-discussed proposal is ISO-8859-1, allowing the real bytes
> to be trivially extracted. This is consistent with the other headers and
> would be my preferred approach.
> 

There is something that I don't understand.

Some HTTP headers, like Accept-Language, contain data described as
`token`, where:

token  = 1*<any CHAR except CTLs or separators>

So a token, IMHO, is an opaque string, and it SHOULD NOT be decoded.
In Python 3.x it SHOULD be a byte string.

Text content is described as `TEXT`, where:

The TEXT rule is only used for descriptive field contents and values
that are not intended to be interpreted by the message parser. Words
of *TEXT MAY contain characters from character sets other than ISO-
8859-1 [22] only when encoded according to the rules of RFC 2047
[14].

TEXT   = <any OCTET except CTLs, but including LWS>


The only type of data where TEXT can be used is `quoted-string`.

A `quoted-string` only appears in well-specified portions of a header.
So, IMHO, it is *not* correct for WSGI middleware to return all HTTP
headers as Unicode strings.

This is up to the application/framework, which must parse each header,
split it into components and handle each as appropriate (as a byte
string, a Unicode string or an instance of some other data type).


> [...]


Regards   Manlio


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread Henry Precheur
On Thu, Dec 03, 2009 at 07:35:14PM +0100, And Clover wrote:
> >I don't know what the HTTP/Cookie spec says about this.
> 
> The traditional interpretation of RFC2616 is that headers are ISO-8859-1.
> 
> You will notice that no browser correctly follows this.

The RFC 2109 & 2965 say that a cookie's value can be anything:

> The VALUE is opaque to the user agent and may be anything the origin
> server chooses to send, possibly in a server-selected printable ASCII
> encoding.

Theoretically you could put something like: 'foo\n\0bar' in a cookie.

Also a cookie can include comments which have to be encoded using ...
UTF-8:

> Comment=value
>   OPTIONAL.  Because cookies can be used to derive or store
>   private information about a user, the value of the Comment
>   attribute allows an origin server to document how it intends to
>   use the cookie.  The user can inspect the information to decide
>   whether to initiate or continue a session with this cookie.
>   Characters in value MUST be in UTF-8 encoding.

-- 
  Henry Prêcheur


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread James Y Knight
On Dec 3, 2009, at 1:35 PM, And Clover wrote:
> Manlio Perillo wrote:
> 
>> However what about URI (that is, for PATH_INFO and the like)?
>> For URI (if I remember correctly) the suggested encoding is UTF-8, so
>> URLS should be decoded using
> 
>>  url.decode('utf-8', 'surrogateescape')
> 
>> Is this correct?
> 
> The currently-discussed proposal is ISO-8859-1, allowing the real bytes to be 
> trivially extracted. This is consistent with the other headers and would be 
> my preferred approach.

Right, for WSGI 1.1 on Python 3.x, 8859-1 strings is the plan. Other, more 
ideologically pure options can be discussed for an incompatible revision of 
WSGI (e.g. the hypothetical 2.0).

BTW: I hope to have a first draft of the changes by Monday. (But don't beat up 
on me if it's delayed; I am working on it.)

James


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread Manlio Perillo
And Clover ha scritto:
> [...]
>> Cookie data SHOULD be transparent to the server/gateway; however WSGI is
>> going to assume that data is encoded in latin-1.
> 
> Yeah. This is no big deal because non-ASCII characters in cookies are
> already broken everywhere(*). Given this and other limitations on what
> characters can go in cookies, they are habitually encoded using ad-hoc
> mechanisms handled by the application (typically a round of URL-encoding).
> 
> *: in particular:
> 
> - Opera and Chrome send non-ASCII cookie characters in UTF-8.
> - IE encodes using the system codepage (which can never be UTF-8),
>   mangling any characters that don't fit in the codepage through the
>   traditional Windows 'similar replacement character' scheme.
> - Mozilla uses the low byte of each UTF-16 code point (so ISO-8859-1
>   gets through but everything else is mangled)
> - Safari refuses to send any cookie containing non-ASCII characters.
> 

Thanks for this summary.
I think it should go in a wiki or in a separate document (like
rationale) to the WSGI spec.

However this should never happen with cookies, since cookie data is
opaque to the browser, which MUST send it "as is".

What you describe happens with other headers containing TEXT.
And now I understand the strange behaviour of Firefox with non-latin-1
strings in the username, in HTTP Basic Authentication.

> [...]

Regards   Manlio


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread And Clover

Manlio Perillo wrote:

> However what about URI (that is, for PATH_INFO and the like)?
> For URI (if I remember correctly) the suggested encoding is UTF-8, so
> URLS should be decoded using
>
>   url.decode('utf-8', 'surrogateescape')
>
> Is this correct?


The currently-discussed proposal is ISO-8859-1, allowing the real bytes 
to be trivially extracted. This is consistent with the other headers and 
would be my preferred approach.


Python 3.1's wsgiref.simple_server, on the other hand, blindly uses 
urllib.parse.unquote, which defaults to UTF-8 without surrogateescape, 
mangling any non-UTF-8 input.
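
A short sketch of the mangling, using the urllib.parse functions as
they exist in current Python 3 releases (behaviour in 3.0 and 3.1 may
have differed in detail). The path is a made-up example.

```python
from urllib.parse import unquote, unquote_to_bytes

path = "/caf%E9"   # percent-encoded ISO-8859-1 "é"

# Default: decode as UTF-8 with errors='replace'. The lone 0xE9 byte is
# not valid UTF-8, so it turns into U+FFFD and the original is lost.
default = unquote(path)

# Decoding as ISO-8859-1 keeps a one-to-one mapping to the raw bytes.
latin1 = unquote(path, encoding="latin-1")

# Or skip decoding entirely and work with the bytes.
raw = unquote_to_bytes(path)
```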


I don't really care whether UTF-8+surrogateescape or ISO-8859-1 encoding 
is blessed. But *something* needs to be blessed. An encoding, an 
alternative undecoded path_info, both, something else... just *something*.



> Let's consider the `wsgiref.util.application_uri` function
> There is a potential problem, here, with the quote function.


Yes. wsgiref is broken in Python 3.1. Not quite as broken as it was in 
3.0, but still broken. Until we can come to a Pronouncement on what WSGI 
*is* in Python 3, it is meaningless anyway.



> Cookie data SHOULD be transparent to the server/gateway; however WSGI is
> going to assume that data is encoded in latin-1.


Yeah. This is no big deal because non-ASCII characters in cookies are 
already broken everywhere(*). Given this and other limitations on what 
characters can go in cookies, they are habitually encoded using ad-hoc 
mechanisms handled by the application (typically a round of URL-encoding).


*: in particular:

- Opera and Chrome send non-ASCII cookie characters in UTF-8.
- IE encodes using the system codepage (which can never be UTF-8),
  mangling any characters that don't fit in the codepage through the
  traditional Windows 'similar replacement character' scheme.
- Mozilla uses the low byte of each UTF-16 code point (so ISO-8859-1
  gets through but everything else is mangled)
- Safari refuses to send any cookie containing non-ASCII characters.


> I don't know what the HTTP/Cookie spec says about this.


The traditional interpretation of RFC2616 is that headers are ISO-8859-1.

You will notice that no browser correctly follows this.

...sigh.

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/




Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread Manlio Perillo
James Y Knight ha scritto:
> I move to bless mod_wsgi's definition of WSGI 1.1 [1]
> [...]
> 
> [1] http://code.google.com/p/modwsgi/wiki/SupportForPython3X

Hi.

Just a few questions.

It is true that HTTP headers can be encoded assuming latin-1; and they
can be encoded using PEP 383.

However what about URI (that is, for PATH_INFO and the like)?
For URI (if I remember correctly) the suggested encoding is UTF-8, so
URLS should be decoded using

  url.decode('utf-8', 'surrogateescape')

Is this correct?


Now another question.
Let's consider the `wsgiref.util.application_uri` function

def application_uri(environ):
    url = environ['wsgi.url_scheme']+'://'
    from urllib.parse import quote

    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']

        if environ['wsgi.url_scheme'] == 'https':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += quote(environ.get('SCRIPT_NAME') or '/')
    return url


There is a potential problem, here, with the quote function.
This function does the following:

def quote(string, safe='/', encoding=None, errors=None):
    if isinstance(string, str):
        if encoding is None:
            encoding = 'utf-8'
        if errors is None:
            errors = 'strict'
        string = string.encode(encoding, errors)

This means that if we use surrogateescape, the information about the
original bytes is lost here.

This can be easily fixed by changing the application_uri function, but
this also means that a WSGI application will not work with Python 3.1.x.
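
A sketch of the problem and of two possible fixes, using
urllib.parse.quote as it behaves in current Python 3 releases. The
SCRIPT_NAME value is a made-up example; with the default strict error
handler the escaped byte is rejected outright rather than passed
through.

```python
from urllib.parse import quote

raw = b"/caf\xe9"                                    # bytes from the server
script_name = raw.decode("utf-8", "surrogateescape")  # contains U+DCE9

# Default call: encoding='utf-8', errors='strict'. The lone surrogate
# cannot be encoded, so the original byte is not recoverable this way.
try:
    quote(script_name)
    strict_result = "ok"
except UnicodeEncodeError:
    strict_result = "UnicodeEncodeError"

# Fix 1: ask quote() to use the same error handler as the decode step.
escaped = quote(script_name, errors="surrogateescape")

# Fix 2: hand quote() the bytes directly.
from_bytes = quote(raw)
```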


Finally, a question about cookies.
Cookie data SHOULD be transparent to the server/gateway; however WSGI is
going to assume that data is encoded in latin-1.

I don't know what the HTTP/Cookie spec says about this.
However, from a WSGI application point of view, the cookie data can, as
an example, contain some text encoded in UTF-8; this means that the
application must first encode the data:

  cookie_bytes = cookie.encode('latin-1', 'surrogateescape')

and then decode it using UTF-8:

  my_cookie_data = cookie_bytes.decode('utf-8')


This is a bit unreasonable, but I don't know if this is a common
practice (I do this, just to make an example).
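
The two steps above, as a self-contained sketch. The cookie value is a
made-up example, and the server-side latin-1 decode is simulated by
hand. Plain 'latin-1' is enough for the encode step in this sketch,
because a latin-1 decode never produces surrogates.

```python
# What the browser sent: the text "naïve" encoded as UTF-8.
wire = "na\u00efve".encode("utf-8")

# What the application receives if the gateway decodes as latin-1:
# the two UTF-8 bytes of "ï" show up as two separate characters.
cookie = wire.decode("latin-1")

# Step 1: recover the original bytes.
cookie_bytes = cookie.encode("latin-1")

# Step 2: decode them with the encoding the application actually used.
my_cookie_data = cookie_bytes.decode("utf-8")
```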



Manlio Perillo


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-30 Thread James Y Knight
On Nov 29, 2009, at 12:40 AM, James Y Knight wrote:
> The next step here is clearly for someone to redraft the changes as a diff 
> against PEP 333. If you do not have any interest in being that person, please 
> make that clear, so someone else can step up to do so.

Okay, not sensing any other volunteers here...I guess it's all me.

The intention of this spec update is to be compatible with existing 
middleware/applications when running on Python 2.X. Apps/middleware running on 
python 3.X require changes in any case, and this specification will tell them 
exactly what to expect. That Python 3.X middleware and WSGI adapters will have 
to deal with both bytestrings and unicode strings in many parts of the API 
(output status code, output headers, output response iterable/write callback) 
will add some complexity, but that's life.

Any WSGI implementations on Python 3.X claiming compliance to WSGI 1.0 are most 
likely broken, and their behavior cannot be relied upon. Too bad about wsgiref.

As self-appointed author, I am going to take a stand and say that both the 
python3-related string-type specifications, and the additional requirements 
except #3 (read() with no-args) and #4 (file_wrapper looking at 
Content-Length), will be included.

And it will be called WSGI 1.1.

Back to the list of "extra requirements":

#1: (readline with an arg) must be included, despite the potential for 
breakage. That ship has already sailed, the breakage has already occurred, it's 
already required. Disagreement here really is of no consequence.

#2: (wsgi.input must return EOF at EOF): I do not believe this will break any 
middleware. It will require some changes in some WSGI adapter implementations, 
but that's acceptable. If you have a real-life example of middleware that would 
break here, show it. So this will be included.

#3 is not actually required for anything; at best it's an extra convenience; 
repeatedly reading until EOF will work just as well. Furthermore, the API 
change has the potential to break some middleware in Python 2.X, so I'll take 
the safe road and not make the change.
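
A minimal sketch of "repeatedly reading until EOF", assuming
requirement #2 holds (an empty string marks the end of input). The
helper name ``read_all`` and the chunk size are illustrative only, and
io.BytesIO stands in for a real wsgi.input stream.

```python
import io


def read_all(stream, chunk_size=8192):
    """Read a wsgi.input-like stream to EOF using sized read() calls."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # empty result: end of input
            break
        chunks.append(chunk)
    return b"".join(chunks)


# io.BytesIO stands in for environ['wsgi.input'].
body = read_all(io.BytesIO(b"a=1&b=2"), chunk_size=3)
```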

The purpose behind #4 is essentially included in #6, and so is not needed as a 
separate requirement.

#5 and #6 are uncontroversial and of no impact to an already-correct 
implementation. They will be included.
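
One way an adapter could enforce #6 is to truncate the response
iterable once Content-Length bytes have been passed on. This is only a
sketch; the function name is hypothetical and does not come from any
existing adapter.

```python
def limit_to_content_length(chunks, content_length):
    """Yield at most content_length bytes from an iterable of chunks."""
    remaining = content_length
    for chunk in chunks:
        if remaining <= 0:
            break                      # drop anything past the limit
        piece = chunk[:remaining]      # trim the chunk that crosses it
        remaining -= len(piece)
        yield piece


out = list(limit_to_content_length([b"hello ", b"world", b"!!!"], 8))
```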

I'll send a diff of the actual wording changes once I've written it.

James


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-30 Thread And Clover

Graham Dumpleton wrote:

> Answering my own question, it is actually obvious that it has to be
> called (1, 0). This is because wsgiref in Python 3.X already calls it
> (1, 0) and don't have much choice to be in agreement with that.


wsgiref.simple_server in Python 3 to date is not something that anyone 
should worry about being compatible with. It is a 2to3 hack that cannot 
meaningfully claim to represent wsgi version anything.


Careless use of urllib.parse.unquote causes 3.0's simple_server not to 
work at all, and 3.1's to mangle the path by treating it as UTF-8 
instead of ISO-8859-1, as 'WSGI 1.1' proposed and mod_wsgi (and even 
mod_cgi via wsgiref.CGIHandler) delivered.


Yes, I'm always going on about Unicode paths. I'm fed up of shipping 
apps with a page-long deployment note about fixing them. It pains me 
that in so many years both this and "What do we do about Python 3?" 
still haven't been addressed.


mod_wsgi 3.0 already has more traction than wsgiref 3.1 and I would 
prefer not to see more farcical reverse-progress at this point.


For what it's worth my responses on the issues of this thread. But at 
this point I really just want a BDFL to just come and do it, whatever it 
is. A new WSGI, whatever the version number, is massively overdue.


>> 1. The 'readline()' function of 'wsgi.input' may optionally take a
>>    size hint.


Yes. Obviously. Bad practice but unavoidable now. Should have been a 1.0 
amendment a long time ago.


>> 2. The 'wsgi.input' must provide an empty string as end of input
>>    stream marker.
>> 3. The size argument to 'read()' function of 'wsgi.input' would be
>>    optional and if not supplied the function would return all available
>>    request content.
>> 4. The 'wsgi.file_wrapper' supplied by the WSGI adapter must honour
>>    the Content-Length response header and must only return from the file
>>    that amount of content.


+0. Seems reasonable but don't massively care. Presumably an application 
must refuse to run on 1.0 if it requires these behaviours?


>> 5. Any WSGI application or middleware should not return more data
>>    than specified by the Content-Length response header if defined.
>> 6. The WSGI adapter must not pass on to the server any data above
>>    what the Content-Length response header defines if supplied.


Yes.

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-28 Thread Graham Dumpleton
2009/11/29 James Y Knight :
> On Nov 28, 2009, at 10:44 PM, Graham Dumpleton wrote:
>> Either way, since there seemed to be objections at some level on every
>> point, and since I really really have no enthusiasm for this stuff any
>> more or of fighting for any change, I retract my personal interest in
>> having any of the amendments as part of a WSGI 1.1 specification and
>> will remove all that detail from mod_wsgi documentation
>
>
> [...]
>
>> If don't see an answer, then guess I will just have to revert it back
>> to (1, 0) to be safe and to avoid any accusations that am highjacking
>> the process.
>>
>> An answer sooner rather than later would be appreciated on the
>> wsgi.version issue.
>
> I'd rather appreciate it if you held off on making such changes until 
> this discussion either peters out or is resolved. You sound somewhat 
> negative, but it seems to me that we're actually quite close to a 
> consensus on adopting most of your proposal. Changing the proposal out from 
> under us doesn't really help things.
>
> The next step here is clearly for someone to redraft the changes as a diff 
> against PEP 333. If you do not have any interest in being that person, please 
> make that clear, so someone else can step up to do so.

No, I do not want a part in drafting any changes; I just want to move
on from all this stuff and start working on other projects. Since
some don't seem to understand the reasons for the changes, though,
you will find it hard to find someone who is in a position to be able to
do them.

You are probably really just better off worrying about Python 3.X
support and accepting that tinkering at the edges of WSGI 1.0 on other
issues is not going to solve all the WSGI issues. As PJE suggests, leave
that to an interface-incompatible update so that you don't have this whole
problem of what version existing components support.

Graham


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-28 Thread James Y Knight
On Nov 28, 2009, at 10:44 PM, Graham Dumpleton wrote:
> Either way, since there seemed to be objections at some level on every
> point, and since I really really have no enthusiasm for this stuff any
> more or of fighting for any change, I retract my personal interest in
> having any of the amendments as part of a WSGI 1.1 specification and
> will remove all that detail from mod_wsgi documentation


[...]

> If I don't see an answer, then I guess I will just have to revert it back
> to (1, 0) to be safe and to avoid any accusations that I am hijacking
> the process.
> 
> An answer sooner rather than later would be appreciated on the
> wsgi.version issue.

I'd rather appreciate it if you held off on making such changes until 
this discussion either peters out or is resolved. You sound somewhat negative, 
but it seems to me that we're actually quite close to a consensus on 
adopting most of your proposal. Changing the proposal out from under us doesn't 
really help things.

The next step here is clearly for someone to redraft the changes as a diff 
against PEP 333. If you do not have any interest in being that person, please 
make that clear, so someone else can step up to do so.

James


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-28 Thread Graham Dumpleton
2009/11/29 Graham Dumpleton :
> After reading my prior blog posts where I explained my reasoning
> behind the changes, I will acknowledge that I haven't explained some
> stuff very well and people are failing to understand or getting the wrong
> idea about why something is being suggested.
>
> I still believe, though, that there are underlying problems in the WSGI
> specification and that right now various stuff is working more by luck
> than design. In some cases such as readline(), the majority of WSGI
> applications/frameworks are in violation of the WSGI 1.0 specification
> due to their reliance on cgi.FieldStorage which makes calls to
> readline() with an argument.
>
> Either way, since there seemed to be objections at some level on every
> point, and since I really really have no enthusiasm for this stuff any
> more or of fighting for any change, I retract my personal interest in
> having any of the amendments as part of a WSGI 1.1 specification and
> will remove all that detail from mod_wsgi documentation. I will
> instead replace it with a separate page describing mod_wsgi compliance
> with WSGI 1.0 specification and highlighting those specific features
> which are in common, or not so common use, via mod_wsgi and which
> actually mean that people are writing applications incompatible with
> the WSGI 1.0 specification.
>
> To ensure compliance I could well raise Python exceptions for any use
> which isn't WSGI 1.0 compliant, but I have already learnt from where I
> tried to get people to write portable WSGI applications by giving errors
> on certain use of stdin and stdout, that it is a pointless battle. All
> it got was a long list of users who believe mod_wsgi is broken even
> though if they read the actual documentation they would find it was
> their own software which was suspect or at least wasn't portable to
> all WSGI hosting mechanisms. This would only get worse if exceptions
> were raised for use of readline() with an argument and use of read()
> with no argument or argument of -1. Short story is that there are a
> fair few people who are just lazy; they will always write stuff the
> way they want to and not how it should be written. They will always
> blame other people's code for being wrong before acknowledging they
> themselves are wrong.
>
> The only answer I therefore need out of WEB-SIG is whether the
> qualifications about how Python 3.X is to be supported are going to be
> an amendment to WSGI 1.0 or a separate WSGI 1.1 update and,
> if the latter, whether the WSGI 1.1 tag will also have meaning for
> Python 2.X.
>
> I need an answer to this so I know whether to withdraw mod_wsgi 3.0
> from download and replace it with a mod_wsgi 4.0 which changes the
> wsgi.version tuple being passed, for both Python 2.X and Python 3.X,
> from (1, 1) back to original (1, 0), given that some opinion seems to
> be that any interface changes can only really be performed as part of
> WSGI 2.0 and so I would be wrong in using (1, 1).
>
> If I don't see an answer, then I guess I will just have to revert it back
> to (1, 0) to be safe and to avoid any accusations that I am hijacking
> the process.
>
> An answer sooner rather than later would be appreciated on the
> wsgi.version issue.

Answering my own question, it is actually obvious that it has to be
called (1, 0). This is because wsgiref in Python 3.X already calls it
(1, 0) and I don't have much choice but to be in agreement with that.

I will therefore replace mod_wsgi 3.0 with a 4.0 release that reverts
it to (1, 0) from (1, 1) and all the other stuff about amendments can
be ignored.
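
For anyone wanting to check what wsgiref reports, here is a minimal sketch. It assumes a Python 3 standard library; the helper name reported_wsgi_version is made up for illustration and is not part of any spec or library.

```python
from wsgiref.util import setup_testing_defaults


def reported_wsgi_version(environ):
    """Return the (major, minor) tuple found in a WSGI environ."""
    return tuple(environ["wsgi.version"])


# setup_testing_defaults() fills an environ dict with the keys that
# wsgiref itself would supply, including 'wsgi.version'.
environ = {}
setup_testing_defaults(environ)
version = reported_wsgi_version(environ)
```

An application can compare this tuple against (1, 0), but as discussed elsewhere in the thread, the value only says what the outermost server claims, not what any middleware in between actually provides.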

Graham


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-28 Thread Graham Dumpleton
After reading my prior blog posts where I explained my reasoning
behind the changes, I will acknowledge that I haven't explained some
stuff very well and people are failing to understand or getting the wrong
idea about why something is being suggested.

I still believe, though, that there are underlying problems in the WSGI
specification and that right now various stuff is working more by luck
than design. In some cases such as readline(), the majority of WSGI
applications/frameworks are in violation of the WSGI 1.0 specification
due to their reliance on cgi.FieldStorage which makes calls to
readline() with an argument.

Either way, since there seemed to be objections at some level on every
point, and since I really really have no enthusiasm for this stuff any
more or of fighting for any change, I retract my personal interest in
having any of the amendments as part of a WSGI 1.1 specification and
will remove all that detail from mod_wsgi documentation. I will
instead replace it with a separate page describing mod_wsgi compliance
with WSGI 1.0 specification and highlighting those specific features
which are in common, or not so common use, via mod_wsgi and which
actually mean that people are writing applications incompatible with
the WSGI 1.0 specification.

To ensure compliance I could well raise Python exceptions for any use
which isn't WSGI 1.0 compliant, but I have already learnt from where I
tried to get people to write portable WSGI applications by giving errors
on certain use of stdin and stdout, that it is a pointless battle. All
it got was a long list of users who believe mod_wsgi is broken even
though if they read the actual documentation they would find it was
their own software which was suspect or at least wasn't portable to
all WSGI hosting mechanisms. This would only get worse if exceptions
were raised for use of readline() with an argument and use of read()
with no argument or argument of -1. Short story is that there are a
fair few people who are just lazy; they will always write stuff the
way they want to and not how it should be written. They will always
blame other people's code for being wrong before acknowledging they
themselves are wrong.

The only answer I therefore need out of WEB-SIG is whether the
qualifications about how Python 3.X is to be supported are going to be
an amendment to WSGI 1.0 or a separate WSGI 1.1 update and,
if the latter, whether the WSGI 1.1 tag will also have meaning for
Python 2.X.

I need an answer to this so I know whether to withdraw mod_wsgi 3.0
from download and replace it with a mod_wsgi 4.0 which changes the
wsgi.version tuple being passed, for both Python 2.X and Python 3.X,
from (1, 1) back to original (1, 0), given that some opinion seems to
be that any interface changes can only really be performed as part of
WSGI 2.0 and so I would be wrong in using (1, 1).

If I don't see an answer, then I guess I will just have to revert it back
to (1, 0) to be safe and to avoid any accusations that I am hijacking
the process.

An answer sooner rather than later would be appreciated on the
wsgi.version issue.

Graham


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-27 Thread Graham Dumpleton
Please ensure you have also all read:

http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html

I will post again later in detail when I have some time to explain a few
more points not mentioned in that post and where people aren't quite
understanding the reasoning for doing things.

One very quick comment about read().

Allowing read() with no argument is no different to a user saying
read(environ['CONTENT_LENGTH']). Because a WSGI adapter/middleware is
going to have to track bytes read to ensure it can return an empty string
as end sentinel, it will know the length remaining and would internally
for read() with no argument do read(remaining_bytes). As such there is no
real difference in efficiency as far as memory use goes for
implementing read() because of the need to implement the end sentinel.

Also, you have concerns about read() with no argument, but frankly
readline() with no argument, which is already required, is much worse
because you can't really track bytes read and just read to end of
input. This is because they only want to read to end of line and so
reading all input is going to blow out memory use unreasonably as you
speculate for read(). As such, a readline() implementation is likely
to read in blocks and internally buffer where read() doesn't
necessarily have to.
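
The bookkeeping described above can be sketched roughly as follows. This is only an illustration under assumptions: the class name LimitedInput is hypothetical, it is written for Python 3 byte streams, and it is not how mod_wsgi or any other adapter is actually implemented.

```python
import io


class LimitedInput:
    """Wrap a raw stream and never read past `length` bytes."""

    def __init__(self, stream, length):
        self._stream = stream
        self._remaining = length  # bytes of request content still unread

    def _clamp(self, size):
        # No argument, None, or a negative size means "whatever is left".
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        size = self._clamp(size)
        if size == 0:
            return b""  # empty string as end-of-input sentinel
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        size = self._clamp(size)
        if size == 0:
            return b""
        line = self._stream.readline(size)
        self._remaining -= len(line)
        return line


# The raw stream holds more than CONTENT_LENGTH says; the wrapper stops at 7.
body = LimitedInput(io.BytesIO(b"a=1&b=2EXTRA"), 7)
first = body.read(3)
rest = body.read()   # no argument: internally read(remaining_bytes)
end = body.read()    # sentinel once everything has been consumed
```

Because the wrapper already has to count bytes to produce the sentinel, supporting read() with no argument costs nothing extra here.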

It may also be pertinent to read:

http://blog.dscpl.com.au/2009/10/wsgi-issues-with-http-head-requests.html

as from memory it talks about issues with not paying attention to
Content-Length on output filtering middleware as well.

As I said, I will reply later when I have some time to focus. Right now I
have a 2 year old to keep amused.

Graham

2009/11/27 James Y Knight :
> I move to bless mod_wsgi's definition of WSGI 1.1 [1] as the official 
> definition of WSGI 1.1, which describes how to implement WSGI adapters for 
> both Python 2.x and 3.x. It may not be perfect, but, it's been implemented 
> twice, and seems to have no fatal flaws (it doesn't do any lossy transforms, 
> so any issues are irritations at worst). The basis for this definition is 
> also described in the "WSGI 1.0 Amendments" [2] page.
>
> The definitions as they stand are clear enough to understand and implement, 
> but not currently in spec-worthy language. (e.g. it says "should" and "may" 
> in a colloquial fashion, but actually means MUST in some places and SHOULD in 
> others, as defined by RFC 2119)
>
> Thus, I'd like to suggest that Graham (if he's willing?) should reformat the 
> "Definition"/"Amendments" as an actual diff against the current PEP 333. 
> Then, I will recommend adopting that document as an actual standard WSGI 1.1, 
> to replace PEP 333.
>
> This discussion has gone on long enough, and it doesn't really matter as much 
> to have the perfect API, as it does to have a standard.
>
> James
>
> [1] http://code.google.com/p/modwsgi/wiki/SupportForPython3X
> [2] http://www.wsgi.org/wsgi/Amendments_1.0
>


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-27 Thread Ian Bicking
On Fri, Nov 27, 2009 at 12:20 PM, P.J. Eby  wrote:

>
>> > 1. The 'readline()' function of 'wsgi.input' may optionally take a size
>> hint.
>>
>> Already de-facto required. Leaving it out helps no-one. KEEP.
>>
>
> Fair enough, since it's a MAY.  On the other hand, because it's a MAY, it
> actually *helps* no-one, from a spec compatibility POV.  (That is, you have
> to test whether it's available, so it's no different than it not being in
> the spec to begin with.)
>
> So, putting it in doesn't *hurt*, but neither does it *help*...  so I lean
> towards leaving it to 2.x, where it can actually help.


I think it was meant to be a must.  The *caller* MAY pass in a size hint,
the implementor MUST implement this optional argument.  This is the de-facto
requirement.


>> > 2. The 'wsgi.input' must provide an empty string as end of input stream
>> marker.
>>
>> I don't think this will be a problem. What would WSGI middleware do to
>> break this requirement?
>>
>
> It could be reading the original input stream, and replacing it with
> another one.  Not very common I would guess, but it's still possible for a
> piece of perfectly valid 1.0 middleware to fail this requirement for 1.1,
> leading to the condition where you really can't tell if you're running valid
> 1.1 or not.


Middleware sometimes does this, but any time it does this it always replaces
the input stream with something truly file-like, e.g., StringIO or a temp
file.  Nothing but servers really hands sockets around, and sockets are the
only objects I'm aware of that don't act quite like a file.
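
As a rough illustration of that pattern, here is a sketch of middleware that swaps the input stream for an in-memory file. The name buffer_input is hypothetical and the code assumes Python 3, where io.BytesIO plays the role StringIO plays on Python 2.

```python
import io


def buffer_input(app):
    """Middleware sketch: replace wsgi.input with an in-memory copy."""

    def wrapper(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        body = environ["wsgi.input"].read(length)
        # BytesIO behaves like a real file, so read() at the end
        # returns an empty string instead of blocking.
        environ["wsgi.input"] = io.BytesIO(body)
        return app(environ, start_response)

    return wrapper


def echo(environ, start_response):
    """Toy application that reads to EOF and then reads once more."""
    stream = environ["wsgi.input"]
    first, second = stream.read(), stream.read()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [first, second]


environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"helloEXTRA")}
result = buffer_input(echo)(environ, lambda status, headers: None)
```

Anything downstream of such middleware gets the empty-string end marker whether or not the original server stream provided one.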


>> It was only put in in the first place so that CGI adapters could pass
>> through their input stream (which may not ever provide an EOF) without
>> having to wrap it. I agree that was a mistake, and should be corrected.
>>
>
> I agree...  but only in 2.x.
>
>
>
>> > 3. The size argument to 'read()' function of 'wsgi.input' would be
>> optional and if not supplied the function would return all available request
>> content. Thus would make 'wsgi.input' more file like as the WSGI
>> specification suggests it is, but isn't really per original definition.
>>
>> This one could be a problem with middleware, and that feature shouldn't
>> ever be used, in any case: reading into memory an arbitrary amount of data
>> from a client is not a good thing to encourage. OMIT.
>>
>
> Agreed -- even in 2.x it's questionable if not harmful.


Well, we need a way to handle content of unknown length, but if the file
terminates with '' then this isn't that important.

>> > 4. The 'wsgi.file_wrapper' supplied by the WSGI adapter must honour the
>> Content-Length response header and must only return from the file that
>> amount of content. This would guarantee that using wsgi.file_wrapper to
>> return part of a file for byte range requests would work.
>>
>> Given item #6, I suppose this is actually just a matter of efficiency, in
>> case the file wrapper is sent to a middleware rather than directly to the
>> wsgi gateway? If it goes directly to the gateway, that can of course stop
>> reading by itself. ?undecided?
>>
>
> I don't really see how this one helps anything in 1.x, and so lean towards
> leaving it out.


I don't really understand this either, unless it was handling range
responses as well.  Content-Length alone isn't very interesting in this
case.
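
For what it's worth, my reading of the byte range case in item 4 is something like the sketch below. The class LimitedFileWrapper is purely hypothetical, assumes the application has already seeked the file to the start of the range, and is not the wsgi.file_wrapper of any particular server.

```python
import io


class LimitedFileWrapper:
    """Iterate over a file in blocks, stopping after `length` bytes."""

    def __init__(self, filelike, length, blksize=8192):
        self.filelike = filelike
        self.remaining = length  # value taken from Content-Length
        self.blksize = blksize

    def __iter__(self):
        while self.remaining > 0:
            data = self.filelike.read(min(self.blksize, self.remaining))
            if not data:
                break  # file ended before Content-Length was reached
            self.remaining -= len(data)
            yield data


# Serve bytes 2-6 of a ten byte "file": seek to the start of the range,
# then let the wrapper stop after Content-Length (5) bytes.
source = io.BytesIO(b"0123456789")
source.seek(2)
chunks = list(LimitedFileWrapper(source, 5, blksize=2))
```

Without the length limit the wrapper would run on to the end of the file, which is presumably why the item ties it to Content-Length.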

>> > 5. Any WSGI application or middleware should not return more data than
>> specified by the Content-Length response header if defined.
>>
>> As long as this is meant as "SHOULD", that's fine. It's not actually a
>> requirement, but rather a suggestion of best practices. KEEP.
>>
>> > 6. The WSGI adapter must not pass on to the server any data above what
>> the Content-Length response header defines if supplied.
>>
>> This is already required by HTTP. If the WSGI gateway doesn't make this
>> happen somehow, it's generating invalid HTTP and that's a bug. Okay to
>> clarify in the spec to ensure people don't miss the requirement when
>> implementing. KEEP.
>>
>
> Good points - I agree with these two, and they can be considered 1.0
> clarifications as well.  After the first four (which I see no reason to
> include) I was probably a little over-inclined to throw these two out
> (especially since I was reading the "should" above as a "must", per your
> proposal).


In this context, maybe 4 is just an extension of these?  Put 4 after 6 and
maybe it'll seem more obvious...?

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-27 Thread P.J. Eby

At 12:34 PM 11/27/2009 -0500, James Y Knight wrote:
> On Nov 27, 2009, at 10:20 AM, P.J. Eby wrote:
> > Second, I do not think that the "additional guarantees/requirements"
> > can be safely added to a 1.x version, as they make it impossible for
> > an app to tell whether it's *really* running under 1.1 or under a
> > broken piece of middleware that's passing through wsgi.version but
> > not actually providing 1.1-level guarantees.  I would therefore
> > suggest that these additional guarantees and requirements be deferred
> > to WSGI 2.0.
>
> Okay, let's look at these additional requirements in more detail. I
> see 4 that should be kept, 1 that can be dispensed with, and 1 I'm
> not sure about.

I agree with 2 of your keeps, and remain -0.5 to -1 on the
others.  See below...

> > 1. The 'readline()' function of 'wsgi.input' may optionally take
> > a size hint.
>
> Already de-facto required. Leaving it out helps no-one. KEEP.

Fair enough, since it's a MAY.  On the other hand, because it's a
MAY, it actually *helps* no-one, from a spec compatibility
POV.  (That is, you have to test whether it's available, so it's no
different than it not being in the spec to begin with.)

So, putting it in doesn't *hurt*, but neither does it *help*...  so I
lean towards leaving it to 2.x, where it can actually help.

> > 2. The 'wsgi.input' must provide an empty string as end of input
> > stream marker.
>
> I don't think this will be a problem. What would WSGI middleware do
> to break this requirement?

It could be reading the original input stream, and replacing it with
another one.  Not very common I would guess, but it's still possible
for a piece of perfectly valid 1.0 middleware to fail this
requirement for 1.1, leading to the condition where you really can't
tell if you're running valid 1.1 or not.

> It was only put in in the first place so that CGI adapters could
> pass through their input stream (which may not ever provide an EOF)
> without having to wrap it. I agree that was a mistake, and should be
> corrected.

I agree...  but only in 2.x.

> > 3. The size argument to 'read()' function of 'wsgi.input' would
> > be optional and if not supplied the function would return all
> > available request content. Thus would make 'wsgi.input' more file
> > like as the WSGI specification suggests it is, but isn't really per
> > original definition.
>
> This one could be a problem with middleware, and that feature
> shouldn't ever be used, in any case: reading into memory an
> arbitrary amount of data from a client is not a good thing to
> encourage. OMIT.

Agreed -- even in 2.x it's questionable if not harmful.

> > 4. The 'wsgi.file_wrapper' supplied by the WSGI adapter must
> > honour the Content-Length response header and must only return from
> > the file that amount of content. This would guarantee that using
> > wsgi.file_wrapper to return part of a file for byte range requests
> > would work.
>
> Given item #6, I suppose this is actually just a matter of
> efficiency, in case the file wrapper is sent to a middleware rather
> than directly to the wsgi gateway? If it goes directly to the
> gateway, that can of course stop reading by itself. ?undecided?

I don't really see how this one helps anything in 1.x, and so lean
towards leaving it out.

> > 5. Any WSGI application or middleware should not return more data
> > than specified by the Content-Length response header if defined.
>
> As long as this is meant as "SHOULD", that's fine. It's not actually
> a requirement, but rather a suggestion of best practices. KEEP.
>
> > 6. The WSGI adapter must not pass on to the server any data above
> > what the Content-Length response header defines if supplied.
>
> This is already required by HTTP. If the WSGI gateway doesn't make
> this happen somehow, it's generating invalid HTTP and that's a bug.
> Okay to clarify in the spec to ensure people don't miss the
> requirement when implementing. KEEP.

Good points - I agree with these two, and they can be considered 1.0
clarifications as well.  After the first four (which I see no reason
to include) I was probably a little over-inclined to throw these two
out (especially since I was reading the "should" above as a "must",
per your proposal).




Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-27 Thread James Y Knight
On Nov 27, 2009, at 10:20 AM, P.J. Eby wrote:
> Second, I do not think that the "additional guarantees/requirements" can be 
> safely added to a 1.x version, as they make it impossible for an app to tell 
> whether it's *really* running under 1.1 or under a broken piece of middleware 
> that's passing through wsgi.version but not actually providing 1.1-level 
> guarantees.  I would therefore suggest that these additional guarantees and 
> requirements be deferred to WSGI 2.0.

Okay, let's look at these additional requirements in more detail. I see 4 that 
should be kept, 1 that can be dispensed with, and 1 I'm not sure about.

> 1. The 'readline()' function of 'wsgi.input' may optionally take a size hint.

Already de-facto required. Leaving it out helps no-one. KEEP.

> 2. The 'wsgi.input' must provide an empty string as end of input stream 
> marker.

I don't think this will be a problem. What would WSGI middleware do to break 
this requirement? It was only put in in the first place so that CGI adapters 
could pass through their input stream (which may not ever provide an EOF) 
without having to wrap it. I agree that was a mistake, and should be corrected. 
KEEP.

> 3. The size argument to 'read()' function of 'wsgi.input' would be optional 
> and if not supplied the function would return all available request content. 
> Thus would make 'wsgi.input' more file like as the WSGI specification 
> suggests it is, but isn't really per original definition.

This one could be a problem with middleware, and that feature shouldn't ever be 
used, in any case: reading into memory an arbitrary amount of data from a 
client is not a good thing to encourage. OMIT.

> 4. The 'wsgi.file_wrapper' supplied by the WSGI adapter must honour the 
> Content-Length response header and must only return from the file that amount 
> of content. This would guarantee that using wsgi.file_wrapper to return part 
> of a file for byte range requests would work.

Given item #6, I suppose this is actually just a matter of efficiency, in case 
the file wrapper is sent to a middleware rather than directly to the wsgi 
gateway? If it goes directly to the gateway, that can of course stop reading by 
itself. ?undecided?

> 5. Any WSGI application or middleware should not return more data than 
> specified by the Content-Length response header if defined.

As long as this is meant as "SHOULD", that's fine. It's not actually a 
requirement, but rather a suggestion of best practices. KEEP.

> 6. The WSGI adapter must not pass on to the server any data above what the 
> Content-Length response header defines if supplied.

This is already required by HTTP. If the WSGI gateway doesn't make this happen 
somehow, it's generating invalid HTTP and that's a bug. Okay to clarify in the 
spec to ensure people don't miss the requirement when implementing. KEEP.
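
To make item 6 concrete, a gateway could enforce the limit with something like the sketch below. The function name limit_to_content_length is invented for illustration; a real gateway would also still need to call close() on the application's iterable, which is left out here.

```python
def limit_to_content_length(chunks, content_length):
    """Yield at most `content_length` bytes from an iterable of byte strings."""
    remaining = content_length
    for chunk in chunks:
        if remaining <= 0:
            break  # anything further would exceed the declared length
        piece = chunk[:remaining]
        remaining -= len(piece)
        yield piece


# The application declared Content-Length: 8 but produced 14 bytes.
sent = list(limit_to_content_length([b"hello ", b"world", b"!!!"], 8))
```

Item 5 asks the application not to produce the extra data in the first place; item 6 is the gateway's safety net when it does anyway.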

James


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-27 Thread P.J. Eby

At 08:42 PM 11/26/2009 -0500, James Y Knight wrote:
> I move to bless mod_wsgi's definition of WSGI 1.1 [1] as the
> official definition of WSGI 1.1, which describes how to implement
> WSGI adapters for both Python 2.x and 3.x. It may not be perfect,
> but, it's been implemented twice, and seems to have no fatal flaws
> (it doesn't do any lossy transforms, so any issues are irritations
> at worst). The basis for this definition is also described in the
> "WSGI 1.0 Amendments" [2] page.
>
> The definitions as they stand are clear enough to understand and
> implement, but not currently in spec-worthy language. (e.g. it says
> "should" and "may" in a colloquial fashion, but actually means MUST
> in some places and SHOULD in others, as defined by RFC 2119)
>
> Thus, I'd like to suggest that Graham (if he's willing?) should
> reformat the "Definition"/"Amendments" as an actual diff against
> the current PEP 333. Then, I will recommend adopting that document
> as an actual standard WSGI 1.1, to replace PEP 333.


I'm +1, with a few caveats.  First, as you mention, it needs to be 
spec'd properly.  In particular, it should be clarified that the main 
changes are to *allow byte strings* in certain places where WSGI 1.0 
demands a unicode string w/latin-1 encoding.


Second, I do not think that the "additional guarantees/requirements" 
can be safely added to a 1.x version, as they make it impossible for 
an app to tell whether it's *really* running under 1.1 or under a 
broken piece of middleware that's passing through wsgi.version but 
not actually providing 1.1-level guarantees.  I would therefore 
suggest that these additional guarantees and requirements be deferred 
to WSGI 2.0.
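
In practice this means an application that wants to be safe cannot branch on wsgi.version and has to stick to what 1.0 guarantees. A minimal sketch of that defensive style follows; the helper name read_body is hypothetical and the code assumes Python 3 byte streams.

```python
import io


def read_body(environ):
    """Read the request body using only WSGI 1.0 guarantees.

    Always passes an explicit size to read() and never relies on an
    empty-string end marker, whatever 'wsgi.version' claims.
    """
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return b""
    return environ["wsgi.input"].read(length)


# The environ claims (1, 1), but the code behaves the same either way.
environ = {
    "wsgi.version": (1, 1),
    "CONTENT_LENGTH": "3",
    "wsgi.input": io.BytesIO(b"abcdef"),
}
body = read_body(environ)
```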




Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-11-27 Thread Aaron Watters
I second the move, recorded here:

  http://listtree.appspot.com/wsgi2/ICvaujouPxb2gfEhDS_aiw

-- Aaron Watters

--- On Thu, 11/26/09, James Y Knight  wrote:

> From: James Y Knight 
> Subject: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec
> To: "Web SIG" 
> Date: Thursday, November 26, 2009, 8:42 PM
> I move to bless mod_wsgi's definition
> of WSGI 1.1 [1] as the official definition of WSGI 1.1,
> which describes how to implement WSGI adapters for both
> Python 2.x and 3.x. It may not be perfect, but, it's been
> implemented twice, and seems to have no fatal flaws (it
> doesn't do any lossy transforms, so any issues are
> irritations at worst). The basis for this definition is also
> described in the "WSGI 1.0 Amendments" [2] page.
> 
> The definitions as they stand are clear enough to
> understand and implement, but not currently in spec-worthy
> language. (e.g. it says "should" and "may" in a colloquial
> fashion, but actually means MUST in some places and SHOULD
> in others, as defined by RFC 2119)
> 
> Thus, I'd like to suggest that Graham (if he's willing?)
> should reformat the "Definition"/"Amendments" as an actual
> diff against the current PEP 333. Then, I will recommend
> adopting that document as an actual standard WSGI 1.1, to
> replace PEP 333. 
> 
> This discussion has gone on long enough, and it doesn't
> really matter as much to have the perfect API, as it does to
> have a standard.
> 
> James
> 
> [1] http://code.google.com/p/modwsgi/wiki/SupportForPython3X
> [2] http://www.wsgi.org/wsgi/Amendments_1.0
> 