On Dec 9, 2007 7:56 PM, Graham Dumpleton <[EMAIL PROTECTED]> wrote:
> On 09/12/2007, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > On Dec 8, 2007 12:37 AM, Graham Dumpleton <[EMAIL PROTECTED]> wrote:
> > > On 08/12/2007, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> > > > * When running under Python 3, servers MUST provide a text stream for
> > > > wsgi.errors
> > >
> > > In Python 3, what happens if user code attempts to output to a text
> > > stream a byte string? Ie., what would be displayed?
> >
> > Nothing. You get a TypeError.
>
> Hmmm, this in itself could be quite a pain for existing code where
> people have added debug code to print out details from request headers
> (if now to be passed as bytes), or part of the request content.

Sorry, I was just talking about the write() method on a text stream.
The print() function in 3.0 will print the repr() of the bytes.
Example:

Python 3.0a2 (py3k, Dec 10 2007, 09:38:42)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = b"xyz"
>>> print(a)
b'xyz'
>>> b = b"abc\377def"
>>> print(b)
b'abc\xffdef'
>>>

(Note that this works because print() always calls str() on the
argument and bytes.__str__ is defined to be the same as
bytes.__repr__.)

> What is the suggested way of best dumping out bytes for debugging
> purposes so one does not have to worry about encoding issues, just use
> repr()?

Just use print().

> > > Also, if wsgi.errors is a text stream, presume that if a WSGI adapter
> > > has to internally map this to a C char* like API for logging that it
> > > would need to apply standard Python encoding to yield usable char*
> > > string for output.
> >
> > The encoding can/must be specified per text stream.
>
> But what should the encoding associated with the wsgi.errors stream be?

Depends on the platform and your requirements.

> If code which outputs text to wsgi.errors can use any valid Unicode
> character, if one sets it to US-ASCII encoding, then chance that
> logging output will fail because of characters not being valid in that
> character set. If one instead uses UTF-8, then potentially have issues
> where that byte string coming out other end of text stream is passed
> to C API functions. Issues might arise here where C API not expecting
> variable width character encoding.
>
> I'll freely admit I am not across all this Unicode encode/decode stuff
> as I don't generally have to deal with foreign languages, but seems to
> be a few missing details in this area which need to be filled out for
> a modified WSGI specification.

The goal of this part of Py3k is to make it more obvious when you
haven't thought through your encoding issues enough by failing as soon
as (encoded) bytes meet (decoded) characters. Of course, you can still
run into delayed trouble by using an inappropriate encoding, which
only shows up when there is an actual encoding or decoding error; but
at least you will have carefully distinguished between encoded and
decoded text throughout your program, so the fix is now to change the
encoding rather than having to restructure your code to properly
separate encoded and decoded text.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
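
For readers who want to try the behaviour discussed in this message, here is
a minimal sketch. It assumes a current Python 3 rather than 3.0a2, and it
uses an in-memory io.BytesIO wrapped in io.TextIOWrapper as a stand-in for
wsgi.errors; a real server would supply its own stream. The helper name
make_error_stream and the choice of errors="backslashreplace" are
illustrative assumptions, not anything proposed in the thread.

```python
import io


def make_error_stream(encoding, errors="strict"):
    """Hypothetical stand-in for wsgi.errors: a text stream over a byte buffer."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation so output is the same everywhere.
    text = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    return text, raw


# 1. write() on a text stream rejects bytes immediately.
stream, raw = make_error_stream("utf-8")
try:
    stream.write(b"xyz")
    write_result = "accepted"
except TypeError:
    write_result = "TypeError"

# 2. print() calls str() on its argument, so bytes come out as their repr().
print(b"abc\377def", file=stream)
stream.flush()
printed = raw.getvalue()

# 3. A strict US-ASCII stream fails only when a non-ASCII character shows up.
ascii_stream, ascii_raw = make_error_stream("ascii")
try:
    ascii_stream.write("caf\u00e9\n")
    ascii_stream.flush()
    ascii_result = "accepted"
except UnicodeEncodeError:
    ascii_result = "UnicodeEncodeError"

# 4. One possible mitigation (an assumption, not from the thread): keep the
#    ASCII encoding but escape unencodable characters instead of raising.
safe_stream, safe_raw = make_error_stream("ascii", errors="backslashreplace")
safe_stream.write("caf\u00e9\n")
safe_stream.flush()
escaped = safe_raw.getvalue()
```

Steps 1 and 2 mirror the write()/print() distinction above. Steps 3 and 4
illustrate the "delayed trouble" case: the stream works until the first
character the chosen encoding cannot represent, and the fix is confined to
how the stream is configured.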