Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-15 Thread Ulrich Eckhardt
On Friday 12 December 2008, Adam Olsen wrote: Only pages like this, which indicate the underlying API is an array of WCHAR: http://blogs.msdn.com/michkap/archive/2005/05/11/416552.aspx Hmm, true. So even there, the encoding isn't known... char * is just fine. You need only pass a length

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-13 Thread Steven D'Aprano
On Fri, 12 Dec 2008 06:33:28 pm Toshio Kuratomi wrote: Also interesting, if you point your browser at: http://toshio.fedorapeople.org/u/ You should see two other test files. They're both (one-half)(enyei).html but one's encoded in utf-8 and the other in latin-1. For what it's worth,

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Ulrich Eckhardt
On Thursday 11 December 2008, Steve Holden wrote: Ulrich Eckhardt wrote: If readdir() returned Unicode text, people would start taking that for granted. If it returned bytes, just the same. Returning a completely unrelated type will give them enough hint that for this thing they have to

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Ulrich Eckhardt
On Thursday 11 December 2008, Adam Olsen wrote: The simplest solution there is to have Windows bytes APIs that return raw UTF-16 bytes (note that Windows does NOT guarantee this to be valid Unicode, despite that being much more likely than on Linux). Actually, I'm not aware of this case. I only know

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Stephen J. Turnbull
Toshio Kuratomi writes: Adam Olsen wrote: On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote: Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink> like Guido, express deliberate

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread André Malo
* Adam Olsen wrote: UTF-8 in percent encodings is becoming a defacto standard. Otherwise the browser has to display the percent escapes in the address bar, rather than the intended text. Duh! The address bar should contain the URL, which *is* the intended text. The escapes are there for a
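For readers following the percent-encoding argument, a minimal sketch with Python's standard urllib.parse shows how the same two characters produce different escapes depending on the byte encoding chosen. The sample string is an assumption, loosely modelled on the (one-half)(enye) test file names mentioned elsewhere in the thread; it is not taken from the messages themselves.

```python
from urllib.parse import quote, unquote

# Illustrative sample: one-half sign followed by n-with-tilde.
name = "\u00bd\u00f1"

# quote() encodes to UTF-8 by default; the encoding can be overridden.
utf8_escaped = quote(name)
latin1_escaped = quote(name, encoding="latin-1")

print(utf8_escaped)
print(latin1_escaped)

# unquote() assumes UTF-8 (with errors="replace") unless told otherwise,
# so Latin-1 escapes only round-trip when the encoding is stated.
roundtrip_default = unquote(latin1_escaped) == name
roundtrip_latin1 = unquote(latin1_escaped, encoding="latin-1") == name
print(roundtrip_default, roundtrip_latin1)
```

The escapes themselves carry no encoding label, which is why the receiving side has to guess or agree on a convention.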

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Ulrich Eckhardt
On Friday 12 December 2008, Stephen J. Turnbull wrote: I gather that the BFDL's line on this thread of discussion is that forcing programmers to think about encodings every time they call out to the OS is unacceptable Exactly that is not necessary. for n in os.readdir('.'): f =

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Adam Olsen
On Fri, Dec 12, 2008 at 2:11 AM, André Malo n...@perlig.de wrote: * Adam Olsen wrote: UTF-8 in percent encodings is becoming a defacto standard. Otherwise the browser has to display the percent escapes in the address bar, rather than the intended text. Duh! The address bar should contain

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Antoine Pitrou
Curt Hagenlocher curt at hagenlocher.org writes: On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen rhamph at gmail.com wrote: I doubt that UTF-16 is used very much (other than on windows). There's this other obscure platform called Java... ;) Does it have a filesystem?

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Curt Hagenlocher
On Fri, Dec 12, 2008 at 5:06 AM, Antoine Pitrou solip...@pitrou.net wrote: Curt Hagenlocher curt at hagenlocher.org writes: There's this other obscure platform called Java... ;) Does it have a filesystem? No, but it also has to interact with filesystems of possibly invalid or indeterminate

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Antoine Pitrou
Curt Hagenlocher curt at hagenlocher.org writes: No, but it also has to interact with filesystems of possibly invalid or indeterminate encodings. What does java.io do? My point was that Python doesn't have to interact with the Java IO libraries, while it has to interact with the Unix and

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Curt Hagenlocher
On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou solip...@pitrou.net wrote: Curt Hagenlocher curt at hagenlocher.org writes: No, but it also has to interact with filesystems of possibly invalid or indeterminate encodings. What does java.io do? My point was that Python doesn't have to

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Scott Dial
Curt Hagenlocher wrote: On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou solip...@pitrou.net wrote: Curt Hagenlocher curt at hagenlocher.org writes: No, but it also has to interact with filesystems of possibly invalid or indeterminate encodings. What does java.io do? My point was that Python

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Toshio Kuratomi
Adam Olsen wrote: UTF-8 in percent encodings is becoming a defacto standard. Otherwise the browser has to display the percent escapes in the address bar, rather than the intended text. IOW, inconsistent behaviour is a bug, but translating into UTF-8 is not. ;) I think we should let this

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Lennart Regebro
On Fri, Dec 12, 2008 at 16:21, Scott Dial scott+python-...@scottdial.com wrote: See the following email for a summary of existing practice (as of 2004): http://www.mail-archive.com/unic...@unicode.org/msg27352.html Interesting. Quite a lot of them do just drop the undecodable filenames. The

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread glyph
On 02:23 pm, c...@hagenlocher.org wrote: On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou solip...@pitrou.net wrote: Curt Hagenlocher curt at hagenlocher.org writes: No, but it also has to interact with filesystems of possibly invalid or indeterminate encodings. What does java.io do? My

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread André Malo
* Adam Olsen wrote: On Fri, Dec 12, 2008 at 2:11 AM, André Malo n...@perlig.de wrote: * Adam Olsen wrote: UTF-8 in percent encodings is becoming a defacto standard. Otherwise the browser has to display the percent escapes in the address bar, rather than the intended text. Duh! The

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread Adam Olsen
On Fri, Dec 12, 2008 at 9:47 PM, André Malo n...@perlig.de wrote: * Adam Olsen wrote: On Fri, Dec 12, 2008 at 2:11 AM, André Malo n...@perlig.de wrote: * Adam Olsen wrote: UTF-8 in percent encodings is becoming a defacto standard. Otherwise the browser has to display the percent escapes

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-12 Thread André Malo
* Adam Olsen wrote: On Fri, Dec 12, 2008 at 9:47 PM, André Malo n...@perlig.de wrote: * Adam Olsen wrote: On Fri, Dec 12, 2008 at 2:11 AM, André Malo n...@perlig.de wrote: * Adam Olsen wrote: UTF-8 in percent encodings is becoming a defacto standard. Otherwise the browser has to

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Ulrich Eckhardt
On Wednesday 10 December 2008, Adam Olsen wrote: On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt [EMAIL PROTECTED] wrote: On Tuesday 09 December 2008, Adam Olsen wrote: The only thing separating this from a bikeshed discussion is that a bikeshed has many equally good solutions, while we

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Steve Holden
Ulrich Eckhardt wrote: On Wednesday 10 December 2008, Adam Olsen wrote: On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt [EMAIL PROTECTED] wrote: On Tuesday 09 December 2008, Adam Olsen wrote: The only thing separating this from a bikeshed discussion is that a bikeshed has many equally good

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Ulrich Eckhardt
On Thursday 11 December 2008, Steve Holden wrote: Ulrich Eckhardt wrote: What I'd just like some feedback on is the approach to return a distinct type (neither a byte string nor a Unicode string) from readdir(). In order to use this, a programmer will have to convert it explicitly,
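The "distinct type" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: FSName and read_dir are hypothetical names, not anything proposed verbatim in the thread, and the sketch relies on the os.PathLike protocol (__fspath__), which was added to Python well after this discussion.

```python
import os
import tempfile


class FSName:
    """Hypothetical opaque filename: neither str nor bytes."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def __fspath__(self) -> bytes:
        # os.stat(), open() and friends accept any os.PathLike object.
        return self._raw

    def display(self, encoding: str = "utf-8") -> str:
        # Explicit, possibly lossy conversion meant only for showing to a user.
        return self._raw.decode(encoding, errors="replace")

    def __repr__(self) -> str:
        return f"FSName({self._raw!r})"


def read_dir(path: str):
    """Return the full paths of directory entries, wrapped as FSName objects."""
    raw_dir = os.fsencode(path)
    return [FSName(os.path.join(raw_dir, entry)) for entry in os.listdir(raw_dir)]


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w") as fh:
        fh.write("hello")
    for name in read_dir(tmp):
        # The opaque value goes straight back to the OS without any decoding.
        print(name.display(), os.stat(name).st_size)
```

The point of the design is that the value cannot be concatenated with text or printed by accident; the programmer has to call display() and thereby acknowledge the conversion.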

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Isaac Morland
On Thu, 11 Dec 2008, Ulrich Eckhardt wrote: On Thursday 11 December 2008, Steve Holden wrote: Ulrich Eckhardt wrote: Seems to me this just threatens to add to the confusion. If you know what your filesystem produces, you can take the appropriate action to convert it into a type that makes

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 6:41 AM, Ulrich Eckhardt eckha...@satorlaser.com wrote: On Thursday 11 December 2008, Steve Holden wrote: re-present it to the filesystem to manipulate the file. What are we supposed to do with the special type? You receive from readdir() and pass it to stat(), simple

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Stephen J. Turnbull
Steve Holden writes: Ulrich Eckhardt writes: What I'd just like some feedback on is the approach to return a distinct type (neither a byte string nor a Unicode string) from readdir(). This is presumably unacceptable on the grounds that it will break existing code that does something

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote: Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink> like Guido, express deliberate disbelief on this point. They say that filesystem names and

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Toshio Kuratomi
Adam Olsen wrote: On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote: Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink> like Guido, express deliberate disbelief on this point. They say that

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 10:41 PM, Toshio Kuratomi a.bad...@gmail.com wrote: Adam Olsen wrote: On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote: Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink>

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 11:25 PM, Curt Hagenlocher c...@hagenlocher.org wrote: On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen rha...@gmail.com wrote: I doubt that UTF-16 is used very much (other than on windows). There's this other obscure platform called Java... ;) Sorry, I should have said

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Toshio Kuratomi
Adam Olsen wrote: A half-broken setup is still a broken setup. Eventually you have to tell people to stop screwing around and pick one encoding. But it's not a broken setup. It's the way the world is because people share things with each other. I doubt that UTF-16 is used very much (other

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Toshio Kuratomi
Adam Olsen wrote: As a data point, firefox (when pointed at my home dir) DOES skip over garbage files. That's not true. However, it looks like Firefox is actually broken. Take a look at this screenshot: firefox.png That shows a directory with a folder that's not decodable in my utf-8

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-10 Thread Ulrich Eckhardt
On Tuesday 09 December 2008, Adam Olsen wrote: On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt [EMAIL PROTECTED] wrote: On Monday 08 December 2008, Adam Olsen wrote: At this point someone suggests we have a type that can store an arbitrary mix of unicode and bytes, so the undecodable

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-10 Thread Adam Olsen
On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt [EMAIL PROTECTED] wrote: On Tuesday 09 December 2008, Adam Olsen wrote: The only thing separating this from a bikeshed discussion is that a bikeshed has many equally good solutions, while we have no good solutions. Instead we're trying to find

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Anders J. Munch
On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy [EMAIL PROTECTED] wrote: try: files = os.listdir(somedir, errors = strict) except OSError as e: log(verbose error message that includes somedir and e) files = os.listdir(somedir) Instead of a codecs error handler name, how about a callback for
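The callback alternative to a codecs error handler name could look something like the sketch below. Note the assumptions: decode_names and its on_error parameter are hypothetical, os.listdir() has no such argument, and the function works on a list of byte names so that it does not depend on any particular filesystem.

```python
import sys


def decode_names(raw_names, encoding=None, on_error=None):
    """Decode byte filenames; hand undecodable ones to on_error (hypothetical API)."""
    encoding = encoding or sys.getfilesystemencoding()
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))  # strict decoding
        except UnicodeDecodeError as exc:
            if on_error is None:
                raise  # no callback: behave like errors="strict"
            replacement = on_error(raw, exc)
            if replacement is not None:  # returning None elides the name
                decoded.append(replacement)
    return decoded


skipped = []


def remember_and_skip(raw, exc):
    """Example callback: record the offending name and drop it from the result."""
    skipped.append(raw)
    return None


names = decode_names([b"ok.txt", b"bad\xff.txt"], "utf-8", remember_and_skip)
print(names, skipped)
```

A callback lets the application decide per name whether to log, substitute, or skip, at the cost of a wider API than a single errors string.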

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Nick Coghlan
Glenn Linderman wrote: On approximately 12/8/2008 9:30 AM, came the following characters from the keyboard of [EMAIL PROTECTED]: PS: I'd like to see a similar warning issued when an access attempt is made through os.environ to a variable that cannot be decoded. And argv ? Seems like the

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread M.-A. Lemburg
On 2008-12-09 09:41, Anders J. Munch wrote: On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy [EMAIL PROTECTED] wrote: try: files = os.listdir(somedir, errors = strict) except OSError as e: log(verbose error message that includes somedir and e) files = os.listdir(somedir) Instead of a codecs

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread André Malo
* M.-A. Lemburg wrote: On 2008-12-09 09:41, Anders J. Munch wrote: On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy [EMAIL PROTECTED] wrote: try: files = os.listdir(somedir, errors = strict) except OSError as e: log(verbose error message that includes somedir and e) files =

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Anders J. Munch
M.-A. Lemburg wrote: Well, this is not too far away from just putting the whole decoding logic into the application directly: files = [filename.decode(filesystemencoding, errors='warnreplace') for filename in os.listdir(dir)] (or os.listdirb() if that's where the discussion is
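An application-level handler of the kind quoted above can be registered with the standard codecs.register_error() function. The following is a minimal sketch, assuming that "warnreplace" (a name that appears in the thread but is not a built-in handler) should emit a warning and substitute U+FFFD; the byte strings are made-up sample data.

```python
import codecs
import warnings


def warnreplace(exc):
    """Error handler: warn about the bad bytes, then substitute U+FFFD."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only covers decoding
    bad = exc.object[exc.start:exc.end]
    warnings.warn(f"undecodable bytes {bad!r} replaced", UnicodeWarning)
    # Return the replacement text and the position at which to resume.
    return ("\ufffd", exc.end)


codecs.register_error("warnreplace", warnreplace)

raw_names = [b"plain.txt", b"caf\xe9.txt"]  # second name is Latin-1, not UTF-8
files = [name.decode("utf-8", errors="warnreplace") for name in raw_names]
print(files)
```

As noted in the discussion, a replaced name is good for display only: it can no longer be handed back to the OS to open or rename the file.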

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Ulrich Eckhardt
On Monday 08 December 2008, Adam Olsen wrote: At this point someone suggests we have a type that can store an arbitrary mix of unicode and bytes, so the undecodable portions stay in their original form. :P Well, not an arbitrary mix, but a type that just stores whatever comes from the system

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Adam Olsen
On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt [EMAIL PROTECTED] wrote: On Monday 08 December 2008, Adam Olsen wrote: At this point someone suggests we have a type that can store an arbitrary mix of unicode and bytes, so the undecodable portions stay in their original form. :P Well, not an

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Toshio Kuratomi
James Y Knight wrote: On Dec 9, 2008, at 6:04 AM, Anders J. Munch wrote: The typical application will just obliviously use os.listdir(dir) and get the default elide-and-warn behaviour for un-decodable names. That rare special application I guess this is a new definition of rare special

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Stephen J. Turnbull
Glenn Linderman writes: significantly seems to be the only word at question; it seems that there are a fair number of validation checks that could be performed; the numeric part of UTF-8 decoding is just a sequence of shifts, masks, and ORs, so can be coded pretty tightly in C or

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Ulrich Eckhardt
On Friday 05 December 2008, James Y Knight wrote: On Dec 5, 2008, at 5:27 AM, Ulrich Eckhardt wrote: Using the byte variant is equally fubar, because e.g. on MS Windows it is not supported, except through a very lossy roundtrip through the locale's codepage, limiting your functionality.

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Glenn Linderman
On approximately 12/8/2008 12:57 AM, came the following characters from the keyboard of Stephen J. Turnbull: Internal decoding is (or should be) an oxymoron. Why would your software be passing around text in any format other than internal? So decoding will happen (a) on I/O, which is itself

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Nick Coghlan
Terry Reedy wrote: This to me is an argument for keeping the default the current behavior, but not for rejecting flexibility. The computing world seems to be messier than we would like and worse than I realized until this week. As you say below, people need to better anticipate the future,

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Ulrich Eckhardt
On Sunday 07 December 2008, Guido van Rossum wrote: My problem with raising exceptions *by default* when an undecodable name exists is that it may render an app completely useless in a situation where the developer is no longer around. This happened all the time with the 2.x Unicode API, where

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg
On 2008-12-06 01:48, Nick Coghlan wrote: You can't display a non-decodable filename to the user, hence the user will have no idea what they're working on. Non-filesystem related apps have no business trying to deal with insane filenames. This is not entirely true: OSes, shells, and

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread rdmurray
On Sun, 7 Dec 2008 at 13:33, Guido van Rossum wrote: My problem with raising exceptions *by default* when an undecodable name exists is that it may render an app completely useless in a situation where the developer is no longer around. This happened all I think Nick Coghlan's suggestion of

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Bill Janssen
Nick Coghlan [EMAIL PROTECTED] wrote: - I think the binary and Unicode APIs should be available (and fully functional) on all platforms (including Windows) so that app developers don't create portability problems for themselves when they make the decision as to which API to use +1 I'm

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Terry Reedy
Nick Coghlan wrote: Terry Reedy wrote: This to me is an argument for keeping the default the current behavior, but not for rejecting flexibility. The computing world seems to be messier than we would like and worse than I realized until this week. As you say below, people need to better

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Guido van Rossum
On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy [EMAIL PROTECTED] wrote: Guido van Rossum wrote: On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy [EMAIL PROTECTED] wrote: Toshio Kuratomi wrote: - If this is true, a definition of os.listdir(type 'str') that would better meet programmer expectation

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Guido van Rossum
On Mon, Dec 8, 2008 at 10:34 AM, [EMAIL PROTECTED] wrote: On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote: And the decoding problems don't pass silently either - they just get emitted as a warning by default instead of causing the application to crash. Do they get automatically logged?

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Scott Dial
Guido van Rossum wrote: On Mon, Dec 8, 2008 at 10:34 AM, [EMAIL PROTECTED] wrote: On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote: And the decoding problems don't pass silently either - they just get emitted as a warning by default instead of causing the application to crash. Do they get

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Bugbee, Larry
I'm perhaps biased here; most of my Python programs don't have user interfaces, because they don't talk to people, they talk to other programs. The binary APIs for the OS are essential. I use and deeply appreciate all the string handling features in Python, particularly its firm grip on

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread rdmurray
On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote: On Mon, Dec 8, 2008 at 10:34 AM, [EMAIL PROTECTED] wrote: I'm in favor of an option to control what happens. I just really really don't want the _default_ to be ignore. Defaulting to a warning is fine with me, as would be defaulting to a

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Guido van Rossum
On Mon, Dec 8, 2008 at 12:07 PM, [EMAIL PROTECTED] wrote: On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote: On Mon, Dec 8, 2008 at 10:34 AM, [EMAIL PROTECTED] wrote: I'm in favor of an option to control what happens. I just really really don't want the _default_ to be ignore.

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg
On 2008-12-08 19:26, Guido van Rossum wrote: On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy [EMAIL PROTECTED] wrote: Here is a possible use case: I want filenames as 3.0 strings and I anticipate no problems at present but, as you say above, something might happen years in the future. I am using

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Antoine Pitrou
M.-A. Lemburg mal at egenix.com writes: Such application specific error handlers could then also apply whatever fancy round-trip safe encoding of non-decodable bytes to Unicode escapes, private code points, etc. as seen fit by the application. I'd argue that such fancy round-trip safe error

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg
On 2008-12-08 21:45, Antoine Pitrou wrote: M.-A. Lemburg mal at egenix.com writes: Such application specific error handlers could then also apply whatever fancy round-trip safe encoding of non-decodable bytes to Unicode escapes, private code points, etc. as seen fit by the application. I'd

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Nick Coghlan
Terry Reedy wrote: Nick Coghlan wrote: Terry Reedy wrote: This to me is an argument for keeping the default the current behavior, but not for rejecting flexibility. The computing world seems to be messier than we would like and worse than I realized until this week. As you say below, people

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen
On Mon, Dec 8, 2008 at 1:45 PM, Antoine Pitrou [EMAIL PROTECTED] wrote: M.-A. Lemburg mal at egenix.com writes: Such application specific error handlers could then also apply whatever fancy round-trip safe encoding of non-decodable bytes to Unicode escapes, private code points, etc. as seen

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Antoine Pitrou
Adam Olsen rhamph at gmail.com writes: Except they're clearly NOT part of the unicode spec. This is always the same discussion going in circles. I know they're not part of the unicode spec, but practicality beats purity and if the said error handler comes with an appropriate warning in the

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen
On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg [EMAIL PROTECTED] wrote: On 2008-12-08 21:45, Antoine Pitrou wrote: M.-A. Lemburg mal at egenix.com writes: Such application specific error handlers could then also apply whatever fancy round-trip safe encoding of non-decodable bytes to Unicode

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Toshio Kuratomi
Guido van Rossum wrote: On Mon, Dec 8, 2008 at 12:07 PM, [EMAIL PROTECTED] wrote: On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote: But I'm happy with just issuing a warning by default. That would mean it doesn't fail silently, but neither does it crash. Seems like the best compromise

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg
On 2008-12-08 22:32, Adam Olsen wrote: On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg [EMAIL PROTECTED] wrote: On 2008-12-08 21:45, Antoine Pitrou wrote: M.-A. Lemburg mal at egenix.com writes: Such application specific error handlers could then also apply whatever fancy round-trip safe

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen
On Mon, Dec 8, 2008 at 1:12 PM, Guido van Rossum [EMAIL PROTECTED] wrote: On Mon, Dec 8, 2008 at 12:07 PM, [EMAIL PROTECTED] wrote: But I'm happy with just issuing a warning by default. That would mean it doesn't fail silently, but neither does it crash. Seems like the best compromise with

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg
On 2008-12-08 22:39, Victor Stinner wrote: ('strict', 'ignore', 'replace', 'xmlcharrefreplace') replace (or xmlcharrefreplace) is just useless because you will not be able to open or rename the file... You just know that there is a strange file in the directory. Right, but that's

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen
On Mon, Dec 8, 2008 at 2:44 PM, M.-A. Lemburg [EMAIL PROTECTED] wrote: On 2008-12-08 22:32, Adam Olsen wrote: On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg [EMAIL PROTECTED] wrote: On 2008-12-08 21:45, Antoine Pitrou wrote: M.-A. Lemburg mal at egenix.com writes: Such application specific

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Terry Reedy
M.-A. Lemburg wrote: On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy [EMAIL PROTECTED] wrote: try: files = os.listdir(somedir, errors = strict) except OSError as e: log(verbose error message that includes somedir and e) files = os.listdir(somedir) If that error parameter is the same as in

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Glenn Linderman
On approximately 12/8/2008 9:30 AM, came the following characters from the keyboard of [EMAIL PROTECTED]: If warnings were emitted, then files would not be silently ignored, yet the program could still be used. Yep, this is sounding useful. PS: I'd like to see a similar warning issued

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Hagen Fürstenau
If the Unicode APIs only have correct unicode, sure. If not you'll get errors translating to UTF-8 (and the byte APIs are supposed to pass bad names through unaltered.) Kinda ironic, no? As far as I can see all Python Unicode strings can be encoded to UTF-8, even things like lone surrogates
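For comparison, current CPython 3 releases behave differently from what is described here for the interpreter of the time: a strict UTF-8 encode now rejects lone surrogates, and passing them through requires the "surrogatepass" error handler, which was added in a later release. A small sketch, with the sample code point chosen purely for illustration:

```python
lone = "\ud800"  # an unpaired high surrogate

# Strict UTF-8 encoding refuses the lone surrogate in current Python 3.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The opt-in "surrogatepass" handler emits the three-byte form anyway.
passed = lone.encode("utf-8", errors="surrogatepass")
print(strict_ok, passed)
```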

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Adam Olsen
On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau [EMAIL PROTECTED] wrote: As far as I can see all Python Unicode strings can be encoded to UTF-8, even things like lone surrogates because Python doesn't care about them. So both the Unicode API and the binary API would be fail-safe on Windows.

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Toshio Kuratomi
[EMAIL PROTECTED] wrote: On 06:07 am, [EMAIL PROTECTED] wrote: Most apps aren't file managers or ftp clients but when they interact with files (for instance, a file selection dialog) they need to be able to show the user all the relevant files. So on an app-by-app basis the need for this

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Michael Urman
On Sun, Dec 7, 2008 at 11:35, Adam Olsen [EMAIL PROTECTED] wrote: http://bugs.python.org/issue3672 http://bugs.python.org/issue3297 No. Unicode *requires* them to be treated as errors. If you want to pass them through then you're creating a custom encoding... which you might argue for in

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Adam Olsen
On Sun, Dec 7, 2008 at 11:18 AM, Michael Urman [EMAIL PROTECTED] wrote: On Sun, Dec 7, 2008 at 11:35, Adam Olsen [EMAIL PROTECTED] wrote: http://bugs.python.org/issue3672 http://bugs.python.org/issue3297 No. Unicode *requires* them to be treated as errors. If you want to pass them through

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Terry Reedy
Toshio Kuratomi wrote: - If this is true, a definition of os.listdir(type 'str') that would better meet programmer expectation would be: Give me all files in a directory with the output as str type. The definition of os.listdir(type 'bytes') would be Give me all files in a directory with the

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Guido van Rossum
On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy [EMAIL PROTECTED] wrote: Toshio Kuratomi wrote: - If this is true, a definition of os.listdir(type 'str') that would better meet programmer expectation would be: Give me all files in a directory with the output as str type. The definition of

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Nick Coghlan
Terry Reedy wrote: Toshio Kuratomi wrote: - If this is true, a definition of os.listdir(type 'str') that would better meet programmer expectation would be: Give me all files in a directory with the output as str type. The definition of os.listdir(type 'bytes') would be Give me all files

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Greg Ewing
Nick Coghlan wrote: For binary wrappers around the Windows Unicode APIs, I was thinking specifically of using UTF-8, since that should be able to encode anything the Unicode APIs can handle. Why shouldn't the binary interface just expose the raw utf16 as bytes? -- Greg

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Terry Reedy
Guido van Rossum wrote: On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy [EMAIL PROTECTED] wrote: Toshio Kuratomi wrote: - If this is true, a definition of os.listdir(type 'str') that would better meet programmer expectation would be: Give me all files in a directory with the output as str type.

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Glenn Linderman
On approximately 12/7/2008 10:56 AM, came the following characters from the keyboard of Adam Olsen: You might receive a UTF-8 encoded file name from a malicious user, check if it contains something dangerous (like ../../../../../etc/password), then decode it. If your decoder isn't compliant
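The ordering issue raised here (check, then decode, versus decode, then check) can be illustrated with a short sketch. It assumes a hypothetical helper, is_safe_relative, and uses an overlong two-byte encoding of "/" as the attack input; Python's own UTF-8 decoder is strict and rejects such sequences, so the sketch decodes first and validates the resulting text.

```python
import posixpath


def is_safe_relative(raw: bytes) -> bool:
    """Decode strictly first, then validate the decoded text (illustrative check)."""
    try:
        text = raw.decode("utf-8")  # strict: overlong forms raise an error
    except UnicodeDecodeError:
        return False
    normalized = posixpath.normpath(text)
    escapes = normalized == ".." or normalized.startswith("../")
    return not (normalized.startswith("/") or escapes)


print(is_safe_relative(b"docs/readme.txt"))
print(is_safe_relative(b"../../etc/passwd"))
# b"\xc0\xaf" is an overlong encoding of "/": a byte-level scan for "../"
# would miss it, and a lenient decoder would turn it into a real slash.
print(is_safe_relative(b"..\xc0\xaf..\xc0\xafetc/passwd"))
```

Checking the bytes before decoding is only safe if the decoder cannot produce characters that the check never saw, which is the compliance point being argued.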

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Stephen J. Turnbull
Glenn Linderman writes: But if you are interested in checking for security issues, shouldn't you _first_ decode into some canonical form, Yes. That's all that is being asked for: that Python do strict decoding to a canonical form by default. That's a lot to ask, as it turns out, but

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Glenn Linderman
On approximately 12/7/2008 8:13 PM, came the following characters from the keyboard of Stephen J. Turnbull: Glenn Linderman writes: But if you are interested in checking for security issues, shouldn't you _first_ decode into some canonical form, Yes. That's all that is being asked

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Adam Olsen
On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 12/7/2008 8:13 PM, came the following characters from the keyboard of Stephen J. Turnbull: Glenn Linderman writes: But if you are interested in checking for security issues, shouldn't you _first_

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Adam Olsen
On Sun, Dec 7, 2008 at 11:04 PM, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 12/7/2008 9:11 PM, came the following characters from the keyboard of Adam Olsen: On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman [EMAIL PROTECTED] wrote: Once upon a time I did write an unvalidated

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Oleg Broytmann
On Fri, Dec 05, 2008 at 08:37:45PM -0500, James Y Knight wrote: On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote: You can't display a non-decodable filename to the user, hence the user will have no idea what they're working on. Non-filesystem related apps have no business trying to deal with

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Oleg Broytmann
On Sat, Dec 06, 2008 at 12:03:55PM +1100, Steven D'Aprano wrote: I'd rather have the Python API report errors than silence them, at least by default. +1 for encoding errors by default. Oleg. -- Oleg Broytmann http://phd.pp.ru/ [EMAIL PROTECTED]

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Oleg Broytmann
On Sat, Dec 06, 2008 at 02:22:29AM +0100, Martin v. Löwis wrote: And environment variables, command line arguments, and file names are not bytes, but characters. There is no such thing as plain text! If you say these are characters you must also name the encoding for them.
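A small sketch of why the encoding has to be named: the same bytes yield different text, or no text at all, depending on the codec. The function name `decode_candidates` and the sample file name are assumptions made for illustration only.

```python
def decode_candidates(raw: bytes, encodings):
    """Hypothetical helper: try each named encoding, None where decoding fails."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


# A Russian file name as a KOI8-R system would store it on disk.
raw_name = "файл".encode("koi8-r")
views = decode_candidates(raw_name, ["koi8-r", "cp1251", "utf-8"])
```

Only the `koi8-r` entry recovers the original name; `cp1251` decodes without error but to different characters, and `utf-8` fails outright. The bytes alone do not say which reading is the intended one.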

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Toshio Kuratomi
Nick Coghlan wrote: Toshio Kuratomi wrote: Nonsense. A program can do tons of things with a non-decodable filename. Where it's limited is non-decodable filedata. You can't display a non-decodable filename to the user, hence the user will have no idea what they're working on.

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Guido van Rossum
On Fri, Dec 5, 2008 at 10:18 PM, Bugbee, Larry [EMAIL PROTECTED] wrote: There has been some discussion here that users should use the str or byte function variant based on what is relevant to their system, for example when getting a list of file names or opening a file. That thought process

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Guido van Rossum
On Fri, Dec 5, 2008 at 8:57 PM, Tres Seaver [EMAIL PROTECTED] wrote: Amen! The idea that paths, environment variables, and stuff pulled off of sockets can be treated as text rather than strings is just wishful thinking. Unfortunately most of the programmers of the world *do* think that

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Toshio Kuratomi
Bugbee, Larry wrote: There has been some discussion here that users should use the str or byte function variant based on what is relevant to their system, for example when getting a list of file names or opening a file. That thought process really doesn't do much for those of us that write

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread glyph
On 02:34 pm, [EMAIL PROTECTED] wrote: On Fri, Dec 05, 2008 at 08:37:45PM -0500, James Y Knight wrote: On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote: You can't display a non-decodable filename to the user, hence the user will have no idea what they're working on. Non-filesystem related apps

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Guido van Rossum
On Sat, Dec 6, 2008 at 10:53 AM, [EMAIL PROTECTED] wrote: On 02:34 pm, [EMAIL PROTECTED] wrote: I agree 100%. Russian Unix users use at least 5 different encodings (koi8-r, cp1251 and utf-8 are the most frequent in use, cp866 and iso-8859-5 are less frequent). I have an FTP server with some

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Nick Coghlan
Toshio Kuratomi wrote: Note 2: If there isn't a parallel API on all platforms, for instance, Guido's proposal to not have os.environb on Windows, then you'll still have to have a platform specific check. (Likely you should try to access os.environb in this instance and if it doesn't exist, use
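The platform check described in the note above might look roughly like this. It is a sketch under assumptions: `os.environb` was only a proposal when this thread was written (it later shipped in Python 3.2 and is absent on Windows), and the helper name `get_env_bytes` and the UTF-8 default for the fallback are illustrative choices, not anything the thread settled on.

```python
import os


def get_env_bytes(name: str, encoding: str = "utf-8"):
    """Hypothetical helper: return an environment value as bytes, or None."""
    # Prefer the bytes mapping where the platform provides one.
    environb = getattr(os, "environb", None)
    if environb is not None:
        return environb.get(os.fsencode(name))

    # Fallback (e.g. Windows): only the text mapping exists, so encode it.
    # The encoding is an assumption and must be chosen by the caller.
    value = os.environ.get(name)
    return None if value is None else value.encode(encoding)
```

Feature detection with `getattr` avoids hard-coding a platform name, but the fallback still has to pick an encoding, which is the portability cost raised in the replies.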

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Antoine Pitrou
Nick Coghlan ncoghlan at gmail.com writes: If the binary APIs are missing from a major platform (i.e. Windows) then the choice to use them brings with it a major cross-platform portability problem that should really be handled by the standard library. +1 I might also add that providing

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread André Malo
* Nick Coghlan wrote: Toshio Kuratomi wrote: Note 2: If there isn't a parallel API on all platforms, for instance, Guido's proposal to not have os.environb on Windows, then you'll still have to have a platform specific check. (Likely you should try to access os.environb in this instance

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Nick Coghlan
André Malo wrote: While on Windows: - underlying OS API uses Unicode - Unicode API just passes values straight through - binary API uses the system encoding to decode bytes names and values to be passed to the OS API and to encode Unicode names and values received from the OS API Now that
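The Windows arrangement listed above (a Unicode API that passes values straight through, and a binary API layered on top via the system encoding) can be modelled with a small wrapper. This is a toy model, not how any Python release implements it: the class name `BytesEnvironView`, the plain `dict` standing in for the OS, and the `cp1252` encoding in the usage lines are all assumptions.

```python
from collections.abc import MutableMapping


class BytesEnvironView(MutableMapping):
    """Hypothetical bytes view over a text environment mapping."""

    def __init__(self, text_env, encoding):
        self._env = text_env        # stands in for the Unicode OS API
        self._encoding = encoding   # stands in for the system encoding

    def __getitem__(self, key):
        # Decode the bytes name, look it up as text, encode the text value.
        return self._env[key.decode(self._encoding)].encode(self._encoding)

    def __setitem__(self, key, value):
        # Decode both name and value before handing them to the text API.
        self._env[key.decode(self._encoding)] = value.decode(self._encoding)

    def __delitem__(self, key):
        del self._env[key.decode(self._encoding)]

    def __iter__(self):
        return (name.encode(self._encoding) for name in self._env)

    def __len__(self):
        return len(self._env)


text_env = {"PATH": "C:\\bin"}
bytes_env = BytesEnvironView(text_env, "cp1252")
bytes_env[b"NAME"] = b"caf\xe9"
```

In this model the binary view can raise on names or values the chosen encoding cannot represent, while the text mapping never does.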

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread Aahz
On Sun, Dec 07, 2008, Nick Coghlan wrote: If the binary APIs are missing from a major platform (i.e. Windows) then the choice to use them brings with it a major cross-platform portability problem that should really be handled by the standard library. +1 -- Aahz ([EMAIL PROTECTED])

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-06 Thread glyph
On 06:07 am, [EMAIL PROTECTED] wrote: Guido van Rossum wrote: On Sat, Dec 6, 2008 at 10:53 AM, [EMAIL PROTECTED] wrote: I find it interesting to note that the only users in this discussion who actually have these problems in real life all have this attitude. For file managers and
