Re: [Python-3000] String comparison

2007-06-13 Thread Rauli Ruohonen
On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > What you are saying is that if you write a 10-line script that claims > Unicode conformance, you are responsible for the Unicode-correctness of > all modules you call implicitly as well as that of the Python interpreter. If text files ar

Re: [Python-3000] String comparison

2007-06-13 Thread Stephen J. Turnbull
Rauli Ruohonen writes: > What I meant is that the stdlib should only have string operations > that effectively work on (1) sequences of code units or (2) > sequences of code points, and that the choice between these two > should be made reasonably. I think we've reached a dead end. AIUI, tha

Re: [Python-3000] String comparison

2007-06-13 Thread Guido van Rossum
On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > Rauli Ruohonen writes: > > > What I meant is that the stdlib should only have string operations > > that effectively work on (1) sequences of code units or (2) > > sequences of code points, and that the choice between these two > > sh

Re: [Python-3000] String comparison

2007-06-13 Thread Martin v. Löwis
> I think we've reached a dead end. AIUI, that's a matter for a PEP, > and the window for Python 3 is closed. I'm pretty sure that Python 3 > is going to have sequences of code units only (I know, Guido said > "code points", but I doubt he's read TR#17), except that people will > sneak in some UT

Re: [Python-3000] String comparison

2007-06-13 Thread Martin v. Löwis
>> Until one or more of the senior developers says otherwise, I'm going >> to assume that. > > Yeah, what's the difference between code units and points? A code unit is the atomic base in some encoding. It is a single byte in most encodings, but a 16-bit quantity in UTF-16 (and a 32-bit quantity
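To make the code unit / code point distinction above concrete, here is a minimal sketch. It assumes Python 3.3 or later, where `len()` of a `str` counts code points; the helper name `code_unit_counts` is made up for illustration and does not come from the thread.

```python
def code_unit_counts(text):
    """Return (code points, UTF-8 units, UTF-16 units, UTF-32 units) for text.

    Assumes Python 3.3+, where a str is a sequence of code points.
    """
    code_points = len(text)
    utf8_units = len(text.encode("utf-8"))            # one unit = 1 byte
    utf16_units = len(text.encode("utf-16-le")) // 2  # one unit = 2 bytes
    utf32_units = len(text.encode("utf-32-le")) // 4  # one unit = 4 bytes
    return code_points, utf8_units, utf16_units, utf32_units


if __name__ == "__main__":
    # U+0061, U+00E9, and U+1D11E (outside the Basic Multilingual Plane)
    for sample in ("a", "\u00e9", "\U0001D11E"):
        print("U+%04X" % ord(sample), code_unit_counts(sample))
```

A single code point can take several code units in UTF-8 or UTF-16, while in UTF-32 the two counts always agree.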

Re: [Python-3000] String comparison

2007-06-13 Thread Guido van Rossum
On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > >> Until one or more of the senior developers says otherwise, I'm going > >> to assume that. > > > > Yeah, what's the difference between code units and points? > > A code unit is the atomic base in some encoding. It is a single byte > in mo

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-13 Thread Guido van Rossum
I couldn't get this exact patch to apply, but I implemented something equivalent in the py3kstruni branch. See revisions 55964 and 55965. Thanks for the suggestion! --Guido On 6/12/07, Ron Adam <[EMAIL PROTECTED]> wrote: > Guido van Rossum wrote: > > On 6/7/07, "Martin v. Löwis" <[EMAIL PROTECTED

Re: [Python-3000] String comparison

2007-06-13 Thread Martin v. Löwis
> Thanks for clearing that up. It sounds like we really use code units, > not code points (except when building with the 4-byte Unicode option, > when they are equivalent). Is there anywhere where we use code points, > apart from the UTF-8 codecs, which encode properly matched surrogate > pairs as a
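The surrogate-pair point in the quoted question can be illustrated with a small sketch. The function `to_surrogate_pair` is a hypothetical helper, not something from the thread or the standard library; it applies the standard UTF-16 arithmetic and is checked here against Python's own codecs (Python 3 assumed).

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1D11E)
    print("%04X %04X" % (high, low))
    # UTF-16 stores two code units; UTF-8 stores the one code point directly.
    print("\U0001D11E".encode("utf-16-be").hex())
    print("\U0001D11E".encode("utf-8").hex())
```

In UTF-16 the character occupies two code units, whereas the UTF-8 codec emits one four-byte sequence for the single code point rather than encoding each surrogate separately.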

Re: [Python-3000] String comparison

2007-06-13 Thread Jim Jewett
On 6/13/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > A code point is something that has a 1:1 relationship with a logical > > character (in particular, a Unicode character). and > > A code unit is the atomic base in some encoding.

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-13 Thread Ron Adam
Guido van Rossum wrote: > I couldn't get this exact patch to apply, but I implemented something > equivalent in the py3kstruni branch. See revisions 55964 and 55965. > Thanks for the suggestion! This is actually closer to how I started to do it, but I wasn't sure if it would catch everything.

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-13 Thread Guido van Rossum
On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote: > > > Guido van Rossum wrote: > > I couldn't get this exact patch to apply, but I implemented something > > equivalent in the py3kstruni branch. See revisions 55964 and 55965. > > Thanks for the suggestion! > > This is actually closer to how I started

Re: [Python-3000] Pre-PEP on fast imports

2007-06-13 Thread Giovanni Bajo
On 6/12/2007 6:30 PM, Phillip J. Eby wrote: >> import imp, os, sys >> from pkgutil import ImpImporter >> >> suffixes = set(ext for ext,mode,typ in imp.get_suffixes()) >> >> class CachedImporter(ImpImporter): >> def __init__(self, path): >> if not os.path.i

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-13 Thread Ron Adam
Guido van Rossum wrote: > On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote: >> >> >> Guido van Rossum wrote: >> > I couldn't get this exact patch to apply, but I implemented something >> > equivalent in the py3kstruni branch. See revisions 55964 and 55965. >> > Thanks for the suggestion! >> >> This

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-13 Thread Guido van Rossum
On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote: > Looking at the overall structure of os.py makes me think the platform > specific code could be abstracted out a bit further. Possibly have one > public "platform" module (or package) that is an alias or built from > private _platform package files.

Re: [Python-3000] Pre-PEP on fast imports

2007-06-13 Thread Brett Cannon
On 6/12/07, Giovanni Bajo <[EMAIL PROTECTED]> wrote: On 6/12/2007 6:30 PM, Phillip J. Eby wrote: >> import imp, os, sys >> from pkgutil import ImpImporter >> >> suffixes = set(ext for ext,mode,typ in imp.get_suffixes()) >> >> class CachedImporter(ImpImporter): >> de

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-13 Thread Ron Adam
Guido van Rossum wrote: On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote: Looking at the overall structure of os.py makes me think the platform specific code could be abstracted out a bit further. Possibly have one public "platform" module (or package) that is an alias or built from private _pl

Re: [Python-3000] String comparison

2007-06-13 Thread Martin v. Löwis
> Yes. The BOM mark, for one. Actually, the BOM *is* a character: ZERO WIDTH NO-BREAK SPACE, character class Cf. This function of the code point (as a character) is deprecated, though. > There are also some that are explicitly not characters. > (U+FD00..U+FDEF) ??? U+FD00 is ARABIC LIGATURE HAH
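The claims about these code points can be checked against Python's bundled Unicode database. This is a minimal sketch using the standard `unicodedata` module; the helper name `describe` is made up for illustration, and the look at U+FDD0 (the start of the noncharacter range U+FDD0..U+FDEF in the Unicode standard) is an addition that the visible part of the message does not mention.

```python
import unicodedata


def describe(code_point):
    """Return (name, general category) for a code point.

    Unnamed code points get the placeholder "<unnamed>".
    """
    ch = chr(code_point)
    return unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch)


if __name__ == "__main__":
    # U+FEFF (the BOM), U+FD00 (an assigned Arabic ligature),
    # U+FDD0 (a noncharacter, reported as unassigned)
    for cp in (0xFEFF, 0xFD00, 0xFDD0):
        print("U+%04X" % cp, describe(cp))
```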