On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> What you are saying is that if you write a 10-line script that claims
> Unicode conformance, you are responsible for the Unicode-correctness of
> all modules you call implicitly as well as that of the Python interpreter.
If text files ar
Rauli Ruohonen writes:
> What I meant is that the stdlib should only have string operations
> that effectively work on (1) sequences of code units or (2)
> sequences of code points, and that the choice between these two
> should be made reasonably.
I think we've reached a dead end. AIUI, tha
On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Rauli Ruohonen writes:
>
> > What I meant is that the stdlib should only have string operations
> > that effectively work on (1) sequences of code units or (2)
> > sequences of code points, and that the choice between these two
> > sh
> I think we've reached a dead end. AIUI, that's a matter for a PEP,
> and the window for Python 3 is closed. I'm pretty sure that Python 3
> is going to have sequences of code units only (I know, Guido said
> "code points", but I doubt he's read TR#17), except that people will
> sneak in some UT
>> Until one or more of the senior developers says otherwise, I'm going
>> to assume that.
>
> Yeah, what's the difference between code units and points?
A code unit is the atomic base in some encoding. It is a single byte
in most encodings, but a 16-bit quantity in UTF-16 (and a 32-bit
quantity
On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> >> Until one or more of the senior developers says otherwise, I'm going
> >> to assume that.
> >
> > Yeah, what's the difference between code units and points?
>
> A code unit is the atomic base in some encoding. It is a single byte
> in mo
I couldn't get this exact patch to apply, but I implemented something
equivalent in the py3kstruni branch. See revisions 55964 and 55965.
Thanks for the suggestion!
--Guido
On 6/12/07, Ron Adam <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > On 6/7/07, "Martin v. Löwis" <[EMAIL PROTECTED
> Thanks for clearing that up. It sounds like we really use code units,
> not code points (except when building with the 4-byte Unicode option,
> when they are equivalent). Is there anywhere where we use code points,
> apart from the UTF-8 codecs, which encode properly matched surrogate
> pairs as a
On 6/13/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > A code point is something that has a 1:1 relationship with a logical
> > character (in particular, a Unicode character).
and
> > A code unit is the atomic base in some encoding.
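
The two quoted definitions can be illustrated with a minimal sketch that is
not taken from the thread. It assumes a current Python 3 (3.3 or later), where
len() on a str counts code points; on the narrow builds discussed in this
thread it would count UTF-16 code units instead. The example character and
the variable names are chosen purely for illustration.

```python
# One code point outside the Basic Multilingual Plane:
# U+1D11E MUSICAL SYMBOL G CLEF.
s = "\U0001D11E"

# Number of code points (assumes Python 3.3+, where str is code-point indexed).
code_points = len(s)

# Number of code units depends on the encoding form.
utf8_units = len(s.encode("utf-8"))             # 8-bit code units
utf16_units = len(s.encode("utf-16-le")) // 2   # 16-bit code units
utf32_units = len(s.encode("utf-32-le")) // 4   # 32-bit code units

print(code_points, utf8_units, utf16_units, utf32_units)
```

The same single code point takes a different number of code units in each
encoding form, which is why "sequence of code units" and "sequence of code
points" are not interchangeable descriptions of a string type.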
Guido van Rossum wrote:
> I couldn't get this exact patch to apply, but I implemented something
> equivalent in the py3kstruni branch. See revisions 55964 and 55965.
> Thanks for the suggestion!
This is actually closer to how I started to do it, but I wasn't sure if it
would catch everything.
On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
>
>
> Guido van Rossum wrote:
> > I couldn't get this exact patch to apply, but I implemented something
> > equivalent in the py3kstruni branch. See revisions 55964 and 55965.
> > Thanks for the suggestion!
>
> This is actually closer to how I started
On 6/12/2007 6:30 PM, Phillip J. Eby wrote:
>> import imp, os, sys
>> from pkgutil import ImpImporter
>>
>> suffixes = set(ext for ext,mode,typ in imp.get_suffixes())
>>
>> class CachedImporter(ImpImporter):
>>     def __init__(self, path):
>>         if not os.path.i
Guido van Rossum wrote:
> On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
>>
>>
>> Guido van Rossum wrote:
>> > I couldn't get this exact patch to apply, but I implemented something
>> > equivalent in the py3kstruni branch. See revisions 55964 and 55965.
>> > Thanks for the suggestion!
>>
>> This
On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
> Looking at the overall structure of os.py makes me think the platform
> specific code could be abstracted out a bit further. Possibly have one
> public "platform" module (or package) that is an alias or built from
> private _platform package files.
On 6/12/07, Giovanni Bajo <[EMAIL PROTECTED]> wrote:
On 6/12/2007 6:30 PM, Phillip J. Eby wrote:
>> import imp, os, sys
>> from pkgutil import ImpImporter
>>
>> suffixes = set(ext for ext,mode,typ in imp.get_suffixes())
>>
>> class CachedImporter(ImpImporter):
>>     de
Guido van Rossum wrote:
> On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
>> Looking at the overall structure of os.py makes me think the platform
>> specific code could be abstracted out a bit further. Possibly have one
>> public "platform" module (or package) that is an alias or built from
>> private _pl
> Yes. The BOM mark, for one.
Actually, the BOM *is* a character: ZERO WIDTH NO-BREAK SPACE,
character class Cf. This function of the code point (as a character)
is deprecated, though.
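
The claim about the BOM can be checked against the Unicode character database
that ships with Python. This is a small sketch using the standard unicodedata
and codecs modules; the variable names are illustrative, and the results
depend on the Unicode version bundled with the interpreter.

```python
import codecs
import unicodedata

bom = "\ufeff"  # the code point used as the byte order mark

# Look up the character's name and general category.
bom_name = unicodedata.name(bom)
bom_category = unicodedata.category(bom)

# Encoded as UTF-8, the same code point yields the UTF-8 signature bytes.
utf8_signature = bom.encode("utf-8")

print(bom_name, bom_category, utf8_signature == codecs.BOM_UTF8)
```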
> There are also some that are explicitly not characters.
> (U+FD00..U+FDEF)
??? U+FD00 is ARABIC LIGATURE HAH
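
For comparison, the Unicode Standard does reserve U+FDD0..U+FDEF as
noncharacters, which may be the range the quoted poster had in mind; that
reading is an assumption, not something stated in the thread. A short sketch
with the standard unicodedata module shows the difference between an assigned
character and a noncharacter (variable names are illustrative):

```python
import unicodedata

ligature = "\ufd00"  # an assigned Arabic presentation form
nonchar = "\ufdd0"   # first code point of the noncharacter range

# An assigned character has a name in the character database.
ligature_name = unicodedata.name(ligature)

# A noncharacter has no name; pass a default so the lookup does not raise.
nonchar_name = unicodedata.name(nonchar, None)
nonchar_category = unicodedata.category(nonchar)

print(ligature_name)
print(nonchar_name, nonchar_category)
```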