Josiah Carlson wrote:
> Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>>
>> On Wednesday, 13 September 2006 at 09:41 -0700, Josiah Carlson wrote:
>>> And is generally ignored, as per unicode spec; it's a "zero width
>>> non-breaking space" - an invisible character with no effect on wrapping
>>> or othe
Antoine Pitrou wrote:
> Hi,
>
> On Wednesday, 13 September 2006 at 16:14 -0700, Josiah Carlson wrote:
>> In any case, I believe that the above behavior is correct for the
>> context. Why? Because utf-8 has no endianness, its 'generic' decoding
>> spelling of 'utf-8' is analogous to all three 'ut
Martin v. Löwis wrote:
> Jim Jewett wrote:
>> Simply delegate such methods to a hidden per-encoding subclass.
>>
>> The UTF-8 methods will indeed be complex, unless the solution is
>> simply "someone called indexing/slicing/len, so I have to recode after
>> all."
>>
>> The Latin-1 encoding will h
Nick Coghlan <[EMAIL PROTECTED]> writes:
> Only the first such call on a given string, though - the idea
> is to use lazy decoding, not to avoid decoding altogether.
> Most manipulations (len, indexing, slicing, concatenation, etc)
> would require decoding to at least UCS-2 (or perhaps UCS-4).
Si
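For readers following the lazy-decoding idea quoted above, here is a minimal sketch of what such a type might look like. The class name LazyText and its decode_count attribute are hypothetical, invented only for illustration; nothing in the thread specifies an implementation. It assumes the string is held as encoded bytes and decoded the first time an operation such as len or indexing needs code points.

```python
class LazyText:
    """Hypothetical wrapper: keeps encoded bytes, decodes on first use."""

    def __init__(self, data, encoding="utf-8"):
        self._data = data
        self._encoding = encoding
        self._decoded = None     # filled in by the first call that needs it
        self.decode_count = 0    # illustration only: counts real decodes

    def _text(self):
        # Decode once and cache the result for every later operation.
        if self._decoded is None:
            self._decoded = self._data.decode(self._encoding)
            self.decode_count += 1
        return self._decoded

    def __len__(self):
        return len(self._text())

    def __getitem__(self, index):
        return self._text()[index]

    def encode(self, encoding="utf-8"):
        # Writing back out in the original encoding needs no decode at all.
        if encoding == self._encoding:
            return self._data
        return self._text().encode(encoding)


s = LazyText("h\u00e9llo".encode("utf-8"))
raw = s.encode("utf-8")      # no decode happens here
length = len(s)              # first decode
second = s[1]                # reuses the cached decode
```

Only the first len or indexing call pays for the decode; a pure read-then-write path never does.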
> Only the first such call on a given string, though - the idea is to use
> lazy decoding, not to avoid decoding altogether. Most manipulations (len,
> indexing, slicing, concatenation, etc) would require decoding to at least
> UCS-2 (or perhaps UCS-4).
My two cents:
For len() you can comput
On 9/14/06, Talin <[EMAIL PROTECTED]> wrote:
> I've been reading this thread (and the ones that spawned it), and
> there's something about it that's been nagging at me for a while, which
> I am going to attempt to articulate.
[...]
> Any given Python program that I write is going to know *something
David Hopwood <[EMAIL PROTECTED]> writes:
> You're correct about the use of a BOM as a signature. All
> Unicode-conformant applications should accept this use of a BOM in
> UTF-8 (although they need not generate it); the standard is quite
> clear on that.
When a program generates a list of filena
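As an illustration of the BOM-as-signature point quoted above, the following sketch uses the 'utf-8-sig' codec from current Python (it may not match what was available when this thread was written). It shows that plain 'utf-8' decoding keeps the BOM as a U+FEFF character, while 'utf-8-sig' accepts it on input and drops it.

```python
import codecs

# A UTF-8 byte string that starts with the BOM used as a signature.
payload = codecs.BOM_UTF8 + "hello".encode("utf-8")

plain = payload.decode("utf-8")          # BOM survives as U+FEFF
stripped = payload.decode("utf-8-sig")   # leading BOM is removed

# 'utf-8-sig' also accepts input that has no BOM at all.
no_bom = "hello".encode("utf-8").decode("utf-8-sig")

# Encoding with 'utf-8-sig' writes the BOM; plain 'utf-8' does not.
with_bom = "hello".encode("utf-8-sig")
```

Decoding with 'utf-8-sig' is one way to accept the signature without ever generating it.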
Talin wrote:
>> My point was different : most programmers are not at your level (or
>> Paul's level, etc.) when it comes to Unicode knowledge. Py3k's str type
>> is supposed to be an abstracted textual type to make it easy to write
>> unicode-friendly applications (isn't it?).
>
> The basic contro
Blake Winton <[EMAIL PROTECTED]> wrote:
[snip]
> Um, what more data do we need for this use-case? I'm not going to
> suggest an API, other than it would be nice if I didn't have to manually
> figure out/hard code all the encodings. (It's my belief that I will
> currently have to do that, or a
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> wrote:
> Nick Coghlan <[EMAIL PROTECTED]> writes:
>
> > Only the first such call on a given string, though - the idea
> > is to use lazy decoding, not to avoid decoding altogether.
> > Most manipulations (len, indexing, slicing, concatenation, etc)
On 9/14/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
>
> "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> wrote:
> > Nick Coghlan <[EMAIL PROTECTED]> writes:
> >
> > > Only the first such call on a given string, though - the idea
> > > is to use lazy decoding, not to avoid decoding altogether.
> >
Josiah Carlson wrote:
> Blake Winton <[EMAIL PROTECTED]> wrote:
>> I'm not going to
>> suggest an API, other than it would be nice if I didn't have to manually
>> figure out/hard code all the encodings. (It's my belief that I will
>> currently have to do that, or at least special-case XML, to r
As something of an aside, for XML encoding detection: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/363841
Paul Prescod
___
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe:
http://mail
Blake Winton <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> > Blake Winton <[EMAIL PROTECTED]> wrote:
> >> I'm not going to
> >> suggest an API, other than it would be nice if I didn't have to manually
> >> figure out/hard code all the encodings. (It's my belief that I will
> >> currentl
On 9/14/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> So don't save it with a BOM and add a Python coding: directive to the
> second line. Python and bash comments just happen to have the same #
> delimiter, and if your editor doesn't suck, then it should understand
> such a directive.
However,
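To make the quoted advice concrete, here is a small sketch of a script whose first line is a shell "#!" line and whose second line is a coding: directive. It uses tokenize.detect_encoding from current Python 3, which postdates this thread, so treat it as an illustration of the mechanism and not of the tooling under discussion.

```python
import codecs
import io
import tokenize

# Shebang on line 1, coding directive on line 2, no BOM.
script = (
    b"#!/usr/bin/env python\n"
    b"# -*- coding: latin-1 -*-\n"
    b"print('hi')\n"
)
declared, lines_read = tokenize.detect_encoding(io.BytesIO(script).readline)

# The same detection applied to a file saved with a UTF-8 BOM instead.
bom_script = codecs.BOM_UTF8 + b"print('hi')\n"
from_bom, _ = tokenize.detect_encoding(io.BytesIO(bom_script).readline)
```

Here declared is the normalized name of the declared encoding, and from_bom is 'utf-8-sig', which is how the tokenizer reports a BOM-marked file.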
"Bob Ippolito" <[EMAIL PROTECTED]> wrote:
> The argument for UTF-8 is probably interop efficiency. Lots of C
> libraries, file formats, and wire protocols use UTF-8 for interchange.
> Verifying the validity of UTF-8 during string creation isn't that big
> of a deal.
Indeed, UTF-8 validation/creat
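A minimal sketch of validating UTF-8 at string-creation time, assuming current Python 3 semantics for the strict 'utf-8' codec. The helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data):
    """Return True if the bytes are well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


ok = is_valid_utf8("h\u00e9llo".encode("utf-8"))
bad_byte = is_valid_utf8(b"\xff")            # 0xFF never appears in UTF-8
overlong = is_valid_utf8(b"\xc0\xaf")        # overlong encoding of "/"
truncated = is_valid_utf8(b"\xe2\x82")       # sequence cut off mid-character
```

The check is a single linear pass over the bytes.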
On 9/14/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
>
> "Bob Ippolito" <[EMAIL PROTECTED]> wrote:
> > The argument for UTF-8 is probably interop efficiency. Lots of C
> > libraries, file formats, and wire protocols use UTF-8 for interchange.
> > Verifying the validity of UTF-8 during string creat
For what it's worth: in .NET, everything defaults to UTF-8, whether
reading or writing. No BOM is generated when creating a new file.
http://msdn2.microsoft.com/en-us/library/system.io.file.createtext.aspx
Java defaults to a "default character encoding", which on Windows is
the system's ANSI e
Nick Coghlan wrote:
> Only the first such call on a given string, though - the idea is to use
> lazy decoding, not to avoid decoding altogether. Most manipulations
> (len, indexing, slicing, concatenation, etc) would require decoding to
> at least UCS-2 (or perhaps UCS-4).
Ok. Then my objection
Josiah Carlson wrote:
> Any sane person uses os.stat(f.name) or os.fstat(f.fileno()), unless
> they want to seek to the end of the file for later writing or expected
> reading of data yet-to-be-written.
os.fstat(f.fileno()).st_size doesn't work for file-like objects.
Goodbye unit testing with S
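One possible way to reconcile the two positions, sketched under the assumption of current Python 3 io semantics: try os.fstat first, and fall back to seek and tell when the object has no usable file descriptor. The helper name stream_size is hypothetical.

```python
import io
import os


def stream_size(f):
    """Size in bytes of a real file or a seekable file-like object."""
    try:
        return os.fstat(f.fileno()).st_size
    except (AttributeError, OSError):
        # In-memory objects such as io.BytesIO raise io.UnsupportedOperation
        # (a subclass of OSError) from fileno(); fall back to seeking.
        position = f.tell()
        size = f.seek(0, os.SEEK_END)
        f.seek(position)     # restore the caller's position
        return size


buf = io.BytesIO(b"abcdef")
buf.read(2)
size = stream_size(buf)
position_after = buf.tell()
```

For a real file with buffered, unflushed writes, the fstat result can lag behind what has been written, so flush first if that matters.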
On 9/14/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> With luck, your editor should also allow for the
> non-writing of the BOM on utf-8 save (given certain conditions). If not,
> contact the author(s) and request that feature.
And hope they didn't write it in a language that doesn't let them
c
"Paul Moore" <[EMAIL PROTECTED]> wrote:
> On 9/14/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > So don't save it with a BOM and add a Python coding: directive to the
> > second line. Python and bash comments just happen to have the same #
> > delimiter, and if your editor doesn't suck, then i
Paul Moore wrote:
> On 9/14/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
>
>>So don't save it with a BOM and add a Python coding: directive to the
>>second line. Python and bash comments just happen to have the same #
>>delimiter, and if your editor doesn't suck, then it should understand
>>such
Anders J. Munch wrote:
> (note the potential race condition in
> f=mmap.mmap(f.fileno(),os.fstat(f.fileno(.
Not sure anything could be done about that. Even if
there were an mmap-this-file-however-big-it-is call,
the size of the file could still change *after*
you'd mapped it.
--
Greg
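For reference, current Python's mmap accepts a length of 0, meaning "the size of the file at the time of the call", which avoids the separate fstat step. As noted above, this does not remove the underlying race: the file can still change size after it has been mapped. The helper name map_whole_file and the temporary file are illustration only.

```python
import mmap
import os
import tempfile


def map_whole_file(path):
    """Map a non-empty file read-only at its current size (hypothetical helper)."""
    with open(path, "rb") as f:
        # Length 0 asks mmap to use the file's size at the moment of the call.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Build a small temporary file to map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"0123456789")

mapped = map_whole_file(path)
mapped_length = len(mapped)
head = mapped[:3]
mapped.close()
os.remove(path)
```

An empty file cannot be mapped this way; mmap raises ValueError for it.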
"Anders J. Munch" <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> > You were also talking about buffering writes to reduce the overhead of
> > the underlying seeks and tells because of apparent "optimizations" you
> > wanted to make. Here is a data integrity optimization you can make for
>
25 matches