Re: [Python-3000] Help on text editors

2006-10-03 Thread David Hopwood
Martin v. Löwis wrote: > David Hopwood schrieb: > >>>If you have access to "German Windows XP", "Japanese Windows XP", >> >>Since Win2K there is actually no such thing, from a technical point of view -- >>just Win2K or WinXP with a G

Re: [Python-3000] How will unicode get used?

2006-09-25 Thread David Hopwood
Paul Prescod wrote: > On 9/25/06, Jim Jewett <[EMAIL PROTECTED]> wrote: > >> As David Hopwood pointed out, to be fully correct, you already have to >> create a custom function even with bmp characters, because of >> decomposed characters. (Example: Representi

Re: [Python-3000] How will unicode get used?

2006-09-21 Thread David Hopwood
Fredrik Lundh wrote: > David Hopwood wrote: > >>For example, "ö" can be represented either as the precomposed character >>U+00F6, >>or as "o" followed by a combining diaeresis (U+006F U+0308). > > normalization is a good thing, though: > >
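
The two representations of "ö" mentioned above can be compared with a minimal sketch like the following. It uses the standard `unicodedata` module; the helper name `same_text` is an illustrative assumption, not something from the thread.

```python
import unicodedata

precomposed = "\u00f6"   # "ö" as the single code point U+00F6
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS (U+006F U+0308)


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A plain comparison sees two different code point sequences.
print(precomposed == decomposed)

# After normalization the two spellings compare equal.
print(same_text(precomposed, decomposed))
```

Whether NFC or NFD is the better choice depends on the application; the point here is only that both operands need to be in the same form before they are compared.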

Re: [Python-3000] How will unicode get used?

2006-09-20 Thread David Hopwood
ional texts necessarily differs from working with ASCII. There is no excuse for any programmer doing text processing not to have read it. Should we nevertheless try to avoid making the use of Unicode strings unnecessarily difficult for people who have minimal knowledge

Re: [Python-3000] BOM handling

2006-09-14 Thread David Hopwood
yond the wit of those editor developers to talk to each other, or to just unilaterally support the other editor's format as well as their own. -- David Hopwood <[EMAIL PROTECTED]> ___ Python-3000 mailing list Python-3000@python.org http://ma

Re: [Python-3000] Pre-PEP: Easy Text File Decoding

2006-09-13 Thread David Hopwood
some locales recently added in Windows 2000/XP, where there was no compatibility constraint to use a non-Unicode encoding. You're correct about the use of a BOM as a signature. All Unicode-conformant applications should accept this use of a BOM in UTF-8 (although they need not generate it); the s
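
As a rough illustration of "accept a UTF-8 BOM but need not generate it", current Python versions offer the `utf-8-sig` codec, which skips a leading signature when decoding if one is present. This is a sketch of that behaviour, not a statement about what was proposed in the thread; the sample text is arbitrary.

```python
import codecs

sample = "héllo"

with_bom = codecs.BOM_UTF8 + sample.encode("utf-8")
without_bom = sample.encode("utf-8")

# "utf-8-sig" accepts input with or without the signature.
text_a = with_bom.decode("utf-8-sig")
text_b = without_bom.decode("utf-8-sig")

# The plain "utf-8" codec keeps the BOM as a U+FEFF character on decode
# and does not write one on encode.
raw = with_bom.decode("utf-8")
```

Reading with `utf-8-sig` and writing with plain `utf-8` gives the accept-but-do-not-generate behaviour described above.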

Re: [Python-3000] Pre-PEP: Easy Text File Decoding

2006-09-11 Thread David Hopwood
ams, and this is an advantage over algorithms that don't work for streams. -- David Hopwood <[EMAIL PROTECTED]>

Re: [Python-3000] Pre-PEP: Easy Text File Decoding

2006-09-11 Thread David Hopwood
Paul Prescod wrote: > On 9/10/06, David Hopwood <[EMAIL PROTECTED]> wrote: > >> ... if you think that guessing based on content is a good idea -- I >> don't. In any case, such guessing necessarily depends on the expected file >> format, so it should be done

Re: [Python-3000] Pre-PEP: Easy Text File Decoding

2006-09-10 Thread David Hopwood
Paul Prescod wrote: > Maybe the guessing algorithm should read the WHOLE FILE. That wouldn't work for streams (e.g. stdin). The algorithm I gave does work for streams, provided that they have a push-back buffer of at least 4 bytes. -- David Hopwood <[EMAI
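
The algorithm referred to above is not reproduced in this snippet, so the following is only a sketch of the general idea: why a 4-byte look-ahead is enough to check for a byte order mark on a stream. It assumes the stream is an `io.BufferedReader` (whose `peek` plays the role of the push-back buffer), and the function name `sniff_bom` and the default encoding are illustrative assumptions.

```python
import codecs
import io

# Longest signatures first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(stream, default="utf-8"):
    """Return an encoding name based on a leading BOM, consuming the BOM.

    Looks at no more than 4 bytes and never reads past them, so it also
    works for non-seekable input. Hypothetical helper for illustration.
    """
    head = stream.peek(4)[:4]
    for bom, name in _BOMS:
        if head.startswith(bom):
            stream.read(len(bom))  # skip the signature only
            return name
    return default


stream = io.BufferedReader(io.BytesIO(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))
encoding = sniff_bom(stream)
print(encoding, stream.read().decode(encoding))
```

One known ambiguity remains in this sketch: UTF-16-LE data that starts with a BOM followed by U+0000 is indistinguishable from a UTF-32-LE BOM.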

Re: [Python-3000] Pre-PEP: Easy Text File Decoding

2006-09-10 Thread David Hopwood
Josiah Carlson wrote: > David Hopwood <[EMAIL PROTECTED]> wrote: > >>Here is a very simple, reasonably (although not completely) safe, and much >>more predictable guessing algorithm, based on a generalization of >><http://www.w3.org/TR/REC-xml/#sec-guessing>

Re: [Python-3000] Pre-PEP: Easy Text File Decoding

2006-09-10 Thread David Hopwood
g. >>The 'additional symbolic values' should be implemented as true >>encodings (i.e., it should be possible to look up 'site', 'guess' and >>'locale' in the codecs registry, and replace them there as well). > > Treating different thing

Re: [Python-3000] Help on text editors

2006-09-09 Thread David Hopwood
icode -> Shift-JIS -> Unicode; the issue is whether it is encoded as 0x5C, or something else like 0x815F. It may very well not round-trip if you use different implementations for encoding and decoding. -- David Hopwood <[EMAIL PROTECTED]>

Re: [Python-3000] Help on text editors

2006-09-09 Thread David Hopwood
<http://wakaba-web.hp.infoseek.co.jp/table/sjis-0208-1997-std.txt> although there is quite a bit of variation in mappings: <http://www.haible.de/bruno/charsets/conversion-tables/Shift_JIS.html> -- David Hopwood <[EMAIL PROTECTED]>

Re: [Python-3000] Help on text editors

2006-09-09 Thread David Hopwood
Michael Urman wrote: > On 9/7/06, David Hopwood <[EMAIL PROTECTED]> wrote: > >>Yes. However, this is not a good idea for precisely the reason described >>on that page (false detection of Unicode), and so any Unicode detection >>algorithm in Python should only be

Re: [Python-3000] Help on text editors

2006-09-07 Thread David Hopwood
changes. It uses BOMs to mark all unicode encodings, but doesn't > require them to be present in order to detect "Unicode." > http://blogs.msdn.com/michkap/archive/2006/06/14/631016.aspx Yes. However, this is not a good idea for precisely

Re: [Python-3000] Help on text editors

2006-09-07 Thread David Hopwood
David Hopwood wrote: > Paul Prescod wrote: > >>Guido has asked me to do some research in aid of a file encoding >>detection/defaulting PEP. >> >>I only have access to a small number of operating systems and language >>variants so I need help. >>

Re: [Python-3000] Help on text editors

2006-09-07 Thread David Hopwood
See <http://www.microsoft.com/globaldev/DrIntl/faqs/Locales.mspx>, <http://www.microsoft.com/globaldev/reference/WinCP.mspx>, and <http://www.microsoft.com/globaldev/reference/win2k/setup/localsupport.mspx>. Each "language group" maps to a similarly named "ANSI" code page (a

Re: [Python-3000] locale-aware strings ?

2006-09-06 Thread David Hopwood
Jim Jewett wrote: > On 9/4/06, David Hopwood <[EMAIL PROTECTED]> wrote: > >> The issue is not simplicity of implementation; it is what will provide >> the simplest usage model in the long term. If new files are encoded in X >> just because most of a user's ex

Re: [Python-3000] locale-aware strings ?

2006-09-06 Thread David Hopwood
Guido van Rossum wrote: > On 9/4/06, David Hopwood <[EMAIL PROTECTED]> wrote: >> Guido van Rossum wrote: >> >> > I've always said (can someone find a quote perhaps?) that there ought >> > to be a sensible default encoding for files (including but

Re: [Python-3000] locale-aware strings ?

2006-09-05 Thread David Hopwood
Paul Prescod wrote: > On 9/5/06, David Hopwood <[EMAIL PROTECTED]> wrote: >> Guido van Rossum wrote: >> > On 9/5/06, Brian Quinlan <[EMAIL PROTECTED]> wrote: >> > [...] >> > >> > That would not be doing what the user wants. We have extensi

Re: [Python-3000] locale-aware strings ?

2006-09-05 Thread David Hopwood
Guido van Rossum wrote: > On 9/5/06, David Hopwood <[EMAIL PROTECTED]> wrote: >> Guido van Rossum wrote: >> > On 9/5/06, Paul Prescod <[EMAIL PROTECTED]> wrote: >> > >> >> Beyond all of that: It just seems wrong to me that I could send >>

Re: [Python-3000] locale-aware strings ?

2006-09-05 Thread David Hopwood
he system ("ANSI") encoding will be Cp1252-with-Euro (which is similar enough to ISO-8859-1 if C1 control characters are not used). -- David Hopwood <[EMAIL PROTECTED]>
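
The relationship between Cp1252 and ISO-8859-1 described above can be checked with a short sketch using Python's built-in `cp1252` and `latin-1` codecs. The byte ranges chosen are for illustration only.

```python
# Outside the C1 range (0x80-0x9F) the two encodings agree byte for byte.
ascii_range = bytes(range(0x00, 0x80))
high_range = bytes(range(0xA0, 0x100))
same_low = ascii_range.decode("cp1252") == ascii_range.decode("latin-1")
same_high = high_range.decode("cp1252") == high_range.decode("latin-1")

# Inside the C1 range they differ: 0x80 is the euro sign in Cp1252,
# but a control character in ISO-8859-1.
euro_cp1252 = b"\x80".decode("cp1252")
euro_latin1 = b"\x80".decode("latin-1")

print(same_low, same_high, euro_cp1252, repr(euro_latin1))
```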

Re: [Python-3000] locale-aware strings ?

2006-09-05 Thread David Hopwood
David Hopwood wrote: > I don't know about vi, but notepad will open and save files that are not in > the system ("ANSI") encoding just fine. On opening it checks for a BOM and > auto-detects UTF-8 and UTF-16; on saving it will write a BOM if you choose > "Unicode"

Re: [Python-3000] locale-aware strings ?

2006-09-05 Thread David Hopwood
ding and writing files in charsets that are not the system default. So in practice the locale has to be set to the "old" charset during a migration to UTF-8. (Setting different locales for different applications is far too much hassle. On Windows, although I believe it is technically possible to

Re: [Python-3000] locale-aware strings ?

2006-09-04 Thread David Hopwood
as a character count. For charsets like ISCII and ISO 2022, which are stateful and/or have a different encoding model to Unicode, I don't believe this approach would work very well. But it is fine to support this for some charsets and not others. -- David Hopwood <[EMAIL PROTECTED]>

Re: [Python-3000] Comment on iostack library

2006-08-31 Thread David Hopwood
te sequence. note it's a *byte* sequence, not chars, > since this passes down to layer 1 transparently. That isn't what is required; for big-endian UCS-2 or UTF-16, "\x00\x0a" should only be recognized as LF if it is at an even byte position. -- David Hopwood <[EMAIL PRO
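
The alignment point can be made concrete with a small sketch: in big-endian UTF-16, the byte pair `00 0A` is only a line feed when it starts at an even offset. The function names below are illustrative assumptions and are not part of the iostack library under discussion.

```python
def naive_lf_offsets(data):
    """Every offset where the bytes 00 0A occur, aligned or not."""
    offsets = []
    start = data.find(b"\x00\x0a")
    while start != -1:
        offsets.append(start)
        start = data.find(b"\x00\x0a", start + 1)
    return offsets


def aligned_lf_offsets(data):
    """Offsets of 00 0A that begin on a 2-byte code unit boundary."""
    return [i for i in range(0, len(data) - 1, 2) if data[i:i + 2] == b"\x00\x0a"]


# U+0100 followed by U+0A05 encodes as 01 00 0A 05, so a misaligned
# "00 0A" straddles the two characters; the real LF comes last.
data = "\u0100\u0a05\n".encode("utf-16-be")

print(naive_lf_offsets(data))
print(aligned_lf_offsets(data))
```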

Re: [Python-3000] Warning about future-unsafe usage patterns in Python 2.x e.g. dict.keys().sort()

2006-08-28 Thread David Hopwood
eclared as PyObject *.) The 'operation' string is sometimes a gerund ("slicing", etc.) and sometimes the name of a method. This should be more consistent. > + WARN_LIST_USAGE(a, PY_REMAIN_LIST, "repitition"); "repetition" -- David Hopwood

Re: [Python-3000] Droping find/rfind?

2006-08-24 Thread David Hopwood
d = s.find("}", posarg) > except ValueError: > break try: posstart = s.index("{", pos) posarg = s.index(" ", posstart) posend = s.find("}", posarg) except ValueEr
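
Since the code above mixes `str.index` and `str.find`, a short sketch of how the two report a missing substring may help. `find` returns -1 and never raises, so an `except ValueError` clause only covers the `index` calls. The helper names are hypothetical.

```python
def closing_brace_with_find(s, start=0):
    """Return the offset of the next "}", or None, using str.find."""
    pos = s.find("}", start)
    return None if pos == -1 else pos


def closing_brace_with_index(s, start=0):
    """Return the offset of the next "}", or None, using str.index."""
    try:
        return s.index("}", start)
    except ValueError:
        return None


text = "{name arg}"
print(closing_brace_with_find(text), closing_brace_with_index(text))
print(closing_brace_with_find("no braces"), closing_brace_with_index("no braces"))
```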