M.-A. Lemburg wrote:
> Antoine Pitrou wrote:
>> Le samedi 23 janvier 2010 à 20:43 +0100, M.-A. Lemburg a écrit :
>>> Now, we cannot easily remove this guessing since we're in stable
>>> mode again with 3.1. Perhaps we should add a way to at least be
>>
Antoine Pitrou wrote:
> Le samedi 23 janvier 2010 à 20:43 +0100, M.-A. Lemburg a écrit :
>>
>> Now, we cannot easily remove this guessing since we're in stable
>> mode again with 3.1. Perhaps we should add a way to at least be
>> able to switch off this guessing, so that applications can be
>> test
> Using any guessing based on the locale (which describes the codec used
> by the user's console, but is completely uncorrelated to any particular
> file on the user's filesystem)
No, it's not just the encoding of the console. It is also the encoding
that text editors will use, in absence of a mo
"Martin v. Löwis" writes:
> My bet is that the majority of Python applications written today do
> "web" stuff. In the web, input encoding and output encoding are
> fairly decorrelated - in particular for databases and files read
> from disk.
Sure. Which means that programmers have to do a lo
Antoine Pitrou writes:
> Stephen J. Turnbull writes:
> >
> > But it *does* determine the charset of ErrorDocuments displayed by
> > Apache. Users are likely to get somewhat confused if the
> > ErrorDocuments are in a different charset from your dynamic HTML.
>
> Why would the
Stephen J. Turnbull wrote:
> You just can't get away from the need for explicit management of
> codecs if you want a robust internationalized application. I don't
> object to giving users an easy way to get the behavior Michael
> proposes; it just sh
On Sun, Jan 24, 2010 at 1:54 PM, Oleg Broytman wrote:
..
> Depends on the kind of cat and especially on the ways of using it. If
> you ask cat to number lines (see manual for GNU cat) - what do "lines" mean
> for binary IO?
Maybe this is yet another reason why some kinds of cat are a bad idea:
Oleg Broytman writes:
>
> Depends on the kind of cat and especially on the ways of using it. If
> you ask cat to number lines (see manual for GNU cat) - what do "lines" mean
> for binary IO?
b"\n"-separated chunks of data. See the docs:
http://docs.python.org/3.1/library/io.html#io
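
A minimal sketch of the point made in this reply (not code from the thread): a cat-like helper that opens nothing in text mode and still numbers lines, treating a line as a b"\n"-terminated chunk of bytes. The name number_lines and the output layout are assumptions made for illustration.

```python
import io


def number_lines(src, dst):
    r"""Copy the binary stream src to dst, prefixing each line with its number.

    A "line" is a chunk of bytes ending in b"\n" (the last chunk may lack
    the terminator). Nothing is decoded, so no encoding has to be guessed.
    """
    for lineno, chunk in enumerate(src, start=1):
        prefix = ("%6d\t" % lineno).encode("ascii")
        dst.write(prefix + chunk)


if __name__ == "__main__":
    # The second line is not valid UTF-8; binary IO passes it through untouched.
    data = io.BytesIO(b"caf\xc3\xa9\n\xff\xfe raw bytes\nlast")
    out = io.BytesIO()
    number_lines(data, out)
    print(out.getvalue())
```

With real files the same function would be called on streams opened with mode "rb" and "wb".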
> I concede that I have no better statistics on the matter than you do,
> but I think that's wishful thinking. It is quite common for "pure
> output" to be mixed with "echoed input", for example. Even if a file
> is converted to another format (eg, restructured text to LaTeX), it's
> very common
On Sun, Jan 24, 2010 at 07:45:20PM +0100, "Martin v. Löwis" wrote:
> This may be a bit out of context - however, a simple cat program should
> open files in binary, and be done.
>
> (not sure whether the average naive programmer is able to grasp the
> notion of binary IO and to oppose to text IO,
On 24/01/2010 18:41, "Martin v. Löwis" wrote:
However it is likely to be often wrong, and where the user's locale
specifies an encoding like CP1252 then it will result in silent
corruption rather than an immediate exception.
Why do you say that? Why do you think it will likely be often wro
> So what is your naive programmer supposed to expect
> when writing a cat program?
This may be a bit out of context - however, a simple cat program should
open files in binary, and be done.
(not sure whether the average naive programmer is able to grasp the
notion of binary IO and to oppose to t
> However it is likely to be often wrong, and where the user's locale
> specifies an encoding like CP1252 then it will result in silent
> corruption rather than an immediate exception.
Why do you say that? Why do you think it will likely be often wrong?
Most likely, encoding text files with cp1252
Stephen J. Turnbull writes:
>
> But it *does* determine the charset of ErrorDocuments displayed by
> Apache. Users are likely to get somewhat confused if the
> ErrorDocuments are in a different charset from your dynamic HTML.
Why would they? The browser picks the encoding from eithe
Antoine Pitrou writes:
> Perhaps you are speaking with your emacs hat, where the purpose is
> to output to the same file that serves as input.
No, I'm not wearing my Emacs hat. If I was, there would be no
problem. You just use binary for most such purposes. Historically
that was how even Ema
Stephen J. Turnbull writes:
>
> That's throwing the baby out with the bathwater. Very few practical
> applications that care about the input encoding are going to be
> willing to accept an output encoding that doesn't correspond to the
> input encoding in an appropriate way.
Perhaps
Michael Foord writes:
> When reading text files the presence of the UTF-8 signature *almost
> invariably* means a UTF-8 encoding. Honouring this will almost always be
> better than using the wrong encoding. Of course there are caveats, but
> it will be a substantial improvement.
Sure, that
On 24/01/2010 14:23, Stephen J. Turnbull wrote:
Michael Foord writes:
> This is why I'm keen that by *default* Python should honour the UTF8
> signature when reading files;
Unfortunately, your caveat about "a lot of the time it will *seem* to
work" applies to this as well. The only way t
Michael Foord writes:
> This is why I'm keen that by *default* Python should honour the UTF8
> signature when reading files;
Unfortunately, your caveat about "a lot of the time it will *seem* to
work" applies to this as well. The only way that "honoring
signatures" really works is if Python
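
One way to picture "honouring the UTF-8 signature" is the sketch below. It is an illustration under stated assumptions, not the behaviour proposed for open() in the thread: the function name decode_honouring_signature and the fallback parameter (with its "latin-1" default) are hypothetical. It relies on codecs.BOM_UTF8 and the standard "utf-8-sig" codec.

```python
import codecs


def decode_honouring_signature(raw, fallback="latin-1"):
    """Decode bytes as UTF-8 if they start with the UTF-8 signature (BOM),
    otherwise decode them with the caller-supplied fallback encoding."""
    if raw.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the leading signature while decoding.
        return raw.decode("utf-8-sig")
    # No signature: this is still a guess, just an explicit one.
    return raw.decode(fallback)


if __name__ == "__main__":
    signed = codecs.BOM_UTF8 + "é".encode("utf-8")
    print(ascii(decode_honouring_signature(signed)))
    print(ascii(decode_honouring_signature(b"\xe9")))
```

As the reply above notes, this only helps for files that actually carry the signature; unsigned input still needs an encoding chosen some other way.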
On 23 Jan 2010, at 07:53, "Martin v. Löwis" wrote:
[snip...]
Yes, definitely. It is this very reasoning that caused Python 2.x to
use ASCII as the default encoding (when mixing strings and unicode),
and, for the entire lifetime of 2.x, has caused endless pain for
developers, which simply fai
Le samedi 23 janvier 2010 à 20:43 +0100, M.-A. Lemburg a écrit :
>
> Now, we cannot easily remove this guessing since we're in stable
> mode again with 3.1. Perhaps we should add a way to at least be
> able to switch off this guessing, so that applications can be
> tested in a predictable way, rat
Terry Reedy writes:
>
> Dreadfully wrong. In general, the default encoding on my Windows *does
> not work* on Python3 strings but causes
> UnicodeEncodeError:
> If the text is not written to the file, it is completely non-portable.
You must mistake "portable" for something else. "P
On 1/23/2010 7:53 AM, Antoine Pitrou wrote:
Terry Reedy writes:
If the current guess is based on a mistaken assumption -- that it is
giving the user what the user asked for -- it might be reconsidered. I
personally would prefer that the default file encoding for Python 3 be
utf-8 on
Nick Coghlan wrote:
> M.-A. Lemburg wrote:
>> "Martin v. Löwis" wrote:
>>> Hmm - what do you mean by "normally"? Normally, text files are meant
>>> for human readers, not for exchange between programs.
>>
>> It's rather common to exchange text files between users... and
>> in form of XML and CSV fi
Antoine Pitrou wrote:
> M.-A. Lemburg writes:
>>
>> It's rather common to exchange text files between users... and
>> in form of XML and CSV files, these also tend to get used as
>> input for programs every now and then
>
> For XML files, encoding should be taken care of by the XML l
M.-A. Lemburg writes:
>
> It's rather common to exchange text files between users... and
> in form of XML and CSV files, these also tend to get used as
> input for programs every now and then
For XML files, encoding should be taken care of by the XML layer, not the IO
layer. That is
M.-A. Lemburg wrote:
> "Martin v. Löwis" wrote:
>> Hmm - what do you mean by "normally"? Normally, text files are meant
>> for human readers, not for exchange between programs.
>
> It's rather common to exchange text files between users... and
> in form of XML and CSV files, these also tend to get
"Martin v. Löwis" wrote:
>> No, but it's most likely a wrong guess, since text files don't
>> really have anything to do with what the user wants to see in
>> a user interface.
>
> That also depends on the operating system. On Windows, there is
> a long tradition of encoding all text in the system
> No, but it's most likely a wrong guess, since text files don't
> really have anything to do with what the user wants to see in
> a user interface.
That also depends on the operating system. On Windows, there is
a long tradition of encoding all text in the system code page.
All text editors on Wi
"Martin v. Löwis" wrote:
>> This all begs the question: why is there a default? and why is the
>> default a guess?
>>
>> I have to admit that I was completely oblivious to this potential
>> pitfall, and mostly that's because in the most common case, I am working
>> with ASCII files.
>
> You answer
Terry Reedy writes:
>
> If the current guess is based on a mistaken assumption -- that it is
> giving the user what the user asked for -- it might be reconsidered. I
> personally would prefer that the default file encoding for Python 3 be
> utf-8 on any machine my code runs on unless
>> So for the limited case of text IO, Python 3.x now makes a guess.
>> However, this guess is not in the face of ambiguity: it is the
>> locale that the user (or his administrator) has selected,
>
> That is a mistaken assumption for many. I have never, that I know of,
> selected a locale on any o
On 1/23/2010 2:53 AM, "Martin v. Löwis" wrote:
So for the limited case of text IO, Python 3.x now makes a guess.
However, this guess is not in the face of ambiguity: it is the
locale that the user (or his administrator) has selected,
That is a mistaken assumption for many. I have never, that I
> This all begs the question: why is there a default? and why is the
> default a guess?
>
> I have to admit that I was completely oblivious to this potential
> pitfall, and mostly that's because in the most common case, I am working
> with ASCII files.
You answered your own question: it is this r
On 1/22/2010 2:04 PM, M.-A. Lemburg wrote:
> Karen Tracey wrote:
>> On Fri, Jan 22, 2010 at 7:38 AM, Michael Foord
>> wrote:
>>> The encoding I'm talking about is the
>>> encoding that Python uses to decode a file (or encode a string) when you do
>>> the following in Python 3:
>>>
>>>text = op
> Ok, I'm just using the wrong terminology. I'm aware that mbcs is used
> for filename encoding on Windows (right?).
Not anymore, no.
> The encoding I'm talking
> about is the encoding that Python uses to decode a file (or encode a
> string) when you do the following in Python 3:
>
> text =
Karen Tracey wrote:
> On Fri, Jan 22, 2010 at 7:38 AM, Michael Foord
> wrote:
>
>> On 21/01/2010 21:21, "Martin v. Löwis" wrote:
>>
>>> Where the default *file system encoding* is used (i.e. text files are
>>> written or read without specifying an encoding)
>>> I think you misunderstan
On 22/01/2010 14:33, Antoine Pitrou wrote:
Michael Foord writes:
Heh, so we have two different encoding mechanisms both called "default
encoding". One is always utf-8 in Python 3 and one is platform
dependent... Great.
The former is merely internal though. Also, if
Michael Foord writes:
>
> Heh, so we have two different encoding mechanisms both called "default
> encoding". One is always utf-8 in Python 3 and one is platform
> dependent... Great.
The former is merely internal though. Also, if you grep for the "s#" and "s*"
argument type c
On Fri, Jan 22, 2010 at 9:22 AM, Michael Foord wrote:
> On 22/01/2010 14:18, Karen Tracey wrote:
>
>
> The doc here:
> http://docs.python.org/3.1/library/functions.html?highlight=open#open just
> calls it default encoding and clarifies that is "whatever
> locale.getpreferredencoding() returns".
>
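
To make the quoted documentation concrete, here is a small sketch (not from the thread). It prints what locale.getpreferredencoding() returns on the current machine, which is what the 3.1 docs cited above describe as the default for open(), and then does a round trip that names the encoding explicitly so the result does not depend on that default. The helper name roundtrip is hypothetical.

```python
import locale
import os
import tempfile

# Platform- and locale-dependent value described by the 3.1 docs as the
# default encoding for text-mode open().
default_encoding = locale.getpreferredencoding()
print("locale default:", default_encoding)


def roundtrip(text, encoding="utf-8"):
    """Write text to a temporary file and read it back, naming the
    encoding explicitly on both sides."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# ascii() keeps the printout safe on consoles that cannot show these characters.
print(ascii(roundtrip("Löwis, à, €")))
```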
On 22/01/2010 14:18, Karen Tracey wrote:
On Fri, Jan 22, 2010 at 7:38 AM, Michael Foord wrote:
On 21/01/2010 21:21, "Martin v. Löwis" wrote:
Where the default *file system encoding* is used (i.e.
text files are
written
On Fri, Jan 22, 2010 at 7:38 AM, Michael Foord wrote:
> On 21/01/2010 21:21, "Martin v. Löwis" wrote:
>
>> Where the default *file system encoding* is used (i.e. text files are
>>> written or read without specifying an encoding)
>>>
>>>
>> I think you misunderstand the notion of the *file system e
On 21/01/2010 21:21, "Martin v. Löwis" wrote:
Where the default *file system encoding* is used (i.e. text files are
written or read without specifying an encoding)
I think you misunderstand the notion of the *file system encoding*.
It is *not* a "file encoding", but the file *system* encod
On Thu, 2010-01-21 at 22:21 +0100, "Martin v. Löwis" wrote:
> > Where the default *file system encoding* is used (i.e. text files are
> > written or read without specifying an encoding)
>
> I think you misunderstand the notion of the *file system encoding*.
> It is *not* a "file encoding", but the
> Where the default *file system encoding* is used (i.e. text files are
> written or read without specifying an encoding)
I think you misunderstand the notion of the *file system encoding*.
It is *not* a "file encoding", but the file *system* encoding, i.e.
the encoding for file *names*, not for f
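
The distinction drawn here can be seen from the interpreter. The sketch below is an illustration, not code from the thread; it assumes Python 3.2 or later for os.fsencode and os.fsdecode, and the file name "data.txt" is a made-up example.

```python
import locale
import os
import sys

# Encoding used to convert file *names* between str and bytes.
fs_encoding = sys.getfilesystemencoding()
print("file system encoding:", fs_encoding)

# Locale-derived encoding, the one the thread discusses as the default
# for file *contents* when a text file is opened without an encoding.
print("preferred encoding:  ", locale.getpreferredencoding())

# os.fsencode/os.fsdecode apply the file system encoding to a name;
# they never touch what is stored inside the file.
name_bytes = os.fsencode("data.txt")
print(name_bytes, os.fsdecode(name_bytes))
```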
On Thu, 2010-01-21 at 00:06 +0100, "Martin v. Löwis" wrote:
> > Why only set an encoding on these streams when they're directly
> > connected to a tty?
>
> If you are sending data to the terminal, you can be fairly certain
> that the locale's encoding should be used. It's a convenience feature
> f
Michael Foord wrote:
As always: It's better not to rely on such defaults and explicitly
provide the encoding as parameter where possible.
>>> Sure. I do worry that developers will still rely on the default behavior
>>> assuming that Python 3 "fixes their encoding pr
On 21/01/2010 12:00, M.-A. Lemburg wrote:
Michael Foord wrote:
On 21/01/2010 11:15, M.-A. Lemburg wrote:
Michael Foord wrote:
On 20/01/2010 21:37, M.-A. Lemburg wrote:
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-
Michael Foord wrote:
> On 21/01/2010 11:15, M.-A. Lemburg wrote:
>> Michael Foord wrote:
>>
>>> On 20/01/2010 21:37, M.-A. Lemburg wrote:
>>>
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-8
>>> Is this true?
On 21/01/2010 11:15, M.-A. Lemburg wrote:
Michael Foord wrote:
On 20/01/2010 21:37, M.-A. Lemburg wrote:
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-8
Is this true? I thought the default encoding in Python 3 was platform
speci
On 20/01/2010 23:46, MRAB wrote:
Martin v. Löwis wrote:
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-8
Is this true?
For 3.x: yes. However, the default encoding is much less relevant in
3.x, since Python will never implicitly use the default encoding,
Michael Foord wrote:
> On 20/01/2010 21:37, M.-A. Lemburg wrote:
>> The only supported default encodings in Python are:
>>
>> Python 2.x: ASCII
>> Python 3.x: UTF-8
>>
>
> Is this true? I thought the default encoding in Python 3 was platform
> specific (i.e. cp1252 on Windows). That means
Martin v. Löwis wrote:
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-8
Is this true?
For 3.x: yes. However, the default encoding is much less relevant in
3.x, since Python will never implicitly use the default encoding, except
when some C module
>> The only supported default encodings in Python are:
>>
>> Python 2.x: ASCII
>> Python 3.x: UTF-8
>>
>
> Is this true?
For 3.x: yes. However, the default encoding is much less relevant in
3.x, since Python will never implicitly use the default encoding, except
when some C module asks fo
> Why only set an encoding on these streams when they're directly
> connected to a tty?
If you are sending data to the terminal, you can be fairly certain
that the locale's encoding should be used. It's a convenience feature
for the interactive mode, so that Unicode strings print correctly.
When
On 20/01/2010 21:37, M.-A. Lemburg wrote:
David Malcolm wrote:
I'm thinking of making this downstream change to Fedora's site.py (and
possibly in future RHEL releases) so that the default encoding
automatically picks up the encoding from the locale:
def setencoding():
"""Set the str
David Malcolm wrote:
> On Wed, 2010-01-20 at 22:37 +0100, M.-A. Lemburg wrote:
> Note that pango isn't even doing the module reload hack; it's written in
> C, and going in directly through the C API:
>PyUnicode_SetDefaultEncoding("utf-8");
>
> I should mention that I've seen at least one C mod
On Wed, 2010-01-20 at 22:37 +0100, M.-A. Lemburg wrote:
> David Malcolm wrote:
> > I'm thinking of making this downstream change to Fedora's site.py (and
> > possibly in future RHEL releases) so that the default encoding
> > automatically picks up the encoding from the locale:
> >
> > def setenco
> Hope this is helpful; can anyone see any potential problems with this
> change?
As Marc-Andre says: such a change is unsupported, and *will* break Python.
It's not true that the only supported encoding in 2.x is 'ascii',
'iso-8859-1' is also supported. 'utf-8' is not, neither is anything
else.
David Malcolm wrote:
> I'm thinking of making this downstream change to Fedora's site.py (and
> possibly in future RHEL releases) so that the default encoding
> automatically picks up the encoding from the locale:
>
> def setencoding():
> """Set the string encoding used by the Unicode implem
David Malcolm wrote:
> I've written up extensive notes on the change and the history of the
> issue here:
> https://fedoraproject.org/wiki/Features/PythonEncodingUsesSystemLocale
>
> Please let me know if there are any errors on that page!
That discussion appears incomplete without any mention of
I'm thinking of making this downstream change to Fedora's site.py (and
possibly in future RHEL releases) so that the default encoding
automatically picks up the encoding from the locale:
def setencoding():
"""Set the string encoding used by the Unicode implementation. The
default is 'a