Re: python 2.7 and unicode (one more time)

2014-12-02 Thread Simon Evans
Hi Peter Otten, re: There is no assignment soup_atag = whatever but there is one to atag. The whole session should work when you omit the offending line atag = soup_atag.a or insert soup_atag = soup before it.

Re: python 2.7 and unicode (one more time)

2014-11-25 Thread Steven D'Aprano
Marko Rauhamaa wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: Marko Rauhamaa wrote: Py3's byte strings are still strings, though. Hm. I don't think so. In a plain English sense, maybe, but that kind of usage can lead to confusion. Only if you are determined to confuse

Re: python 2.7 and unicode (one more time)

2014-11-25 Thread Chris Angelico
On Tue, Nov 25, 2014 at 10:56 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: I think this conversation is going nowhere, so it's probably best to end it. \0 ChrisA -- https://mail.python.org/mailman/listinfo/python-list

Re: python 2.7 and unicode (one more time)

2014-11-24 Thread Steven D'Aprano
Marko Rauhamaa wrote: Py3's byte strings are still strings, though. Hm. I don't think so. In a plain English sense, maybe, but that kind of usage can lead to confusion. Only if you are determined to confuse yourself. People are quite capable of interpreting correctly sentences like: My

Re: python 2.7 and unicode (one more time)

2014-11-24 Thread Chris Angelico
On Tue, Nov 25, 2014 at 9:56 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: In all cases apart from an explicit byte string, the word string is always used for the native array-of-characters type delimited by plain quotation marks, as used for error messages, user prompts,

Re: python 2.7 and unicode (one more time)

2014-11-24 Thread Marko Rauhamaa
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: Marko Rauhamaa wrote: Py3's byte strings are still strings, though. Hm. I don't think so. In a plain English sense, maybe, but that kind of usage can lead to confusion. Only if you are determined to confuse yourself. [...] In

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Chris Angelico
On Mon, Nov 24, 2014 at 3:33 AM, Dennis Lee Bieber wlfr...@ix.netcom.com wrote: On Sat, 22 Nov 2014 20:52:37 -0500, random...@fastmail.us declaimed the following: On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote: ... That is a standard Windows build. He is again conflating problems with

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread random832
On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote: Why would that be possible? Many truetype fonts only supply glyphs for single-byte encodings (ISO-Latin-1, for example -- pop up the Windows character map utility and see what some of the font files contain. With a bitmap font

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Dave Angel
On 11/23/2014 01:13 PM, random...@fastmail.us wrote: On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote: Why would that be possible? Many truetype fonts only supply glyphs for single-byte encodings (ISO-Latin-1, for example -- pop up the Windows character map utility and see what

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Chris Angelico
On Mon, Nov 24, 2014 at 7:31 AM, Dave Angel d...@davea.name wrote: On 11/23/2014 01:13 PM, random...@fastmail.us wrote: On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote: Why would that be possible? Many truetype fonts only supply glyphs for single-byte encodings (ISO-Latin-1,

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Gregory Ewing
Marko Rauhamaa wrote: Unicode strings is not wrong but the technical emphasis on Unicode is as strange as a tire car or rectangular door when car and door are what you usually mean. The reason Unicode gets emphasised so much is that until relatively recently, it *wasn't* what string usually

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Chris Angelico
On Mon, Nov 24, 2014 at 9:51 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: Marko Rauhamaa wrote: Unicode strings is not wrong but the technical emphasis on Unicode is as strange as a tire car or rectangular door when car and door are what you usually mean. The reason Unicode gets

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread random832
On Sun, Nov 23, 2014, at 15:31, Dave Angel wrote: I didn't realize Windows shell (DOS box) had that bug. Course I don't use Windows much the last few years. it's one thing to not display it properly. It's quite another to supply faulty data to the clipboard. Especially since the Windows

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Marko Rauhamaa
Gregory Ewing greg.ew...@canterbury.ac.nz: Marko Rauhamaa wrote: Unicode strings is not wrong but the technical emphasis on Unicode is as strange as a tire car or rectangular door when car and door are what you usually mean. The reason Unicode gets emphasised so much is that until relatively

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Chris Angelico
On Mon, Nov 24, 2014 at 5:57 PM, Marko Rauhamaa ma...@pacujo.net wrote: Yes, people call strings Unicode strings because Python2 *did have* unicode strings separate from regular strings: Python2 Python3 -- string bytes

Re: python 2.7 and unicode (one more time)

2014-11-23 Thread Marko Rauhamaa
Chris Angelico ros...@gmail.com: Py3's byte strings are still strings, though. Hm. I don't think so. In a plain English sense, maybe, but that kind of usage can lead to confusion. For example, A subscription selects an item of a sequence (string, tuple or list) or mapping (dictionary)

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Steven D'Aprano
Marko Rauhamaa wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: In Python, we have Unicode strings and byte strings. No, you don't. You have strings and bytes: Python has strings of Unicode code points, a.k.a. Unicode strings, or text strings, and strings of bytes, a.k.a. byte

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Chris Angelico
On Sun, Nov 23, 2014 at 12:50 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Tire car makes no sense. Rectangular door makes perfect sense, and in a world where there are dozens of legacy non-rectangular doors, it would be very sensible to specify the kind of door. Just as we

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Marko Rauhamaa
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: You haven't given any good reason for objecting to calling Unicode strings by what they are. Maybe you think that it is an implementation detail, and that some version of Python might suddenly and without warning change to only supporting

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Roy Smith
In article 87y4r348uf@elektro.pacujo.net, Marko Rauhamaa ma...@pacujo.net wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: You haven't given any good reason for objecting to calling Unicode strings by what they are. Maybe you think that it is an implementation detail,

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Marko Rauhamaa
Roy Smith r...@panix.com: For that matter, we will eventually get to the point where when people say, just plain text, they will mean Unicode, in the same way that just plain text today really means ASCII (and the text/plain MIME type will become a historical curiosity). MIME has:

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Rustom Mody
On Saturday, November 22, 2014 8:14:15 PM UTC+5:30, Roy Smith wrote: Marko Rauhamaa wrote: Steven D'Aprano: You haven't given any good reason for objecting to calling Unicode strings by what they are. Maybe you think that it is an implementation detail, and that some version of

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Marko Rauhamaa
wxjmfa...@gmail.com: - By chance, I found on the web a German py dev who was commenting and he had not an updated DUDEN (a German dictionary). That... leaves me utterly speechless! Marko

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Mark Lawrence
On 22/11/2014 17:49, Marko Rauhamaa wrote: wxjmfa...@gmail.com: - By chance, I found on the web a German py dev who was commenting and he had not an updated DUDEN (a German dictionary). That... leaves me utterly speechless! Marko Please don't feed him. Your average troll is bad enough

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Chris Angelico
On Sun, Nov 23, 2014 at 5:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: Please don't feed him. Your average troll is bad enough but he really takes the biscuit. ... someone was feeding him biscuits? ChrisA

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Mark Lawrence
On 22/11/2014 20:17, Chris Angelico wrote: On Sun, Nov 23, 2014 at 5:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: Please don't feed him. Your average troll is bad enough but he really takes the biscuit. ... someone was feeding him biscuits? ChrisA Surely it's better than feeding

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Chris Angelico
On Sun, Nov 23, 2014 at 9:04 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: My favourite find thousand and one ways to make Python crashing or failing. but I don't recall a single bug report in the last two years from anybody regarding problems with the FSR, or have I missed something? What

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Mark Lawrence
On 22/11/2014 22:31, Chris Angelico wrote: On Sun, Nov 23, 2014 at 9:04 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: My favourite find thousand and one ways to make Python crashing or failing. but I don't recall a single bug report in the last two years from anybody regarding problems with

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread random832
On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote: I really don't understand what bothers you about this. In Python, we have Unicode strings and byte strings. In computing in general, strings can consist of Unicode characters, ASCII characters, Tron characters, EBCDIC characters,

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread random832
On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote: ... That is a standard Windows build. He is again conflating problems with using the Windows command line for a given code page with the FSR. The thing is, with a truetype font selected, a correctly written win32 console program should be

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Chris Angelico
On Sun, Nov 23, 2014 at 12:52 PM, random...@fastmail.us wrote: On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote: ... That is a standard Windows build. He is again conflating problems with using the Windows command line for a given code page with the FSR. The thing is, with a truetype

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread random832
On Sat, Nov 22, 2014, at 21:11, Chris Angelico wrote: Is that true? Does WriteConsoleW support every Unicode character? It's not obvious from the docs whether it uses UCS-2 or UTF-16 (or maybe something else). I was defining every unicode character loosely. There are certainly display problems
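
The UCS-2 versus UTF-16 question raised here only matters for characters outside the Basic Multilingual Plane. The following is a small Python 3 sketch, not taken from the thread, and the sample character is an arbitrary choice. It shows that such a character needs two 16-bit code units in UTF-16, which a strictly UCS-2 interface could not treat as one character:

```python
import struct

ch = "\U0001F600"  # arbitrary example of a code point above U+FFFF
utf16 = ch.encode("utf-16-le")  # little-endian, no byte order mark

# Split the encoded bytes into 16-bit code units.
units = struct.unpack("<%dH" % (len(utf16) // 2), utf16)
print([hex(u) for u in units])  # a high surrogate followed by a low surrogate

# A BMP character such as "ö" fits in a single 16-bit code unit.
bmp_units = len("ö".encode("utf-16-le")) // 2
print(bmp_units)
```

Whether WriteConsoleW handles surrogate pairs correctly is exactly what the posters leave open; the sketch only illustrates why the distinction exists.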

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Steven D'Aprano
random...@fastmail.us wrote: On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote: I really don't understand what bothers you about this. In Python, we have Unicode strings and byte strings. In computing in general, strings can consist of Unicode characters, ASCII characters, Tron

Re: python 2.7 and unicode (one more time)

2014-11-22 Thread Chris Angelico
On Sun, Nov 23, 2014 at 5:17 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: If Python treated the character set as an implementation detail, the programmer would have no way of knowing whether s = u"ö" is legal or not, since you cannot know whether or not ö is a supported

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Marko Rauhamaa
Chris Angelico ros...@gmail.com: Then you need to read more about Unicode. The *codepoint* for the letter 'A' is 65. That is not Unicode, that is one part of the Unicode spec. I don't think Python users need to know anything more about Unicode than they need to know about IEEE-754. How many
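
For readers who want to check the quoted claim, here is a minimal sketch in Python 3 syntax (not part of the thread). It shows that the code point of 'A' is a plain integer, while the bytes used to store it depend on the chosen encoding:

```python
code_point = ord("A")          # the Unicode code point, an integer
print(code_point)              # 65
print(chr(code_point))         # back to the one-character string

# The same code point has different byte representations per encoding.
utf8_bytes = "A".encode("utf-8")
utf16_bytes = "A".encode("utf-16-le")
print(utf8_bytes, utf16_bytes)
```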

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Chris Angelico
On Fri, Nov 21, 2014 at 7:16 PM, Marko Rauhamaa ma...@pacujo.net wrote: Chris Angelico ros...@gmail.com: Then you need to read more about Unicode. The *codepoint* for the letter 'A' is 65. That is not Unicode, that is one part of the Unicode spec. I don't think Python users need to know

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Steven D'Aprano
Chris Angelico wrote: On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: (E.g. there are millions of existing files across the world containing text which use legacy encodings that are not compatible with Unicode.) Not compatible with Unicode?

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Chris Angelico
On Sat, Nov 22, 2014 at 2:23 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Chris Angelico wrote: On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: (E.g. there are millions of existing files across the world containing text which

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Tim Chase
On 2014-11-22 02:23, Steven D'Aprano wrote: LATIN SMALL LETTER E COMBINING CIRCUMFLEX ACCENT then my application should treat that as a single character and display it as: LATIN SMALL LETTER E WITH CIRCUMFLEX which looks like this: ê rather than two distinct characters eˆ Now,
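
The combining-character case quoted above can be reproduced with the standard `unicodedata` module. This is a sketch in Python 3 syntax and an illustration only; the thread does not say which normalization form, if any, the posters had in mind, so NFC is an assumption here:

```python
import unicodedata

# LATIN SMALL LETTER E followed by COMBINING CIRCUMFLEX ACCENT
decomposed = "e\u0302"

# NFC composes the pair into a single precomposed code point where one exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # two code points versus one
print(unicodedata.name(composed))
print(decomposed == composed)           # plain comparison sees different strings
```

The two strings render identically but compare unequal until both are normalized to the same form.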

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Rustom Mody
On Friday, November 21, 2014 12:06:54 PM UTC+5:30, Marko Rauhamaa wrote: Chris Angelico : On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa wrote: I don't really like it how Unicode is equated with text, or even character strings. [...] Do you have actual text that you're unable to

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Francis Moreau
On 11/20/2014 04:15 PM, Chris Angelico wrote: On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau francis.m...@gmail.com wrote: Hi, Thanks for the from __future__ import unicode_literals trick, it makes that switch much less intrusive. However it seems that I will suddenly be trapped by all

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Chris Angelico
On Sat, Nov 22, 2014 at 3:11 AM, Francis Moreau francis.m...@gmail.com wrote: Yes I finally used str() since only setlocale() reported to have some issues with unicode_literals active in my application. Thanks Chris for your useful insight. My pleasure. Unicode is a bit of a hobby-horse of

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Marko Rauhamaa
Rustom Mody rustompm...@gmail.com: Likewise in 2014, and given the arguments, inconsistencies, etc remembering the nuts-n-bolts below the strings-represented-as-unicode abstraction may be in order. No need to hide Unicode, but talking about a Unicode string is like talking about an

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Chris Angelico
On Sat, Nov 22, 2014 at 3:36 AM, Marko Rauhamaa ma...@pacujo.net wrote: No need to hide Unicode, but talking about a Unicode string is like talking about an electronic computer visible spectrum display mouse user interface ethernet socket magnetic file

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Steven D'Aprano
Marko Rauhamaa wrote: Rustom Mody rustompm...@gmail.com: Likewise in 2014, and given the arguments, inconsistencies, etc remembering the nuts-n-bolts below the strings-represented-as-unicode abstraction may be in order. No need to hide Unicode, but talking about a Unicode string

Re: python 2.7 and unicode (one more time)

2014-11-21 Thread Marko Rauhamaa
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: In Python, we have Unicode strings and byte strings. No, you don't. You have strings and bytes: Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points. String literals

python 2.7 and unicode (one more time)

2014-11-20 Thread Francis Moreau
Hello, My application is using the gettext module to do the translation stuff. Translated messages are unicode on both python 2 and 3 (with python2.7 I had to explicitly ask for unicode). A problem arises when formatting those messages before logging them. For example: log.debug("%s: %s" %

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Thu, Nov 20, 2014 at 8:40 PM, Francis Moreau francis.m...@gmail.com wrote: My question is: how should this be fixed properly? A simple solution would be to force all strings passed to the logger to be unicode: log.debug(u"%s: %s" % ...) and more generally force all strings in my code to

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Peter Otten
Francis Moreau wrote: Hello, My application is using the gettext module to do the translation stuff. Translated messages are unicode on both python 2 and 3 (with python2.7 I had to explicitly ask for unicode). A problem arises when formatting those messages before logging them. For

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Thu, Nov 20, 2014 at 11:35 PM, Peter Otten __pete...@web.de wrote: You don't need to change an all-ascii bytestring to unicode. Lo and behold: >>> "%s %s" % (u"üblich", u"ähnlich") u'\xfcblich \xe4hnlich' >>> u"%s %s" % (u"üblich", u"ähnlich") u'\xfcblich \xe4hnlich' Only non-ascii bytestrings mean

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread random832
On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote: >>> "%s nötig %s" % (u"üblich", u"ähnlich") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128) This is surprising to me - why is it
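
The error quoted above comes from Python 2.7 implicitly decoding the byte-string format with the ASCII codec once a unicode argument is involved. Python 3 no longer does this implicitly, so the sketch below performs the same decode step by hand. It assumes the original literal was typed in a UTF-8 terminal, which is consistent with the reported byte 0xc3 but is not stated in the thread:

```python
# The format string as Python 2 would have held it: UTF-8 encoded bytes (assumed).
fmt_bytes = "%s nötig %s".encode("utf-8")

# Python 2 effectively did this step behind the scenes before formatting.
try:
    fmt_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; "exc" is cleared when the block ends

print(error.encoding, hex(error.object[error.start]), error.start)

# Decoding explicitly with the right codec avoids the problem.
result = fmt_bytes.decode("utf-8") % ("üblich", "ähnlich")
print(result)
```

The failing position is 4 because the four ASCII characters `%s n` precede the first byte of the UTF-8 encoded ö.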

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Francis Moreau
Hi, On 11/20/2014 11:47 AM, Chris Angelico wrote: On Thu, Nov 20, 2014 at 8:40 PM, Francis Moreau francis.m...@gmail.com wrote: My question is: how should this be fixed properly? A simple solution would be to force all strings passed to the logger to be unicode: log.debug(u"%s: %s" %

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 12:59 AM, random...@fastmail.us wrote: On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote: >>> "%s nötig %s" % (u"üblich", u"ähnlich") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau francis.m...@gmail.com wrote: Hi, Thanks for the from __future__ import unicode_literals trick, it makes that switch much less intrusive. However it seems that I will suddenly be trapped by all modules which are not prepared to handle unicode.

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Peter Otten
random...@fastmail.us wrote: On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote: >>> "%s nötig %s" % (u"üblich", u"ähnlich") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128) This is

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote: I think that you may get a Unicode/Encode/Error when you try to /decode/ a unicode string is more confusing... Hang on a minute, what does it even mean to decode a Unicode string? That's where the problem is. Fortunately

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Peter Otten
Chris Angelico wrote: On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote: I think that you may get a Unicode/Encode/Error when you try to /decode/ a unicode string is more confusing... Hang on a minute, what does it even mean to decode a Unicode string? Let's not get

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 3:32 AM, Peter Otten __pete...@web.de wrote: Chris Angelico wrote: On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote: I think that you may get a Unicode/Encode/Error when you try to /decode/ a unicode string is more confusing... Hang on a minute,

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Michael Torrie
On 11/20/2014 09:32 AM, Peter Otten wrote: Chris Angelico wrote: On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote: I think that you may get a Unicode/Encode/Error when you try to /decode/ a unicode string is more confusing... Hang on a minute, what does it even mean to

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Peter Otten
Chris Angelico wrote: On Fri, Nov 21, 2014 at 3:32 AM, Peter Otten __pete...@web.de wrote: Chris Angelico wrote: On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote: I think that you may get a Unicode/Encode/Error when you try to /decode/ a unicode string is more

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread random832
On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote: On Fri, Nov 21, 2014 at 12:59 AM, random...@fastmail.us wrote: On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote: >>> "%s nötig %s" % (u"üblich", u"ähnlich") Traceback (most recent call last): File "<stdin>", line 1, in <module>

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Ian Kelly
On Thu, Nov 20, 2014 at 10:42 AM, random...@fastmail.us wrote: and it means you can't safely blindly use %s with an unknown object. You can't safely do this anyway. Whether it's %s with a str and a unicode, or %s with a unicode and a str, *something* is going to have to be implicitly encoded
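
Python 3, as opposed to the Python 2.7 under discussion, answers the "which side gets converted" question by refusing to convert either side implicitly. A minimal sketch of that behaviour, with illustrative sample values that are not from the thread:

```python
text = "üblich"                      # text string (code points)
data = "nötig".encode("utf-8")       # byte string

# Mixing the two raises TypeError instead of guessing an encoding.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)

# The caller has to state the encoding explicitly.
combined = text + " " + data.decode("utf-8")
print(combined)
```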

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Ian Kelly
On Thu, Nov 20, 2014 at 11:06 AM, Ian Kelly ian.g.ke...@gmail.com wrote: On Thu, Nov 20, 2014 at 10:42 AM, random...@fastmail.us wrote: and it means you can't safely blindly use %s with an unknown object. You can't safely do this anyway. Whether it's %s with a str and a unicode, or %s with

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Peter Otten
random...@fastmail.us wrote: On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote: On Fri, Nov 21, 2014 at 12:59 AM, random...@fastmail.us wrote: On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote: >>> "%s nötig %s" % (u"üblich", u"ähnlich") Traceback (most recent call last): File "<stdin>", line

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Marko Rauhamaa
Michael Torrie torr...@gmail.com: Unicode can only be encoded to bytes. Bytes can only be decoded to unicode. I don't really like it how Unicode is equated with text, or even character strings. There's barely any difference between the truth value of these statements: Python strings are
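
The encode/decode rule quoted from Michael Torrie can be seen directly in Python 3, where each type offers only one of the two methods. This is a sketch for illustration; the sample word and the UTF-8 codec are arbitrary choices:

```python
text = "ähnlich"

# Text is encoded to bytes ...
data = text.encode("utf-8")
print(data)

# ... and bytes are decoded back to text.
roundtrip = data.decode("utf-8")
print(roundtrip == text)

# Python 3 enforces the direction: str has no decode(), bytes has no encode().
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(data, "encode")
print(str_has_decode, bytes_has_encode)
```

In Python 2.7 both methods exist on both types, which is how "decoding a unicode string" can trigger an implicit encode first and produce the confusing UnicodeEncodeError mentioned elsewhere in this thread.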

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Mark Lawrence
On 20/11/2014 18:06, Ian Kelly wrote: On Thu, Nov 20, 2014 at 10:42 AM, random...@fastmail.us wrote: and it means you can't safely blindly use %s with an unknown object. You can't safely do this anyway. Whether it's %s with a str and a unicode, or %s with a unicode and a str, *something* is

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Ethan Furman
On 11/20/2014 07:53 AM, Chris Angelico wrote: On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote: I think that you may get a Unicode/Encode/Error when you try to /decode/ a unicode string is more confusing... Hang on a minute, what does it even mean to decode a Unicode

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Marko Rauhamaa
Ethan Furman et...@stoneleaf.us: If your unicode string happens to contain a base64 encoded .png, then you could decode that into bytes. ;) You could embed your PNG file in XML in binary form as CDATA. Then, your characters would represent 8- or 16-bit integers. You just need to replace all

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread random832
On Thu, Nov 20, 2014, at 16:29, Ethan Furman wrote: If your unicode string happens to contain a base64 encoded .png, then you could decode that into bytes. ;) Bytes of the PNG, or of the raw pixels?
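
A sketch of the base64 case, in Python 3 syntax and using only the eight-byte PNG file signature rather than a real image, so everything beyond the signature is left out. Base64 decoding gives back the bytes of the encoded file; getting at pixels would need a separate PNG decoder, which is not shown:

```python
import base64

# The fixed eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Base64 text is ordinary ASCII-only text, so it can live in a text string.
encoded_text = base64.b64encode(PNG_SIGNATURE).decode("ascii")
print(encoded_text)

# "Decoding" that text (in the base64 sense) yields bytes again.
decoded = base64.b64decode(encoded_text)
print(decoded == PNG_SIGNATURE)
```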

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 4:42 AM, random...@fastmail.us wrote: On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote: Why should it encode to bytes? Because a bytes format string suggests a bytes result. Why does unicode always win, rather than the type of the format string always winning?

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa ma...@pacujo.net wrote: Michael Torrie torr...@gmail.com: Unicode can only be encoded to bytes. Bytes can only be decoded to unicode. I don't really like it how Unicode is equated with text, or even character strings. There's barely any

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Steven D'Aprano
Marko Rauhamaa wrote: Michael Torrie torr...@gmail.com: Unicode can only be encoded to bytes. Bytes can only be decoded to unicode. I don't really like it how Unicode is equated with text, or even character strings. That surely depends on the context. To be technically correct, Unicode

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: (E.g. there are millions of existing files across the world containing text which use legacy encodings that are not compatible with Unicode.) Not compatible with Unicode? There aren't many character

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread random832
On Thu, Nov 20, 2014, at 20:10, Chris Angelico wrote: 2) Languages which use a different alphabet (eg Cyrillic - Russian, Bulgarian). You could possibly cram them into an eight-bit encoding without tipping ASCII out, but I'm not sure. In Unicode, these languages are all easily supported by the
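
Eight-bit Cyrillic encodings that keep ASCII intact do exist; KOI8-R and Windows-1251 are two that ship with Python. The sketch below, in Python 3 syntax with an arbitrary sample word, shows one byte per letter and also what goes wrong when bytes written in one legacy encoding are read with another:

```python
word = "Привет"

# KOI8-R stores each Cyrillic letter in a single byte.
koi8 = word.encode("koi8-r")
print(len(word), len(koi8))

# The ASCII range is left untouched.
ascii_kept = "abc".encode("koi8-r") == b"abc"
print(ascii_kept)

# Reading the same bytes with a different 8-bit codec raises no error;
# it silently produces different Cyrillic letters.
misread = koi8.decode("cp1251")
print(misread == word)
```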

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 12:31 PM, random...@fastmail.us wrote: On Thu, Nov 20, 2014, at 20:10, Chris Angelico wrote: 2) Languages which use a different alphabet (eg Cyrillic - Russian, Bulgarian). You could possibly cram them into an eight-bit encoding without tipping ASCII out, but I'm not

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Marko Rauhamaa
Chris Angelico ros...@gmail.com: On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa ma...@pacujo.net wrote: I don't really like it how Unicode is equated with text, or even character strings. [...] Do you have actual text that you're unable to represent in Unicode? Not my point at all. I'm

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa ma...@pacujo.net wrote: Chris Angelico ros...@gmail.com: On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa ma...@pacujo.net wrote: I don't really like it how Unicode is equated with text, or even character strings. [...] Do you have actual text

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Marko Rauhamaa
Chris Angelico ros...@gmail.com: On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa ma...@pacujo.net wrote: I'm saying equating an abstract data type (string) with its representation (Unicode vector) is bad taste. What about sequence of Unicode code points is representation? What is your

Re: python 2.7 and unicode (one more time)

2014-11-20 Thread Chris Angelico
On Fri, Nov 21, 2014 at 6:14 PM, Marko Rauhamaa ma...@pacujo.net wrote: Chris Angelico ros...@gmail.com: On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa ma...@pacujo.net wrote: I'm saying equating an abstract data type (string) with its representation (Unicode vector) is bad taste. What