Re: Python Unicode handling wins again -- mostly

2013-12-04 Thread Neil Cerutti
On 2013-12-04, wxjmfa...@gmail.com wrote: > Yon intuitively pointed a very important feature of "unicode". > However, it is not necessary, this is exactly what unicode does > (when used properly). Unicode only provides character sets. It's not a natural language parsing facility. -- Neil Cerutt

Re: Python Unicode handling wins again -- mostly

2013-12-04 Thread Mark Lawrence
On 04/12/2013 13:52, wxjmfa...@gmail.com wrote: [snip all the double spaced stuff] Yon intuitively pointed a very important feature of "unicode". However, it is not necessary, this is exactly what unicode does (when used properly). jmf Presumably using unicode correctly prevents messages b

Re: Python Unicode handling wins again -- mostly

2013-12-04 Thread wxjmfauth
On Tuesday 3 December 2013 15:26:45 UTC+1, Ethan Furman wrote: > On 12/02/2013 12:38 PM, Ethan Furman wrote: > > > On 11/29/2013 04:44 PM, Steven D'Aprano wrote: > > >> > > >> Out of the nine tests, Python 3.3 passes six, with three tests being > > >> failures or dubious. If you believe that t

Re: Python Unicode handling wins again -- mostly

2013-12-03 Thread wxjmfauth
On Tuesday 3 December 2013 06:06:26 UTC+1, Steven D'Aprano wrote: > On Mon, 02 Dec 2013 16:14:13 -0500, Ned Batchelder wrote: > > > > > On 12/2/13 3:38 PM, Ethan Furman wrote: > > >> On 11/29/2013 04:44 PM, Steven D'Aprano wrote: > > >>> > > >>> Out of the nine tests, Python 3.3 passes six,

Re: Python Unicode handling wins again -- mostly

2013-12-03 Thread Ethan Furman
On 12/02/2013 12:38 PM, Ethan Furman wrote: On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points, then you'll think that Python does the r

Re: Python Unicode handling wins again -- mostly

2013-12-03 Thread Neil Cerutti
On 2013-12-02, Ethan Furman wrote: > On 11/29/2013 04:44 PM, Steven D'Aprano wrote: >> Out of the nine tests, Python 3.3 passes six, with three tests >> being failures or dubious. If you believe that the native >> string type should operate on code-points, then you'll think >> that Python does the

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-03 Thread Mark Lawrence
On 03/12/2013 01:38, Roy Smith wrote: In article , Mark Lawrence wrote: My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. "I believe that Pythonistas should commit themselves to achieving the goal, before this decade is out, of making Py

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-03 Thread Mark Lawrence
On 03/12/2013 04:32, Grant Edwards wrote: On 2013-12-03, Roy Smith wrote: "I believe that Pythonistas should commit themselves to achieving the goal, before this decade is out, of making Python 3 the default version and having everybody be cool with unicode." I'm cool with Unicode as long as

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread joe
How would a grapheme library work? Basic cluster combination, or would implementing other algorithms (line break, normalizing to a "canonical" form) be necessary? How do people use grapheme clusters in non-rendering situations? Or perhaps here's a better question: does anyone know any non-l
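As a rough illustration of the "basic cluster combination" option raised above, here is a minimal sketch that attaches each combining mark to the base code point before it. The helper names `simple_clusters` and `reverse_by_cluster` are hypothetical, and this is not the full segmentation algorithm of Unicode Standard Annex #29: it ignores Hangul jamo, emoji joiner sequences, regional indicators and similar cases.

```python
import unicodedata


def simple_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Approximation only: a real grapheme library would follow UAX #29.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def reverse_by_cluster(text):
    """Reverse a string cluster by cluster instead of code point by code point."""
    return "".join(reversed(simple_clusters(text)))


word = "noe\u0308l"  # 'e' followed by COMBINING DIAERESIS
clusters = simple_clusters(word)
reversed_word = reverse_by_cluster(word)
print(len(word), len(clusters))
```

With this grouping the diaeresis stays attached to the "e" when the word is reversed, which is the behaviour the blog post's reversal test asks for.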

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Steven D'Aprano
On Tue, 03 Dec 2013 04:32:13 +, Grant Edwards wrote: > On 2013-12-03, Roy Smith wrote: > >> "I believe that Pythonistas should commit themselves to achieving the >> goal, before this decade is out, of making Python 3 the default version >> and having everybody be cool with unicode." > > I'm

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Steven D'Aprano
On Mon, 02 Dec 2013 16:14:13 -0500, Ned Batchelder wrote: > On 12/2/13 3:38 PM, Ethan Furman wrote: >> On 11/29/2013 04:44 PM, Steven D'Aprano wrote: >>> >>> Out of the nine tests, Python 3.3 passes six, with three tests being >>> failures or dubious. If you believe that the native string type sho

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Ethan Furman
On 12/02/2013 07:22 PM, Terry Reedy wrote: On 12/2/2013 4:25 PM, Ethan Furman wrote: jmf is certainly a troll No, he is a person who discovered a minor performance regression in the FSR, which we fixed. Unfortunately, he then continued for a year with a strange troll-like anti-FSR crusade. Bu

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Grant Edwards
On 2013-12-03, Roy Smith wrote: > "I believe that Pythonistas should commit themselves to achieving the > goal, before this decade is out, of making Python 3 the default version > and having everybody be cool with unicode." I'm cool with Unicode as long as it "just works" without me ever havin

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Terry Reedy
On 12/2/2013 4:25 PM, Ethan Furman wrote: jmf is certainly a troll No, he is a person who discovered a minor performance regression in the FSR, which we fixed. Unfortunately, he then continued for a year with a strange troll-like anti-FSR crusade. But his posts in the Unicode handling thread

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Roy Smith
In article , Mark Lawrence wrote: > My fellow Pythonistas, ask not what our language can do for you, ask > what you can do for our language. "I believe that Pythonistas should commit themselves to achieving the goal, before this decade is out, of making Python 3 the default version and havin

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ethan Furman
On 12/02/2013 02:32 PM, Mark Lawrence wrote: ... the other being a pot smoking hippy who ... Please trim your posts. You comment a lot on people sending double-spaced google posts -- not trimming is nearly as bad. The above is a good example of unnecessary name calling. I value your good p

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ben Finney
Ned Batchelder writes: > This is where my knowledge about Unicode gets fuzzy. Isn't it the > case that some grapheme clusters (or whatever the right word is) can't > be normalized down to a single code point? Characters can accept many > accents, for example. That's true, but doesn't affect th

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 5:32 PM, Mark Lawrence wrote: On 02/12/2013 22:24, Ned Batchelder wrote: On 12/2/13 4:44 PM, Ned Batchelder wrote: On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I c

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 22:24, Ned Batchelder wrote: On 12/2/13 4:44 PM, Ned Batchelder wrote: On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal att

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 4:44 PM, Ned Batchelder wrote: On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of t

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Ned Batchelder
On 12/2/13 4:25 PM, Ethan Furman wrote: On 12/02/2013 12:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Mark Lawrence
On 02/12/2013 21:25, Ethan Furman wrote: On 12/02/2013 12:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation o

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ethan Furman
On 12/02/2013 01:23 PM, Chris Angelico wrote: On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder wrote: This is where my knowledge about Unicode gets fuzzy. Isn't it the case that some grapheme clusters (or whatever the right word is) can't be normalized down to a single code point? Characters ca

Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Ethan Furman
On 12/02/2013 12:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* ap

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* apply

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread MRAB
On 02/12/2013 21:14, Ned Batchelder wrote: On 12/2/13 3:38 PM, Ethan Furman wrote: On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points,

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Chris Angelico
On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder wrote: > This is where my knowledge about Unicode gets fuzzy. Isn't it the case that > some grapheme clusters (or whatever the right word is) can't be normalized > down to a single code point? Characters can accept many accents, for > example. You
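The question quoted above can be checked directly with the standard `unicodedata` module. This is a small sketch, assuming "x with acute accent" as the example of a cluster with no precomposed form; the variable names are illustrative only.

```python
import unicodedata

composable = "e\u0301"      # e + COMBINING ACUTE ACCENT, has a precomposed form
not_composable = "x\u0301"  # x + COMBINING ACUTE ACCENT, no precomposed form

nfc_composable = unicodedata.normalize("NFC", composable)
nfc_not_composable = unicodedata.normalize("NFC", not_composable)

# NFC composes where it can and leaves the rest as base + combining mark.
print(len(nfc_composable), len(nfc_not_composable))
```

So normalization alone cannot guarantee one code point per user-perceived character; it only shrinks the clusters that happen to have a precomposed equivalent.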

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 3:38 PM, Ethan Furman wrote: On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points, then you'll think that Python does the right

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ethan Furman
On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points, then you'll think that Python does the right thing. I think Python is doing it corr

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* apply to python-list. Please stop. The attack

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Terry Reedy
On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* apply to python-list. Please stop. -- Terry Jan Reedy, one of multiple list moderator

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 10:45 AM, Mark Lawrence wrote: On 02/12/2013 15:22, Ned Batchelder wrote: On 12/2/13 9:46 AM, Mark Lawrence wrote: On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Chris Angelico
On Tue, Dec 3, 2013 at 2:45 AM, Mark Lawrence wrote: > He's quite deliberately dragged it up by using p.s. Without doubt he's the > worst loser in the world and I'm *NOT* stopping getting at him. I find his > behaviour, continuously and groundlessly insulting the Python core > developers, quite

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 15:22, Ned Batchelder wrote: On 12/2/13 9:46 AM, Mark Lawrence wrote: On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your English is far from perfect as you clearly

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 9:46 AM, Mark Lawrence wrote: On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your English is far from perfect as you clearly do not understand the repeated requests *NO

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your English is far from perfect as you clearly do not understand the repeated requests *NOT* to send us double spaced crap via goog

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread wxjmfauth
On Sunday 1 December 2013 21:54:48 UTC+1, Tim Delaney wrote: > On 2 December 2013 07:15, wrote: > > > 30.11.13 02:44, Steven D'Aprano wrote: > > > > (2) If you reverse that string, does it give "lëon"? The implication of > > > this question is that strings should operate on graphem

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Mark Lawrence
On 01/12/2013 22:50, Ethan Furman wrote: On 12/01/2013 02:06 PM, Mark Lawrence wrote: I don't remember him [jmf] ever having a valid point, so FTR can we have a reference please. I do remember Steven D'Aprano showing that there was a regression which I flagged up here http://bugs.python.org/is

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Ethan Furman
On 12/01/2013 02:06 PM, Mark Lawrence wrote: I don't remember him [jmf] ever having a valid point, so FTR can we have a reference please. I do remember Steven D'Aprano showing that there was a regression which I flagged up here http://bugs.python.org/issue16061. It was fixed by Serhiy Storch

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Mark Lawrence
On 01/12/2013 22:29, Tim Delaney wrote: On 2 December 2013 09:06, Mark Lawrence wrote: I don't remember him ever having a valid point, so FTR can we have a reference please. I do remember Steven D'Aprano showing that there was a regression which I fl

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Tim Delaney
On 2 December 2013 09:06, Mark Lawrence wrote: > I don't remember him ever having a valid point, so FTR can we have a > reference please. I do remember Steven D'Aprano showing that there was a > regression which I flagged up here http://bugs.python.org/issue16061. It > was fixed by Serhiy Storc

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Mark Lawrence
On 01/12/2013 20:54, Tim Delaney wrote: On 2 December 2013 07:15, wxjmfa...@gmail.com wrote: 30.11.13 02:44, Steven D'Aprano wrote: > (2) If you reverse that string, does it give "lëon"? The implication of > this question is that strings should operate on grapheme

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Tim Delaney
On 2 December 2013 07:15, wrote: > 30.11.13 02:44, Steven D'Aprano wrote: > > (2) If you reverse that string, does it give "lëon"? The implication of > > this question is that strings should operate on grapheme clusters rather > > than code points. ... > > > > BTW, a grapheme cluster *is* a

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread wxjmfauth
30.11.13 02:44, Steven D'Aprano wrote: > (2) If you reverse that string, does it give "lëon"? The implication of > this question is that strings should operate on grapheme clusters rather > than code points. ... > BTW, a grapheme cluster *is* a code points cluster. jmf -- https://mail.pyth

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Serhiy Storchaka
30.11.13 02:44, Steven D'Aprano wrote: (2) If you reverse that string, does it give "lëon"? The implication of this question is that strings should operate on grapheme clusters rather than code points. Python fails this test: py> print("noe\u0308l"[::-1]) leon >>> print(unicodedata.norma
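The snippet above is cut off in the archive, so what follows is not a reconstruction of it. It is only a sketch of one common workaround for the reversal test: normalize to NFC before slicing, which helps in this case because "ë" has a precomposed code point.

```python
import unicodedata

decomposed = "noe\u0308l"  # 'e' followed by COMBINING DIAERESIS

# Naive reversal moves the combining mark onto the wrong letter.
naive = decomposed[::-1]

# Composing first turns 'e' + U+0308 into the single code point U+00EB.
composed = unicodedata.normalize("NFC", decomposed)
fixed = composed[::-1]

print(ascii(naive), ascii(fixed))
```

This only works when every cluster in the string has a precomposed form; for the general case some form of grapheme cluster segmentation is still needed.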

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread wxjmfauth
On Sunday 1 December 2013 00:07:36 UTC+1, Ned Batchelder wrote: > On 11/30/13 5:37 PM, Gregory Ewing wrote: > > > wxjmfa...@gmail.com wrote: > > >> And do you know the origin of this typographical feature? > > >> Because, mechanically, the dot of the "i" broke too often. > > >> > > >> In

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Chris Angelico
On Sun, Dec 1, 2013 at 12:27 PM, Roy Smith wrote: >> http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/ >> >> ChrisA > > What means "PFY"? The only thing I can think of is "Poor F---ing > Yankee" :-) In the context of the BOFH, it stands for Pimply-Faced Youth and means BOFH's assista

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Roy Smith
In article , Chris Angelico wrote: > On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano > wrote: > > On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > > > >> On 2013-12-01 00:22, Steven D'Aprano wrote: > >>> * KELVIN SIGN versus LATIN CAPITAL LETTER A > >> > >> I should hope so ;-) > > > > >

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Chris Angelico
On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano wrote: > On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > >> On 2013-12-01 00:22, Steven D'Aprano wrote: >>> * KELVIN SIGN versus LATIN CAPITAL LETTER A >> >> I should hope so ;-) > > > I blame my keyboard, where letters A and K are practicall

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Tim Chase
On 2013-12-01 00:54, Steven D'Aprano wrote: > On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > > > On 2013-12-01 00:22, Steven D'Aprano wrote: > >> * KELVIN SIGN versus LATIN CAPITAL LETTER A > > > > I should hope so ;-) > > > I blame my keyboard, where letters A and K are practical

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Steven D'Aprano
On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > On 2013-12-01 00:22, Steven D'Aprano wrote: >> * KELVIN SIGN versus LATIN CAPITAL LETTER A > > I should hope so ;-) I blame my keyboard, where letters A and K are practically right next to each other, only seven letters apart. An easy typo

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Tim Chase
On 2013-12-01 00:22, Steven D'Aprano wrote: > * KELVIN SIGN versus LATIN CAPITAL LETTER A I should hope so ;-) -tkc -- https://mail.python.org/mailman/listinfo/python-list

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Steven D'Aprano
On Sun, 01 Dec 2013 11:37:30 +1300, Gregory Ewing wrote: > Which makes it even sillier to have an 'ffi' character in this day and > age, when you can simply space the characters so that they overlap. It's in Unicode to support legacy character sets that included it[1]. There are a bunch of simil
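Because the ligature exists for compatibility with legacy character sets, Unicode gives it a compatibility decomposition rather than a canonical one. A brief sketch with the standard `unicodedata` module shows the difference; the variable names are illustrative.

```python
import unicodedata

ligature = "\ufb03"  # the single-code-point 'ffi' ligature

name = unicodedata.name(ligature)
nfc = unicodedata.normalize("NFC", ligature)    # canonical: left unchanged
nfkc = unicodedata.normalize("NFKC", ligature)  # compatibility: split into letters

print(name, ascii(nfc), ascii(nfkc))
```

Code that wants to search or compare text regardless of ligatures would typically apply NFKC (or NFKD) first, at the cost of losing the original typographic distinction.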

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Ned Batchelder
On 11/30/13 5:37 PM, Gregory Ewing wrote: wxjmfa...@gmail.com wrote: And do you know the origin of this typographical feature? Because, mechanically, the dot of the "i" broke too often. In my opinion, a very plausible explanation. It doesn't sound very plausible to me, because there are a lot

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Gregory Ewing
Steven D'Aprano wrote: On Sat, 30 Nov 2013 00:37:17 -0500, Roy Smith wrote: So, who am I to argue with the people who decided that I needed to be able to type a "PILE OF POO" character. Blame the Japanese for that. Apparently some of the biggest users of Unicode are the various Japanese mobi

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Gregory Ewing
wxjmfa...@gmail.com wrote: And do you know the origin of this typographical feature? Because, mechanically, the dot of the "i" broke too often. In my opinion, a very plausible explanation. It doesn't sound very plausible to me, because there are a lot more stand-alone 'i's in English text than

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread wxjmfauth
On Saturday 30 November 2013 03:08:49 UTC+1, Roy Smith wrote: > > > > The whole idea of ligatures like fi is purely typographic. The crossbar > > on the "f" (at least in some fonts) runs into the dot on the "i". > > Likewise, the top curl on an "f" runs into the serif on top of the "l" >

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Mark Lawrence
On 30/11/2013 02:08, Roy Smith wrote: In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: (8) What's the uppercase of "baffle" spelled with an ffl ligature? Like most other languages, Python 3.2 fails: py> 'baffle'.upper() 'BAfflE' but Python 3.3 passe

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Sat, 30 Nov 2013 00:37:17 -0500, Roy Smith wrote: > So, who am I to argue with the people who decided that I needed to be > able to type a "PILE OF POO" character. Blame the Japanese for that. Apparently some of the biggest users of Unicode are the various Japanese mobile phone manufacturers,

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Fri, 29 Nov 2013 23:00:27 -0700, Ian Kelly wrote: > On Fri, Nov 29, 2013 at 10:37 PM, Roy Smith wrote: >> I was speaking specifically of "ligatures like fi" (or, if you prefer, >> "ligatures like ό". By which I mean those things printers invented >> because some letter combinations look funny

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Sat, 30 Nov 2013 02:05:59 -0300, Zero Piraeus wrote: > (I happen to think the presence of ligatures in Unicode is insane, but > my dictator-of-the-world certificate appears to have gotten lost in the > post, so fixing that will have to wait). You're probably right, but we live in an insane wor

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Ian Kelly
On Fri, Nov 29, 2013 at 10:37 PM, Roy Smith wrote: > I was speaking specifically of "ligatures like fi" (or, if you prefer, > "ligatures like ό". By which I mean those things printers invented > because some letter combinations look funny when typeset as two distinct > letters. I think the encod

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529967dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > > The whole idea of ligatures like fi is purely typographic. > > In English, that's correct. I'm not sure if we can generalise that to all > languages that have ligatures. It also partly depends on how y

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Gene Heskett
On Saturday 30 November 2013 00:23:22 Zero Piraeus did opine: > On Sat, Nov 30, 2013 at 04:21:49AM +, Steven D'Aprano wrote: > > On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote: > > > The whole idea of ligatures like fi is purely typographic. > > > > In English, that's correct. I'm not su

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Zero Piraeus
: On Sat, Nov 30, 2013 at 04:21:49AM +, Steven D'Aprano wrote: > On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote: > > The whole idea of ligatures like fi is purely typographic. > > In English, that's correct. I'm not sure if we can generalise that to > all languages that have ligatures. I

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529967dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > You edited my text to remove the ligature? That's... unfortunate. It was un-ligated by the time it reached me. -- https://mail.python.org/mailman/listinfo/python-list

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote: > In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>, > Steven D'Aprano wrote: > >> (8) What's the uppercase of "baffle" spelled with an ffl ligature? >> >> Like most other languages, Python 3.2 fails: >> >> py> 'baffle'.upper

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Dave Angel
On Fri, 29 Nov 2013 21:28:47 -0500, Roy Smith wrote: In article , Chris Angelico wrote: > On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith wrote: > > I would certainly expect, x.lower() == x.upper().lower(), to be True for > > all values of x over the set of valid unicode codepoints. Having >

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article , Chris Angelico wrote: > On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith wrote: > > I would certainly expect, x.lower() == x.upper().lower(), to be True for > > all values of x over the set of valid unicode codepoints. Having > > u"\uFB04".upper() ==> "FFL" breaks that. I would also ex

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Chris Angelico
On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith wrote: > I would certainly expect, x.lower() == x.upper().lower(), to be True for > all values of x over the set of valid unicode codepoints. Having > u"\uFB04".upper() ==> "FFL" breaks that. I would also expect len(x) == > len(x.upper()) to be True. T
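The two expectations quoted above can be tested directly. This sketch assumes Python 3.3 or later, where `str.upper()` applies the full Unicode case mappings; the variable names are illustrative.

```python
ligature = "\ufb04"  # LATIN SMALL LIGATURE FFL

upper = ligature.upper()    # full case mapping expands to three letters
round_trip = upper.lower()  # three separate lowercase letters, not the ligature

# Expectation 1: x.lower() == x.upper().lower()
same_after_round_trip = ligature.lower() == round_trip

# Expectation 2: len(x) == len(x.upper())
same_length = len(ligature) == len(upper)

print(ascii(upper), same_after_round_trip, same_length)
```

Neither expectation holds for this code point, and German "ß" (which uppercases to "SS") breaks the length assumption in the same way, so case conversion cannot be treated as a length-preserving, reversible operation.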

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > (8) What's the uppercase of "baffle" spelled with an ffl ligature? > > Like most other languages, Python 3.2 fails: > > py> 'baffle'.upper() > 'BAfflE' > > but Python 3.3 passes: > > py> 'baffle'.upper

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Mark Lawrence
On 30/11/2013 00:44, Steven D'Aprano wrote: (5) What is the length of "😸😾"? Both characters U+1F638 (GRINNING CAT FACE WITH SMILING EYES) and U+1F63E (POUTING CAT FACE) are outside the Basic Multilingual Plane, which means they require more than two bytes each. Most programming languages using
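To make the length test concrete, the following sketch counts the same two-character string three ways. It assumes Python 3.3 or later, where `len()` on a `str` counts code points; the variable names are illustrative.

```python
cats = "\U0001F638\U0001F63E"  # the two cat faces from test (5)

code_points = len(cats)                           # what Python 3.3+ reports
utf16_units = len(cats.encode("utf-16-le")) // 2  # surrogate pairs: two units each
utf8_bytes = len(cats.encode("utf-8"))            # four bytes each

print(code_points, utf16_units, utf8_bytes)
```

A language whose string type exposes UTF-16 code units would report 4 for this string, which is the failure mode the test is designed to catch.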

Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
There's a recent blog post complaining about the lousy support for Unicode text in most programming languages: http://mortoray.com/2013/11/27/the-string-type-is-broken/ The author, Mortoray, gives nine basic tests to understand how well the string type in a language works. The first four involv