[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2016-03-10 Thread Benjamin Peterson
Benjamin Peterson added the comment: The full case mappings do not preserve normalization form.

    >>> for c in 'ΰ'.upper().lower(): print(unicodedata.name(c))
    ...
    GREEK SMALL LETTER UPSILON
    COMBINING DIAERESIS
    COMBINING ACUTE ACCENT
    >>> unicodedata.normalize('NFC', 'ΰ'.upper().lower()) == 'ΰ'
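One way to get composed (NFC) text back is to normalize after each case operation. The helpers `upper_nfc` and `lower_nfc` are hypothetical names and are not part of the standard library. This is a sketch that assumes Python 3.3 or later, where full case mappings apply, plus the standard `unicodedata` module:

```python
import unicodedata

def upper_nfc(s):
    # Hypothetical helper: full uppercase mapping, then recompose to NFC.
    return unicodedata.normalize("NFC", s.upper())

def lower_nfc(s):
    # Hypothetical helper: full lowercase mapping, then recompose to NFC.
    return unicodedata.normalize("NFC", s.lower())

s = "\u03b0"                     # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
plain = s.upper().lower()        # decomposed: base letter plus two combining marks
fixed = lower_nfc(upper_nfc(s))  # recomposed after each step

print([unicodedata.name(c) for c in plain])
print(plain == s, fixed == s)
```

Normalizing does not make the mapping length-preserving. `upper_nfc(s)` still returns two code points, because Unicode has no precomposed capital upsilon with both dialytika and tonos.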

2016-03-10 Thread Guido van Rossum
Changes by Guido van Rossum: nosy: -gvanrossum

2016-03-10 Thread SilentGhost
Changes by SilentGhost: versions: +Python 3.4, Python 3.5, Python 3.6 -Python 2.7

2016-03-10 Thread Андрей Баксаляр
Андрей Баксаляр added the comment: Interestingly, the bug is still reproducible in version 3.5.1 but does not occur in 2.7.6. versions: +Python 2.7 -Python 3.4. Added file: http://bugs.python.org/file42121/pythonbug.png

2016-03-10 Thread Андрей Баксаляр
Андрей Баксаляр added the comment: The same problem with Unicode case mapping is still present in Python 3.4.3. You can reproduce the bug with this code, for instance:

    'ΰ'.upper().lower() == 'ΰ'

The case round trip strangely leads to character replacement: b'\xce\xb0' →
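The "character replacement" is easier to see at the code-point level. The full uppercase mapping of U+03B0 is three code points (a base letter plus two combining marks), and lowercasing keeps all three. A short sketch, assuming Python 3.3 or later:

```python
s = "\u03b0"                    # one code point
round_trip = s.upper().lower()  # full mappings expand it to three code points

print(s.encode("utf-8"))           # b'\xce\xb0'
print(round_trip.encode("utf-8"))  # b'\xcf\x85\xcc\x88\xcc\x81'
print([f"U+{ord(c):04X}" for c in round_trip])  # ['U+03C5', 'U+0308', 'U+0301']
print(round_trip == s)             # False
```

The letter has not been replaced by an unrelated character. The result is a canonically equivalent, decomposed spelling of the same letter, which is why the NFC comparison in Benjamin's transcript succeeds.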

2012-01-15 Thread Jim Jewett
Jim Jewett jimjjew...@gmail.com added the comment: Why was the delta-processing removed from the casing functions? As best I can tell, the whole point of going through multiple levels of indirection (courtesy of splitbins) is to maximize compression and minimize the amount of cache that unicode
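Delta encoding helps compression as follows. If each record stores the difference between a code point and its case mapping, rather than the mapped code point itself, then runs of letters that shift by the same amount share one record. The sketch below is a hypothetical illustration and not the generator script or table layout that CPython actually uses. It handles only one-to-one mappings, so multi-character full mappings would need separate storage:

```python
import string

def build_delta_table(chars):
    """Hypothetical sketch: deduplicate (upper_delta, lower_delta) records."""
    records = []   # unique delta records, in first-seen order
    index = {}     # record -> position in `records`
    per_char = {}  # code point -> position in `records`
    for ch in chars:
        up, lo = ch.upper(), ch.lower()
        if len(up) != 1 or len(lo) != 1:
            continue  # a multi-character (full) mapping cannot be a single delta
        cp = ord(ch)
        rec = (ord(up) - cp, ord(lo) - cp)
        if rec not in index:
            index[rec] = len(records)
            records.append(rec)
        per_char[cp] = index[rec]
    return records, per_char

def upper_from_table(ch, records, per_char):
    # Decode: add the stored delta back to the code point.
    up_delta, _ = records[per_char[ord(ch)]]
    return chr(ord(ch) + up_delta)

records, per_char = build_delta_table(string.ascii_letters)
absolute = {(c.upper(), c.lower()) for c in string.ascii_letters}
print(len(records), len(absolute))               # 2 delta records vs 26 absolute pairs
print(upper_from_table("q", records, per_char))  # Q
```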

2012-01-15 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 03ea95e3b497 by Benjamin Peterson in branch 'default': delta encoding of upper/lower/title makes a glorious return (#12736) http://hg.python.org/cpython/rev/03ea95e3b497

2012-01-12 Thread Jim Jewett
Jim Jewett jimjjew...@gmail.com added the comment: The currently applied patch (http://hg.python.org/cpython/rev/f7e05d205a52) left some dead code in unicodeobject.c: function fixup (http://hg.python.org/cpython/file/f7e05d205a52/Objects/unicodeobject.c#l9386) has a shortcut for when the

2012-01-11 Thread Benjamin Peterson
Benjamin Peterson benja...@python.org added the comment: New patch with title casing mappings added. Added file: http://bugs.python.org/file24204/full-casemapping.patch

2012-01-11 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset f7e05d205a52 by Benjamin Peterson in branch 'default': use full unicode mappings for upper/lower/title case (#12736) http://hg.python.org/cpython/rev/f7e05d205a52 nosy: +python-dev

2012-01-11 Thread Benjamin Peterson
Changes by Benjamin Peterson benja...@python.org: resolution: -> fixed; status: open -> closed

2012-01-10 Thread Benjamin Peterson
Benjamin Peterson benja...@python.org added the comment: __ap__'s implementation method is about 2x faster than mine.

2012-01-10 Thread Benjamin Peterson
Changes by Benjamin Peterson benja...@python.org: Added file: http://bugs.python.org/file24199/full-casemapping.patch

2012-01-09 Thread Benjamin Peterson
Benjamin Peterson benja...@python.org added the comment: New patch. I implemented it the way Antoine desired. It seems rather inefficient to be copying around so much data... Added file: http://bugs.python.org/file24190/full-casemapping.patch

2012-01-07 Thread Benjamin Peterson
Benjamin Peterson benja...@python.org added the comment: Here is a patch. I only dealt with case mappings and not titlecase. Doing titlecase properly requires word segmentation, which I think should be another patch/issue. This patch fixes swapcase(), capitalize(), upper(), and lower(). It
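With full mappings, these methods can return a string whose length differs from the input's. The sketch below exercises a few mappings from SpecialCasing.txt and assumes Python 3.3 or later. It leaves out capitalize() because Python 3.8 changed that method to titlecase the first character, so its result for 'ß' depends on the version:

```python
samples = {
    "upper": "straße".upper(),       # U+00DF expands to 'SS'
    "swapcase": "Straße".swapcase(),  # lowercase ß is uppercased with the full mapping
    "lower": "\u0130".lower(),       # U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
}
for name, result in samples.items():
    print(name, ascii(result), len(result))
```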

2011-08-29 Thread Jean-Michel Fauth
Jean-Michel Fauth wxjmfa...@gmail.com added the comment: Œ, œ or even are historically ligatures or ligatured forms. In French typography, they are single plain letters and they belong to the group of the 42 letters used in French typography. Typographically speaking, using oe instead of œ

2011-08-29 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment, quoting Jean-Michel Fauth: Œ, œ or even are historically ligatures or ligatured forms. In French typography, they are single plain letters and they belong to the group of the 42 letters used in French typography. Typographically speaking, using oe instead of œ is

2011-08-29 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Antoine Pitrou rep...@bugs.python.org wrote on Mon, 29 Aug 2011 13:21:06 -: It's not only typographically speaking, it's really a spelling error, even in hand-written text :-) Sure, and so too is omitting an accent mark or

2011-08-28 Thread Guido van Rossum
Guido van Rossum gu...@python.org added the comment: Thanks Tom for such a clear explanation! I hope someone will implement this. (Matthew, does this affect regex? I am guessing it does, for case-insensitive matching?)

2011-08-28 Thread Matthew Barnett
Matthew Barnett pyt...@mrabarnett.plus.com added the comment: The regex module currently uses simple case-folding, although I'm working towards full case-folding, as listed in http://www.unicode.org/Public/UNIDATA/CaseFolding.txt.
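Here is how this affects case-insensitive matching in practice. The standard `re` module's IGNORECASE compares one character against one character, so it cannot match 'ß' against the two letters 'SS'. `str.casefold()`, added in Python 3.3, applies the full folding from CaseFolding.txt, so folding both sides first is one workaround for plain equality tests. In the sketch below, `caseless_equal` is a hypothetical helper name:

```python
import re

def caseless_equal(a, b):
    # Hypothetical helper: compare after full case folding.
    return a.casefold() == b.casefold()

# None: the 6 pattern characters cannot line up with the 7 text characters.
print(re.fullmatch("straße", "STRASSE", re.IGNORECASE))
print(caseless_equal("straße", "STRASSE"))
```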

2011-08-28 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Antoine Pitrou rep...@bugs.python.org wrote on Sat, 27 Aug 2011 20:04:56 -: Neither am I. Even in old-style English with ae and oe, one wrote ÆGYPT and ÆSIR all caps but Ægypt and Æsir in titlecase, not *Aegypt or *Aesir. Similarly

2011-08-27 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Guido van Rossum rep...@bugs.python.org wrote on Fri, 26 Aug 2011 21:11:24 -: Would this also affect .islower() and friends? SHORT VERSION: (7 lines) I don't believe so, but the relationship between lower() and islower()
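The predicates are largely independent of the mapping change. `islower()` and `isupper()` test the case properties of the characters themselves, while `lower()` and `upper()` apply the (now full) mappings. So 'ß' counts as lowercase even though its uppercase form is two letters long. A sketch, assuming Python 3.3 or later:

```python
for s in ["straße", "\ufb01", "STRASSE"]:  # the second item is the 'fi' ligature U+FB01
    # Predicates look at character properties; upper() applies the full mapping.
    print(ascii(s), s.islower(), s.isupper(), ascii(s.upper()))
```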

2011-08-27 Thread Guido van Rossum
Guido van Rossum gu...@python.org added the comment: Thank you very much. We should fix the behavior in 3.3 for sure. I'm thinking that we may be able to backport the behavior fix to 2.7 and 3.2 as well, since it just makes the behavior generally better (and for most folks it won't matter

2011-08-27 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Guido van Rossum rep...@bugs.python.org wrote on Sat, 27 Aug 2011 16:15:33 -: Although personally I don't have much of an intuition for what titlecase means (and why it's important), perhaps because I'm not familiar with any
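Titlecase matters mainly for characters that are themselves digraphs or ligatures. For 'ǆ' (U+01C6) or 'ﬂ' (U+FB02), capitalizing only the first letter gives a different result from uppercasing the whole character. A sketch, assuming Python 3.3 or later:

```python
for ch in ["\u01c6", "\ufb02", "\u00df"]:  # dz with caron, fl ligature, sharp s
    print(ascii(ch), "upper:", ascii(ch.upper()), "title:", ascii(ch.title()))

print("\ufb02our".title())  # only the first letter of the ligature is capitalized
```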

2011-08-27 Thread Matthew Barnett
Matthew Barnett pyt...@mrabarnett.plus.com added the comment: There are some oddities in Unicode case-folding. Under full case-folding, both \N{LATIN CAPITAL LETTER SHARP S} and \N{LATIN SMALL LETTER SHARP S} fold to ss, which means that those codepoints match each other. However, under
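Python exposes only the full folding, through `str.casefold()` (added in 3.3), so the oddity Matthew describes can be checked directly. Under full folding, both sharp s characters become 'ss'. CaseFolding.txt also gives U+1E9E a simple folding (status S) to U+00DF, which is the same result as its ordinary lowercase mapping. A sketch, assuming Python 3.3 or later:

```python
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S
small_sharp_s = "\u00df"    # LATIN SMALL LETTER SHARP S

# Full folding: both become 'ss', so they match each other (and 'SS').
print(ascii(capital_sharp_s.casefold()), ascii(small_sharp_s.casefold()))
# One-character lowercase mapping: capital sharp s -> small sharp s.
print(ascii(capital_sharp_s.lower()))
```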

2011-08-27 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Neither am I. Even in old-style English with ae and oe, one wrote ÆGYPT and ÆSIR all caps but Ægypt and Æsir in titlecase, not *Aegypt or *Aesir. Similarly with ŒNOLOGY / Œnology / œnology, never *Oenology. Trying to disprove you a bit:

2011-08-27 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: FTR, with the latest Python 3.2/3.3 (narrow) I get:

    Total failures:   58 / 500 (12%)
    Total successes: 442 / 500 (88%)

and with the latest Python 3.2/3.3 (wide) I get:

    Total failures:   52 / 500 (10%)
    Total successes: 448 / 500

2011-08-26 Thread Guido van Rossum
Guido van Rossum gu...@python.org added the comment: I presume this applies to builtin str methods like .lower(), right? I think it is a good thing to do for Python 3.3. We'd need to define what should happen in edge cases, e.g. when (against all odds) a string happens to contain a lone
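On the lone-surrogate edge case: CPython 3.3 and later pass unpaired surrogates through the case methods unchanged, since surrogates have no case mappings, and apply the full mappings to the surrounding characters as usual. A quick check, assuming Python 3.3 or later. A `str` can hold a lone surrogate even though it cannot be encoded to UTF-8 without an error handler:

```python
s = "a\ud800\u00df"  # 'a', an unpaired high surrogate, 'ß'
print(ascii(s.upper()), ascii(s.lower()))

try:
    s.encode("utf-8")
except UnicodeEncodeError as exc:
    print("not encodable:", exc.reason)
```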

2011-08-26 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Guido van Rossum rep...@bugs.python.org wrote on Fri, 26 Aug 2011 21:11:24 -: Guido van Rossum gu...@python.org added the comment: I presume this applies to builtin str methods like .lower(), right? I think it is a good thing to

2011-08-26 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Here’s my casing test suite; I thought I sent it in but the mux file here isn’t the full thing. It does several things, including letting you run it with regex vs re. It also checks the islower() etc. functions. It has both simple and
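Tom's suite is not reproduced here. The sketch below only suggests what table-driven checks of full mappings might look like. `EXPECTED` and `check_full_mappings` are hypothetical names, and the entries come from SpecialCasing.txt. It assumes Python 3.3 or later:

```python
# Hypothetical table: input -> (lower, title, upper), per SpecialCasing.txt.
EXPECTED = {
    "\u0149": ("\u0149", "\u02bcN", "\u02bcN"),  # n preceded by apostrophe
    "\u01f0": ("\u01f0", "J\u030c", "J\u030c"),  # j with caron
    "\ufb00": ("\ufb00", "Ff", "FF"),            # ff ligature
}

def check_full_mappings(table):
    """Return a list of (char, method, got, want) mismatches."""
    failures = []
    for ch, (want_lower, want_title, want_upper) in table.items():
        for method, want in (("lower", want_lower), ("title", want_title), ("upper", want_upper)):
            got = getattr(ch, method)()
            if got != want:
                failures.append((ch, method, got, want))
    return failures

failures = check_full_mappings(EXPECTED)
print(f"Total failures: {len(failures)} / {3 * len(EXPECTED)}")
```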

2011-08-12 Thread Éric Araujo
Changes by Éric Araujo mer...@netwok.org: components: +Interpreter Core, Unicode -Library (Lib); versions: +Python 3.3 -Python 3.2

2011-08-12 Thread Arfrever Frehtes Taifersar Arahesis
Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com: nosy: +Arfrever

2011-08-12 Thread Matthew Barnett
Changes by Matthew Barnett pyt...@mrabarnett.plus.com: nosy: +mrabarnett

2011-08-11 Thread Tom Christiansen
New submission from Tom Christiansen tchr...@perl.com: Python's casemapping functions only use what Unicode calls simple casemaps. These are only appropriate for functions that operate on single characters alone, not for those that operate on strings. The reason for this is that you get much
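The difference shows up as soon as a mapping is one-to-many. The function `per_char_upper` below is a hypothetical, rough stand-in for a simple mapping: it keeps a character unchanged whenever its full uppercase form would be longer than one code point. That only approximates the simple mappings in UnicodeData.txt; U+1F80, for example, has a different one-character simple uppercase. It is still enough to contrast with the full mapping that `str.upper()` uses from Python 3.3 on:

```python
def per_char_upper(s):
    # Hypothetical approximation of a simple (1:1) mapping: never changes length.
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)

word = "\ufb01le"             # 'file' spelled with the U+FB01 fi ligature
print(ascii(per_char_upper(word)))  # ligature left as-is
print(word.upper())                 # full mapping expands the ligature
```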

2011-08-11 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: nosy: +belopolsky, ezio.melotti