[issue18234] Unicodedata module should provide access to codepoint aliases

2014-10-11, flying sheep
flying sheep added the comment:

IDK if it came with Unicode 7.0, but there is a clarification:

    # Note that currently the only instances of multiple aliases of the same
    # type for a single code point are either of type "control" or "abbreviation".
    # An alias of type "abbreviation" can, in principle
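The quoted note appears to come from the header of the UCD data file NameAliases.txt, whose data lines have the form `code point;alias;type`. A minimal sketch of reading that format into a per-code-point list follows. The function name `parse_name_aliases` and the inline sample are illustrative assumptions (the sample entries are hand-copied, several of them appear later in this thread), and nothing here is part of `unicodedata`.

```python
from collections import defaultdict

# Hypothetical sample in the NameAliases.txt format: "code point;alias;type".
# Lines starting with "#" are comments, as in the UCD file.
SAMPLE = """\
# Note that currently the only instances of multiple aliases of the same
# type for a single code point are either of type "control" or "abbreviation".
0000;NULL;control
0000;NUL;abbreviation
001B;ESCAPE;control
001B;ESC;abbreviation
01A2;LATIN CAPITAL LETTER GHA;correction
"""

def parse_name_aliases(text):
    """Map each code point (int) to its (alias, type) pairs, in file order."""
    aliases = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, alias_type = line.split(";")
        aliases[int(code, 16)].append((alias, alias_type))
    return dict(aliases)

table = parse_name_aliases(SAMPLE)
print(table[0x1B])  # [('ESCAPE', 'control'), ('ESC', 'abbreviation')]
```

Keeping the pairs in file order preserves the information the note is about: a code point may carry several aliases, and their types matter.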

2014-02-10, Ezio Melotti
Ezio Melotti added the comment:

See also #20433.

stage: -> needs patch
versions: +Python 3.5 -Python 3.4

2013-06-24, Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 24.06.2013 18:10, Alexander Belopolsky wrote:
>
> Alexander Belopolsky added the comment:
>
>> The .aliases() function would have to return a list, not a single
>> name, so a parameter would cause the return type to change, which
>> is not a good idea.
>

2013-06-24, Martin v. Löwis
Martin v. Löwis added the comment:

But some of these types could still have lists as values, no?

2013-06-24, Alexander Belopolsky
Alexander Belopolsky added the comment:

> The .aliases() function would have to return a list, not a single
> name, so a parameter would cause the return type to change, which
> is not a good idea.

You misunderstood my proposal. .name() will still return a single name, but the type parameter w

2013-06-24, Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 24.06.2013 16:58, Alexander Belopolsky wrote:
>
> Alexander Belopolsky added the comment:
>
> Here is an example of "prior art" that is relevant to this discussion:
>
> """
> charnames::viacode(code)
> ..
> As mentioned above under ALIASES, Unicode 6.1

2013-06-24, Alexander Belopolsky
Alexander Belopolsky added the comment:

Here is an example of "prior art" that is relevant to this discussion:

"""
charnames::viacode(code)
..
As mentioned above under ALIASES, Unicode 6.1 defines extra names (synonyms or aliases) for some code points, most of which were already available as Pe
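Perl's `charnames::viacode` must pick a single "best" name when several exist. A rough Python analogue is sketched below. The preference order (latest correction alias, then the formal Unicode name, then the first "control" alias) is an assumption based on a reading of the Perl documentation, not a specification, and `best_name` plus the hand-copied `ALIASES` table are hypothetical.

```python
import unicodedata

# Hypothetical hand-copied subset of NameAliases.txt: code point -> [(alias, type)].
ALIASES = {
    0x0000: [("NULL", "control"), ("NUL", "abbreviation")],
    0x001B: [("ESCAPE", "control"), ("ESC", "abbreviation")],
    0x01A2: [("LATIN CAPITAL LETTER GHA", "correction")],
}

def best_name(ch, aliases=ALIASES):
    """Pick one display name, loosely modelled on Perl's charnames::viacode.

    Assumed preference order: most recent correction alias, then the formal
    Unicode name, then the first alias of type "control".
    """
    pairs = aliases.get(ord(ch), [])
    corrections = [a for a, t in pairs if t == "correction"]
    if corrections:
        return corrections[-1]  # later entries are assumed to be newer corrections
    name = unicodedata.name(ch, None)  # None for code points without a Name
    if name is not None:
        return name
    controls = [a for a, t in pairs if t == "control"]
    return controls[0] if controls else None

print(best_name("\u01a2"))  # LATIN CAPITAL LETTER GHA
print(best_name("\x1b"))    # ESCAPE
print(best_name("A"))       # LATIN CAPITAL LETTER A
```

This ranking is the design question the thread is debating: a "best name" lookup like this one would not give the 1-1 mapping to the formal Name that some participants want `unicodedata.name()` to keep.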

2013-06-24, Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 24.06.2013 16:35, Alexander Belopolsky wrote:
>
> Alexander Belopolsky added the comment:
>
> MAL> Please leave the function as it is, i.e. a 1-1 mapping to the
> MAL> official, non-changing Unicode name reference (including
> MAL> spelling errors, etc).

2013-06-24, Alexander Belopolsky
Alexander Belopolsky added the comment:

MAL> Please leave the function as it is, i.e. a 1-1 mapping to the
MAL> official, non-changing Unicode name reference (including
MAL> spelling errors, etc).
Same with code points that have no name.

Since we have code points with no name - it is not 1-1 map
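The point that `name()` is not a total mapping can be checked against the standard library. All C0 control characters (U+0000 to U+001F) lack a Unicode Name, so `unicodedata.name()` raises `ValueError` for them unless a default is supplied:

```python
import unicodedata

# unicodedata.name() has no value for code points without a Name property;
# passing a default (here None) avoids the ValueError.
unnamed = [cp for cp in range(0x20) if unicodedata.name(chr(cp), None) is None]
print(len(unnamed))  # 32: none of the C0 control characters has a name

try:
    unicodedata.name("\x1b")  # ESCAPE exists only as an alias
except ValueError as exc:
    print("no name:", exc)
```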

2013-06-24, Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 24.06.2013 10:05, Serhiy Storchaka wrote:
>
> Serhiy Storchaka added the comment:
>
> Perhaps unicodedata.aliases() should return not a list, but an ordered dict.
>
> What name should use the "namereplace" error handler? Original or corrected?
> Should

2013-06-24, Serhiy Storchaka
Serhiy Storchaka added the comment:

Perhaps unicodedata.aliases() should return an ordered dict rather than a list.

Which name should the "namereplace" error handler use? The original or the corrected one? Should it use the first alias if there is no original name?
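Combining this suggestion with Martin's remark that some types could still have lists as values gives a mapping from alias type to a list of names. A minimal sketch, assuming a hypothetical `aliases_by_type` helper over a hand-copied table (the U+000A entries are assumed to mirror NameAliases.txt, where LINE FEED has three "control" aliases):

```python
from collections import OrderedDict

# Hypothetical hand-copied subset of NameAliases.txt, in file order.
ALIASES = {
    0x000A: [("LINE FEED", "control"), ("NEW LINE", "control"),
             ("END OF LINE", "control"), ("LF", "abbreviation"),
             ("NL", "abbreviation"), ("EOL", "abbreviation")],
}

def aliases_by_type(ch, table=ALIASES):
    """Group a character's aliases by type, keeping first-seen type order."""
    grouped = OrderedDict()
    for alias, alias_type in table.get(ord(ch), []):
        grouped.setdefault(alias_type, []).append(alias)
    return grouped

print(dict(aliases_by_type("\n")))
# {'control': ['LINE FEED', 'NEW LINE', 'END OF LINE'], 'abbreviation': ['LF', 'NL', 'EOL']}
```

Since Python 3.7 a plain `dict` also preserves insertion order, so `OrderedDict` is used here only to match the wording of the suggestion.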

2013-06-24, Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 23.06.2013 22:43, Alexander Belopolsky wrote:
>
> Alexander Belopolsky added the comment:
>
> unicodedata.name() was discussed in #12353 (msg144739) where MvL argued that
> misspelled names are better than corrected because they are more likely to
> ap

2013-06-23, Alexander Belopolsky
Alexander Belopolsky added the comment:

I mistyped the issue reference in my previous comment: it should be #12753, not #12353.

2013-06-23, Alexander Belopolsky
Alexander Belopolsky added the comment:

unicodedata.name() was discussed in #12353 (msg144739) where MvL argued that misspelled names are better than corrected because they are more likely to appear misspelled in other sources. I am not sure I buy this argument. Someone googling for 'BYZANTI
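A well-known misspelled name from the same Byzantine musical symbols block shows both sides of the argument. The formal Name of U+1D0C5 keeps the misspelling "FHTORA", while its "correction" alias reads "FTHORA". Since Python 3.3, `unicodedata.lookup()` accepts aliases, but `unicodedata.name()` returns only the formal Name:

```python
import unicodedata

ch = "\U0001D0C5"
# The formal Name preserves the original misspelling ("FHTORA")...
print(unicodedata.name(ch))  # BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS
# ...while lookup() also resolves the corrected alias.
print(unicodedata.lookup("BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS") == ch)  # True
```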

2013-06-23, Alexander Belopolsky
Alexander Belopolsky added the comment:

> Can a character or sequence have multiple aliases?

Yes, for example, most control characters have two aliases (and no name):

0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;
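The asymmetry behind this issue can be shown with the standard library alone. Both the "control" and the "abbreviation" aliases listed above resolve through `unicodedata.lookup()` (Python 3.3+), but no reverse function returns them, because these characters have no formal Name:

```python
import unicodedata

# Both alias types resolve to the same code point...
for alias in ("NULL", "NUL", "START OF HEADING", "SOH", "START OF TEXT", "STX"):
    print(alias, hex(ord(unicodedata.lookup(alias))))

# ...but name() cannot go back, since U+0000 has no formal Name.
print(unicodedata.name("\x00", None))  # None
```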

2013-06-23, Serhiy Storchaka
Serhiy Storchaka added the comment:

Can a character or sequence have multiple aliases? What will be the result type of unicodedata.name() with the "abbreviation" keyword value?

2013-06-23, Alexander Belopolsky
Alexander Belopolsky added the comment:

Rather than adding a new method to unicodedata, what do you think about adding a type keyword argument to unicodedata.name()? It can default to "canonical" and have possible values "control", "abbreviation", etc. See also #12753.
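A minimal sketch of the proposed keyword, written as a hypothetical wrapper `name_with_type` around the real `unicodedata.name()` (no such `type` parameter exists in `unicodedata`). To keep the return type a single string, it assumes that the first alias of the requested type wins. The hand-copied U+001E entries show that this choice matters, since that code point has two "control" aliases:

```python
import unicodedata

# Hypothetical hand-copied subset of NameAliases.txt: code point -> [(alias, type)].
ALIASES = {
    0x001B: [("ESCAPE", "control"), ("ESC", "abbreviation")],
    0x001E: [("INFORMATION SEPARATOR TWO", "control"),
             ("RECORD SEPARATOR", "control"), ("RS", "abbreviation")],
}

_MISSING = object()  # sentinel so that None can be passed as a real default

def name_with_type(ch, default=_MISSING, type="canonical"):
    """Sketch of unicodedata.name() with the proposed (hypothetical) type keyword.

    type="canonical" defers to the real unicodedata.name(); any other value
    returns the first alias of that type, so the result stays a single str.
    """
    if type == "canonical":
        result = unicodedata.name(ch, None)
    else:
        matches = [a for a, t in ALIASES.get(ord(ch), []) if t == type]
        result = matches[0] if matches else None
    if result is None:
        if default is _MISSING:
            raise ValueError("no such name")
        return default
    return result

print(name_with_type("\x1e", type="abbreviation"))  # RS
print(name_with_type("\x1e", type="control"))       # INFORMATION SEPARATOR TWO
print(name_with_type("A"))                          # LATIN CAPITAL LETTER A
```

Silently dropping "RECORD SEPARATOR" is exactly the trade-off raised by the question of what `name()` should return when a type has several aliases.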

2013-06-20, Alexander Belopolsky
Alexander Belopolsky added the comment:

UCD provides more than just a list of aliases: formal name aliases have a "type" (control, abbreviation, etc.). See .

2013-06-20, Martin v. Löwis
Martin v. Löwis added the comment:

I think the best way would be to provide a function unicodedata.aliases, returning a list of names for a given character or sequence.
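A minimal sketch of the proposed list-returning function, covering characters only (not named sequences). `aliases` and the hand-copied table are hypothetical, and nothing of this form exists in `unicodedata`. Note that a flat list discards the alias type that the UCD records, which is the point Alexander raises about alias types:

```python
# Hypothetical hand-copied subset of NameAliases.txt: code point -> [(alias, type)].
ALIASES = {
    0x0000: [("NULL", "control"), ("NUL", "abbreviation")],
    0x01A2: [("LATIN CAPITAL LETTER GHA", "correction")],
}

def aliases(ch, table=ALIASES):
    """Return all alias names of ch as a list (empty if none); types are dropped."""
    return [alias for alias, _type in table.get(ord(ch), [])]

print(aliases("\x00"))  # ['NULL', 'NUL']
print(aliases("A"))     # []
```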

2013-06-17, Antoine Pitrou
Changes by Antoine Pitrou:

nosy: +benjamin.peterson, ezio.melotti, lemburg, loewis, serhiy.storchaka

2013-06-16, Alexander Belopolsky
New submission from Alexander Belopolsky:

Python is aware of unicode codepoint aliases, but unicodedata does not provide a way to find aliases of a given codepoint:

>>> ucd.lookup('ESCAPE') == '\N{ESCAPE}'
True
>>> ucd.lookup('RS') == '\N{RS}'
True

but

>>> ucd.name('\N{ESCAPE}')
Traceback (mo