Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Vlastimil Brom
2010/12/7 Alexander Belopolsky : > On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom > wrote: > .. >> Do you know of any re engine fully complying to tr18, even at the >> first level: "Basic Unicode Support"? >> > """ > ICU Regular Expressions conform to Unicode Technical Standard #18, > Unicode

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Alexander Belopolsky
On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom wrote: .. > Do you know of any re engine fully complying to tr18, even at the > first level: "Basic Unicode Support"? > """ ICU Regular Expressions conform to Unicode Technical Standard #18, Unicode Regular Expressions, level 1, and in addition in

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Martin v. Löwis
Am 07.12.2010 04:03, schrieb Alexander Belopolsky: > On Sat, Dec 4, 2010 at 5:58 PM, "Martin v. Löwis" wrote: >>> I actually wonder if Python's re module can claim to provide even >>> Basic Unicode Support. >> >> Do you really wonder? Most definitely it does not. >> > > Were you more optimistic f

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Alexander Belopolsky
On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom wrote: .. > It seems, e.g. in Perl, there are some omissions too > http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level > > Do you know of any re engine fully complying to tr18, even at the > first level: "Basic Unicode

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Vlastimil Brom
2010/12/7 Alexander Belopolsky : > On Sat, Dec 4, 2010 at 5:58 PM, "Martin v. Löwis" wrote: >>> I actually wonder if Python's re module can claim to provide even >>> Basic Unicode Support. >> >> Do you really wonder? Most definitely it does not. >> > > Were you more optimistic four years ago? > >

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-06 Thread Alexander Belopolsky
On Sat, Dec 4, 2010 at 5:58 PM, "Martin v. Löwis" wrote: >> I actually wonder if Python's re module can claim to provide even >> Basic Unicode Support. > > Do you really wonder? Most definitely it does not. > Were you more optimistic four years ago? http://bugs.python.org/issue1528154#msg54864

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Martin v. Löwis
> I actually wonder if Python's re module can claim to provide even > Basic Unicode Support. Do you really wonder? Most definitely it does not. Regards, Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Alexander Belopolsky
On Fri, Dec 3, 2010 at 12:10 AM, Alexander Belopolsky wrote: .. > I don't think decimal module should support non-European decimal > digits.  The only place where it can make some sense is in int() > because here we have a fighting chance of producing a reasonable > definition.   The motivating us

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Antoine Pitrou
Le samedi 04 décembre 2010 à 17:13 +0900, Stephen J. Turnbull a écrit : > Antoine Pitrou writes: > > Le vendredi 03 décembre 2010 à 13:58 +0900, Stephen J. Turnbull a > > écrit : > > > Antoine Pitrou writes: > > > > > > > The legacy format argument looks like a red herring to me. When > > >

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Stephen J. Turnbull
Antoine Pitrou writes: > Le vendredi 03 décembre 2010 à 13:58 +0900, Stephen J. Turnbull a > écrit : > > Antoine Pitrou writes: > > > > > The legacy format argument looks like a red herring to me. When > > > converting from one format to another it is the programmer's job to > > > do his/her

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-03 Thread Antoine Pitrou
Le vendredi 03 décembre 2010 à 13:58 +0900, Stephen J. Turnbull a écrit : > Antoine Pitrou writes: > > > The legacy format argument looks like a red herring to me. When > > converting from one format to another it is the programmer's job to > > do his/her job right. > > Uhmm, the argument *for*

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-03 Thread M.-A. Lemburg
Alexander Belopolsky wrote: > On Thu, Dec 2, 2010 at 5:58 PM, M.-A. Lemburg wrote: > .. >>> I will change my mind on this issue when you present a >>> machine-readable file with Arabic-Indic numerals and a program capable >>> of reading it and show that this program uses the same number parsing >>

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-03 Thread Neil Hodgson
Stephen J. Turnbull: > Will it accept Arabic on input?  (Han might be too much to ask for > since Unicode considers Han digits to be "impure".) I couldn't find a direct way to input Arabic digits into OO Calc, the normal use of Alt+number didn't work in Calc although it did in WordPad where Al

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 4:57 PM, Mark Dickinson wrote: .. > (the decimal spec requires that non-European digits be accepted). Mark, I think *requires* is too strong of a word to describe what the spec says. The decimal module documentation refers to two authorities: 1. IBM’s General Decimal Ar

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Stephen J. Turnbull
Antoine Pitrou writes: > The legacy format argument looks like a red herring to me. When > converting from one format to another it is the programmer's job to > do his/her job right. Uhmm, the argument *for* this "feature" proposed by several people is that Python's numeric constructors do it (

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Stephen J. Turnbull
Neil Hodgson writes: >While I don't have Excel to test with, OpenOffice.org Calc will > display in Arabic or Han numerals using the NatNum format codes. Display is different from input, but at least this is concrete evidence. Will it accept Arabic on input? (Han might be too much to ask f

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread haiyang kang
> Furthermore, data can well originate from texts that were written > hundreds or even thousands of years ago, so there is plenty of > material available for processing. Hmm, for this I think we need a specially tuned language-processing system, with one subsystem for one languag

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Stephen J. Turnbull
Lennart Regebro writes: > 2010/12/2 Stephen J. Turnbull : > > T1000 = float('一.◯◯◯') > > That was already discussed here, and it's clear that unicode does not > consider these characters to be something you can use in a decimal > number, and hence it's not broken. Huh? IOW, use Unicode fe
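The distinction being argued over here (a character that has a numeric value versus one that counts as a decimal digit) can be checked against the Unicode Character Database through Python's `unicodedata` module. What follows is only a minimal sketch, assuming a Python 3 interpreter; the helper name `describe` is made up for illustration and does not come from the thread.

```python
import unicodedata

def describe(ch):
    """Return (general category, numeric value, decimal value or None) for one character."""
    try:
        dec = unicodedata.decimal(ch)
    except ValueError:
        dec = None  # the character carries no decimal-digit property
    return unicodedata.category(ch), unicodedata.numeric(ch), dec

# Escapes are used so the example does not depend on fonts or input methods.
print(describe('\u4e00'))  # Han character for "one"
print(describe('\u0661'))  # ARABIC-INDIC DIGIT ONE
```

The Han character reports a numeric value but no decimal value, which is why it cannot take part in a positional number passed to `float()`, while the Arabic-Indic digit reports both.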

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Terry Reedy
On 12/2/2010 6:54 PM, Alexander Belopolsky wrote: On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg wrote: .. Some examples: http://www.bdl.gov.lb/circ/intpdf/int123.pdf I looked at this one more closely. While I cannot understand what it says, it appears that Arabic numerals are used in dates.

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Steven D'Aprano
Stephen J. Turnbull wrote: Steven D'Aprano writes: > With full respect to haiyang kang, hear-say from one person can hardly > be described as "strong" evidence That's *disrespectful* nonsense. What Haiyang reported was not hearsay, it's direct observation of what he sees around him and per

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg wrote: .. > Some examples: > > http://www.bdl.gov.lb/circ/intpdf/int123.pdf I looked at this one more closely. While I cannot understand what it says, it appears that Arabic numerals are used in dates. It looks like Python won't be able to deal with

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Eric Smith wrote: > On 12/2/2010 5:43 PM, M.-A. Lemburg wrote: >> Eric Smith wrote: The current behavior should go nowhere; it is not useful. Something very similar to the current behavior (but done correctly) should go into the locale module. >>> >>> I agree with everything Mar

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
> The point is that we support all of Unicode in Python, not just a fragment, > and therefore the numeric constructors support all of Unicode. That conclusion is as false today as it was in Python 1.6, but only now people start caring about that. a) we don't support all of Unicode in numeric cons

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Eric Smith
On 12/2/2010 5:43 PM, M.-A. Lemburg wrote: Eric Smith wrote: The current behavior should go nowhere; it is not useful. Something very similar to the current behavior (but done correctly) should go into the locale module. I agree with everything Martin says here. I think the basic premise is: y

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
Am 02.12.2010 23:43, schrieb M.-A. Lemburg: > Eric Smith wrote: >>> The current behavior should go nowhere; it is not useful. Something very >>> similar to the current behavior (but done correctly) should go into the >>> locale module. >> >> I agree with everything Martin says here. I think the bas

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 5:58 PM, M.-A. Lemburg wrote: .. >> I will change my mind on this issue when you present a >> machine-readable file with Arabic-Indic numerals and a program capable >> of reading it and show that this program uses the same number parsing >> algorithm as Python's int() or flo

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Terry Reedy wrote: > On 11/29/2010 10:19 AM, M.-A. Lemburg wrote: >> Nick Coghlan wrote: >>> On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg wrote: If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII cas

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Alexander Belopolsky wrote: > On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg wrote: > .. >> Have you tried Google ? >> > > I tried Google and I could not find any plain text or HTML file that > would use Arabic-Indic numerals. What was interesting, though, was that a > search for "quran unicode" (witho

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Eric Smith wrote: >> The current behavior should go nowhere; it is not useful. Something very >> similar to the current behavior (but done correctly) should go into the >> locale module. > > I agree with everything Martin says here. I think the basic premise is: > you won't find strings "in the wi

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Eric Smith
On 12/2/2010 4:48 PM, "Martin v. Löwis" wrote: Am 02.12.2010 22:30, schrieb Steven D'Aprano: Martin v. Löwis wrote: Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my knowled

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Mark Dickinson
On Thu, Dec 2, 2010 at 8:23 PM, "Martin v. Löwis" wrote: > In the case of number parsing, I think Python would be better if > float() rejected non-ASCII strings, and any support for such parsing > should be redone correctly in a different place (preferably along with > printing of numbers). +1.
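For readers who want the stricter behaviour proposed here without any change to `float()` itself, a thin wrapper is enough. This is a minimal sketch, not anything proposed in the thread: the name `ascii_float` is hypothetical, and it simply refuses input that cannot be encoded as ASCII before handing it to the built-in.

```python
def ascii_float(text):
    """Hypothetical strict parser: reject non-ASCII input, then defer to float()."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        # Hide the encoding error; callers only need to know the input was refused.
        raise ValueError("non-ASCII input rejected: %r" % (text,)) from None
    return float(text)

print(ascii_float('1234.56'))
```

Anything `float()` already rejects still raises `ValueError`, so callers see a single exception type either way.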

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg wrote: .. > Have you tried Google ? > I tried Google and I could not find any plain text or HTML file that would use Arabic-Indic numerals. What was interesting, though, was that a search for "quran unicode" (without quotes) brought me to http://www.sacr

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
Am 02.12.2010 22:30, schrieb Steven D'Aprano: > Martin v. Löwis wrote: Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing syste
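The behaviour under dispute is easy to reproduce. The sketch below assumes a CPython 3 interpreter, where (as far as I know) `float()` still accepts any character with the Unicode decimal-digit property; the variable names are illustrative only.

```python
# Arabic-Indic digits, written as escapes so bidi rendering cannot reorder them on screen.
arabic_indic = '\u0661\u0662\u0663\u0664.\u0665\u0666'

value = float(arabic_indic)          # digits are mapped to their decimal values
scaled = float(arabic_indic + 'e4')  # an ASCII exponent marker is accepted next to them
print(value, scaled)
```

The second call is the mixed-script form that the message above says no writing system actually uses.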

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
Le jeudi 02 décembre 2010 à 16:34 -0500, Alexander Belopolsky a écrit : > On Thu, Dec 2, 2010 at 1:55 PM, Antoine Pitrou wrote: > .. > > I don't think so. str.split() and str.splitlines() are also defined in > > conformance to the SPEC, AFAIK. They certainly try to. > > You are joking, right?

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 1:55 PM, Antoine Pitrou wrote: .. > I don't think so.  str.split() and str.splitlines() are also defined in > conformance to the SPEC, AFAIK.  They certainly try to. You are joking, right? Where exactly does Unicode specify something like this: >>> ''.join('𐌀𐌁𐌂'.split('\u

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Steven D'Aprano
Martin v. Löwis wrote: Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing system in which '١٢٣٤.٥٦e4' means 12345600.0. I'm not sure what you're

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
> Arabic numerals are being used a lot nowadays in Asian countries, > but that doesn't mean that the native script versions are not > being used anymore. I never claimed that people are not using their local scripts to enter numbers. However, none of your examples is about Chinese numerals using a

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
"Martin v. Löwis" wrote: >> [...] >> For direct entry by an interactive user, yes. Why are some people in >> this discussion thinking only of direct entry by an interactive user? > > Ultimately, somebody will have entered the data. I don't think you really believe that all data processed by a com

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
>> Then these users should speak up and indicate their need, or somebody >> should speak up and confirm that there are users who actually want >> '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing >> system in which '١٢٣٤.٥٦e4' means 12345600.0. > > I'm not sure what you're after he

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Georg Brandl
Am 02.12.2010 20:40, schrieb "Martin v. Löwis": >> Maybe all past, present and future whatsnew maintainers can agree on these >> rules, which I copied directly from whatsnew/3.2.rst? > > I don't think all past maintainers can Yes, and the same goes for the future ones, since they may not even kno

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
"Martin v. Löwis" wrote: >>> Now, one may wonder what precisely a "possibly signed floating point >>> number" is, but most likely, this refers to >>> >>> floatnumber ::= pointfloat | exponentfloat >>> pointfloat::= [intpart] fraction | intpart "." >>> exponentfloat ::= (intpart | pointfloa

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
> Maybe all past, present and future whatsnew maintainers can agree on these > rules, which I copied directly from whatsnew/3.2.rst? I don't think all past maintainers can (I'm pretty certain that AMK would disagree), but if that's the current policy, I can certainly try following it (I didn't kno

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
Am 02.12.2010 03:01, schrieb Ben Finney: > "Stephen J. Turnbull" writes: > >> Furthermore, he provided good *objective* reason (excessive cost, to >> which I can also testify, in several different input methods for >> Japanese) why numbers simply would not be input that way. >> >> What's left is

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
Le jeudi 02 décembre 2010 à 13:14 -0500, Alexander Belopolsky a écrit : > > I don't understand why you think Arabic or Hebrew text is any different > > from Western text. Surely right-to-left isn't more conceptually > > complicated than left-to-right, is it? > > > > No, but a mix of LTR and RTL is

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 11:56 AM, Antoine Pitrou wrote: > Le jeudi 02 décembre 2010 à 11:41 -0500, Alexander Belopolsky a écrit : >> >> Note that my point is not to find the correct answer here, but to >> demonstrate that we as a group don't have the expertise to get parsing >> of Arabic text right

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
Le jeudi 02 décembre 2010 à 11:41 -0500, Alexander Belopolsky a écrit : > > Note that my point is not to find the correct answer here, but to > demonstrate that we as a group don't have the expertise to get parsing > of Arabic text right. I don't understand why you think Arabic or Hebrew text is

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 8:36 AM, Antoine Pitrou wrote: > On Wed, 1 Dec 2010 22:28:49 -0500 > Alexander Belopolsky wrote: .. >> This matches my limited research on this topic as well.  However, I am >> not sure that when these codes are embedded in Arabic text, their >> logical order always matches

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
On Wed, 1 Dec 2010 22:28:49 -0500 Alexander Belopolsky wrote: > > > > Both my personal observations when travelling from Turkey to India and > > Wikipedia say yes. "When representing a number in Arabic, the lowest-valued > > position is placed on the right, so the order of positions is the same as

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Lennart Regebro
2010/12/2 Stephen J. Turnbull : > Because that works, but > > print(T1234) > > doesn't (it prints ASCII).  You can't round-trip, but users will > want/expect that. You should be able to round-trip, absolutely. I don't think you should expect print() to do that. str(56) possibly. :) That's an argum

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Georg Brandl
Am 01.12.2010 23:39, schrieb "Martin v. Löwis": >> As of today, "What’s New In Python 3.2" [1] does not even mention the >> unicodedata upgrade to 6.0.0. > > One reason was that I was instructed not to change "What's New" a few > years ago. Maybe all past, present and future whatsnew maintainers

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Neil Hodgson
Stephen J. Turnbull: > Here's why: '''print "%d" % > some_integer''' doesn't now, and never will (unless Kristan gets his > Python 2.8), produce Arabic or Han numerals.  Not in any > language I know of, not in Microsoft Excel, and definitely not in > Python 2. While I don't have Excel to test

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Ben Finney writes: > Input from an existing text file, as I said earlier. Or any other way of > text data making its way into a Python program. > Direct entry at the console is a red herring. I don't think it is. Not at all. Here's why: '''print "%d" % some_integer''' doesn't now, and never

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 10:11 PM, Terry Reedy wrote: > On 12/1/2010 7:44 PM, Alexander Belopolsky wrote: > >> it.  The argument was that if there was a use case for parsing Eastern >> Arabic numerals, it would be better served by a module written by >> someone who speaks one of the Arabic languages

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Terry Reedy
On 12/1/2010 7:44 PM, Alexander Belopolsky wrote: it. The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic languages and knows the details of how Eastern Arabic numerals are writ

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Ben Finney
"Stephen J. Turnbull" writes: > Furthermore, he provided good *objective* reason (excessive cost, to > which I can also testify, in several different input methods for > Japanese) why numbers simply would not be input that way. > > What's left is copy/paste via the mouse. For direct entry by an

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Steven D'Aprano writes: > With full respect to haiyang kang, hear-say from one person can hardly > be described as "strong" evidence That's *disrespectful* nonsense. What Haiyang reported was not hearsay, it's direct observation of what he sees around him and personal experience, plus extrapo

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 7:17 PM, Steven D'Aprano wrote: .. > we should continue to support the existing behaviour. None of the arguments > against it seem convincing to me, particularly since the opponents of the > current behaviour admit that there is a use-case for it, but they just want > it to

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Lennart Regebro writes: > On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull > wrote: > > Sure you can.  In Python program text, all keywords will be ASCII > > Yes, yes, sure, but not the contents of variables, Irrelevant, you're not converting these to a string representation. If you're g

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano
Martin v. Löwis wrote: And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it wa

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 5:36 PM, "Martin v. Löwis" wrote: .. >> Note that I'm not saying this is common. Nor am I saying it's a >> desirable situation. I'm saying it is a feasible use case, to be >> dismissed only if there is strong evidence that it's not used by >> existing Python code. > > And in

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano
Martin v. Löwis wrote: I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that. Who's talking about *entering* it into the program at a keyboard directly, though? Input to a program can come from all kinds of crazy sources. J

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
> As of today, "What’s New In Python 3.2" [1] does not even mention the > unicodedata upgrade to 6.0.0. One reason was that I was instructed not to change "What's New" a few years ago. Regards, Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
>> I think the OP (haiyang kang) already indicated that he finds it quite >> unlikely that anybody would possibly want to enter that. > > Who's talking about *entering* it into the program at a keyboard > directly, though? Input to a program can come from all kinds of crazy > sources. Just because

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
>> And here, my observation stands: if they wanted to, they currently >> couldn't - at least not for real numbers (and also not for integers >> if they want to use grouping). So the presumed application of this >> feature doesn't actually work, despite the presence of the feature it >> was supposed

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Terry Reedy
On 12/1/2010 12:55 PM, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg wrote: .. With Python 3.1: exec('\u0CF1 = 1') Traceback (most recent call last): File "", line 1, in File "", line 1 ೱ = 1 ^ SyntaxError: invalid character in identifier but with

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg wrote: .. >> With Python 3.1: >> > exec('\u0CF1 = 1') >> Traceback (most recent call last): >>  File "", line 1, in >>  File "", line 1 >>    ೱ = 1 >>      ^ >> SyntaxError: invalid character in identifier >> >> but with Python 3.2a4: >> > ex

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Lennart Regebro
On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull wrote: > Sure you can.  In Python program text, all keywords will be ASCII Yes, yes, sure, but not the contents of variables, > I see no reason not to make a similar promise for numeric literals. Wait what, literals? The example was >>> float('

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano
Martin v. Löwis wrote: Am 30.11.2010 23:43, schrieb Terry Reedy: On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese "ASCII" numerals or Arabic cursive numerals in

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Terry Reedy wrote: > On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: > >> I see no reason not to make a similar promise for numeric literals. I >> see no good reason to allow compatibility full-width Japanese "ASCII" >> numerals or Arabic cursive numerals in "for i in range(...)" for >> example

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
"Martin v. Löwis" wrote: > Am 30.11.2010 21:24, schrieb Ben Finney: >> haiyang kang writes: >> >>> I think it is a little ugly to have code like this: num = >>> float("一.一"), expected result is: num = 1.1 >> >> That's a straw man, though. The string need not be a literal in the >> program; it ca

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Terry Reedy wrote: > On 11/30/2010 10:05 AM, Alexander Belopolsky wrote: > > My general answers to the questions you have raised are as follows: > > 1. Each new feature release should use the latest version of the UCD as > of the first beta release (or perhaps a week or so before). New chars > ar

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
"Martin v. Löwis" writes: > Am 30.11.2010 21:24, schrieb Ben Finney: > > The string need not be a literal in the program; it can be input to > > the program. > > > > num = float(input_from_the_external_world) > > > > Does that change your assessment of whether non-ASCII digits are > > used?

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy
On 11/30/2010 10:05 AM, Alexander Belopolsky wrote: My general answers to the questions you have raised are as follows: 1. Each new feature release should use the latest version of the UCD as of the first beta release (or perhaps a week or so before). New chars are new features and the beta pe

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 23:43, schrieb Terry Reedy: > On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: > >> I see no reason not to make a similar promise for numeric literals. I >> see no good reason to allow compatibility full-width Japanese "ASCII" >> numerals or Arabic cursive numerals in "for i in ran

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 21:24, schrieb Ben Finney: > haiyang kang writes: > >> I think it is a little ugly to have code like this: num = >> float("一.一"), expected result is: num = 1.1 > > That's a straw man, though. The string need not be a literal in the > program; it can be input to the program. > >

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy
On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese "ASCII" numerals or Arabic cursive numerals in "for i in range(...)" for example. I do not think that anyone, a

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
haiyang kang writes: > I think it is a little ugly to have code like this: num = > float("一.一"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can be input to the program. num = float(input_from_the_external_world) Does that

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:55 +0100, "Martin v. Löwis" a écrit : > Wrt. to local number parsing, I think that the locale module would be > way better than the nonsense that Python currently does. In the locale > module, somebody at least has thought about what specifically > constitutes a numbe

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
> Because we all know how locale is a pile of cr*p, both in specification > and in implementations. Our unit tests for it are a clear proof of that. I wouldn't use expletives, but rather claim that the locale module is highly platform-dependent. > Actually, I remember you saying that locale shoul

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:40 +0100, "Martin v. Löwis" a écrit : > Am 30.11.2010 20:23, schrieb Antoine Pitrou: > > Le mardi 30 novembre 2010 à 20:16 +0100, "Martin v. Löwis" a écrit : > >>> Would moving this functionality to the locale module make the issues any > >>> easier to fix? > >> > >>

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 20:23, schrieb Antoine Pitrou: > Le mardi 30 novembre 2010 à 20:16 +0100, "Martin v. Löwis" a écrit : >>> Would moving this functionality to the locale module make the issues any >>> easier to fix? >> >> You could delegate it to the C library, so: yes. > > I hope you don't suggest de

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:16 +0100, "Martin v. Löwis" a écrit : > > Would moving this functionality to the locale module make the issues any > > easier to fix? > > You could delegate it to the C library, so: yes. I hope you don't suggest delegating it to the C locale functions. Do you?

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
> Would moving this functionality to the locale module make the issues any > easier to fix? You could delegate it to the C library, so: yes. Regards, Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
On 30.11.2010 09:15, Hagen Fürstenau wrote: >>> During PEP 3003 discussion, it was suggested to handle it on a case by >>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >>> 3003. >> >> It's covered by "As the standard library is not directly tied to the >> language definit

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou wrote: .. >> I am not sure this belongs to the locale module, however.  It seems to >> me, something like 'unicodealgo' for unicode algorithms would be more >> appropriate. > > It could simply be in unicodedata if you split the implementation into a

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
> Sure, if we code it in Python, supporting it will by much easier: > > def normalize_digits(s): > digits = {m.group(1) for m in re.finditer('(\d)', s)} > trtab = {ord(d): str(unicodedata.digit(d)) for d in digits} > return s.translate(trtab) > > >>> normalize_digits('١٢٣٤.٥٦') > '12
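The normalize_digits function quoted in this message is cut off in the archive listing. A runnable sketch of the same idea follows, assuming a Python 3 interpreter in which \d in a str pattern matches any Unicode decimal digit. The input is written with escape sequences only so that the character order is unambiguous; it is the same Arabic-Indic string as in the quoted example.

```python
import re
import unicodedata


def normalize_digits(s):
    # Collect each distinct character that the regex engine treats as a digit.
    digits = {m.group(1) for m in re.finditer(r'(\d)', s)}
    # Map the code point of each such character to its ASCII digit.
    trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
    return s.translate(trtab)


# ARABIC-INDIC DIGITS ONE..FOUR, '.', FIVE, SIX
sample = '\u0661\u0662\u0663\u0664.\u0665\u0666'
print(normalize_digits(sample))  # -> 1234.56
```

Only the digits are translated; the decimal separator is left untouched, so a locale-specific separator would still need separate handling.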

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 12:40 PM, Michael Foord wrote: .. >> If you think non-ASCII digits are not difficult to support, please >> contribute to the following tracker issues: >> > > Would moving this functionality to the locale module make the issues any > easier to fix? > Sure, if we code it in

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Michael Foord
On 30/11/2010 16:40, Alexander Belopolsky wrote: [snip...] And of course, unicodedata.digit('\U0001D7CE') 0 but int('\U0001D7CE') .. UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' .. on a narrow Unicode build. (Note the character reported in the error message!) If
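The narrow-build failure quoted above is specific to interpreters of that period. As a hedged point of comparison, the sketch below assumes CPython 3.3 or later, where the narrow/wide build distinction no longer exists and a non-BMP digit is a single code point.

```python
import unicodedata

bold_zero = '\U0001D7CE'  # MATHEMATICAL BOLD DIGIT ZERO, outside the BMP

# On CPython 3.3+ the string holds one code point, not a surrogate pair.
print(len(bold_zero))

# Both the database lookup and the int constructor accept the character.
print(unicodedata.digit(bold_zero))
print(int(bold_zero))
```

On such an interpreter both calls are expected to agree, so the mismatch described in the message should not be reproducible there.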

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky wrote: .. >> Still, if it's not detrimental and it's not difficult to support, >> then why do you care? > > It is difficult to support. A fix for issue10557 would be much > simpler if we did not support non-European digits. I now added a >

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stefan Krah
Alexander Belopolsky wrote: > On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote: > >> But you should be able to write: > >> > >> text = input("Enter a number using your preferred digits: ") > >> num = float(text) > >> > >> without caring whether the user enters 一.一 or 1.1 or something else. > >

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote: >> But you should be able to write: >> >> text = input("Enter a number using your preferred digits: ") >> num = float(text) >> >> without caring whether the user enters 一.一 or 1.1 or something else. > > yes. from logical point of view, this can

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis" wrote: >> - Should Python documentation refer to the specific version of Unicode >> that it supports? > > You mean, mention it somewhere? Sure (although it would be nice if the > documentation generator would automatically extract it from the sour

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
> But you should be able to write: > > text = input("Enter a number using your preferred digits: ") > num = float(text) > > without caring whether the user enters 一.一 or 1.1 or something else. yes. from logical point of view, this can happen. But i really doubt that if really there are users who

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano wrote: .. > But you should be able to write: > > text = input("Enter a number using your preferred digits: ") > num = float(text) > > without caring whether the user enters 一.一 or 1.1 or something else. > I find it ironic that people who argue for
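Whether the quoted float(text) snippet works depends on which digits the user types. The sketch below is an illustration under assumptions, not part of the thread: it assumes a current CPython 3, and try_float is a hypothetical helper introduced only for this example. It contrasts Arabic-Indic digits, which are decimal digits (category Nd), with the CJK ideograph for one, which carries a numeric value but is classified as a letter (category Lo).

```python
import unicodedata


def try_float(text):
    """Return float(text), or None if the float constructor rejects it."""
    try:
        return float(text)
    except ValueError:
        return None


arabic_indic = '\u0661.\u0665'  # ARABIC-INDIC DIGIT ONE, '.', DIGIT FIVE
cjk = '\u4e00.\u4e00'           # the 一.一 input from the quoted example

print(try_float(arabic_indic))  # parsed as a decimal number
print(try_float(cjk))           # rejected by float()

# The ideograph has a numeric value but is not a decimal digit.
print(unicodedata.category('\u4e00'), unicodedata.numeric('\u4e00'))
```

Accepting input such as 一.一 would therefore take an explicit mapping step before calling float, rather than relying on the constructor alone.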

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
On Wed, 01 Dec 2010 00:23:22 +1100 Steven D'Aprano wrote: > > But I think there is a good case for allowing the constructors int, > float and complex to continue to accept numeric *strings* with non-ASCII > digits. The code already exists, there's probably people out there who > rely on it,

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano
Stephen J. Turnbull wrote: Lennart Regebro writes: > *I* think it is more important. In python 3, you can never ever assume > anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8) for the foreseeable future.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano
haiyang kang wrote: hi, I agree with this. I never seen any man in China using chinese number literals (at least two kinds:一, 壹, same meaning with 1) in Python program, except UI output. They can do some mappings when want to output these non-ascii numbers. Example: if 1: print "一"

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
hi, I agree with this. I never seen any man in China using chinese number literals (at least two kinds:一, 壹, same meaning with 1) in Python program, except UI output. They can do some mappings when want to output these non-ascii numbers. Example: if 1: print "一" I think it is a litt

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stephen J. Turnbull
Lennart Regebro writes: > *I* think it is more important. In python 3, you can never ever assume > anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8) for the foreseeable future. I see no reason not to make

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Hagen Fürstenau
>> During PEP 3003 discussion, it was suggested to handle it on a case by >> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >> 3003. > > It's covered by "As the standard library is not directly tied to the > language definition it is not covered by this moratorium." How is
