Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
Martin v. Löwis mar...@v.loewis.de writes: On 30.11.2010 21:24, Ben Finney wrote: The string need not be a literal in the program; it can be input to the program. num = float(input_from_the_external_world) Does that change your assessment of whether non-ASCII digits are

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread M.-A. Lemburg
Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 5:42 PM, M.-A. Lemburg m...@egenix.com wrote: .. I don't see why the language spec should limit the wealth of number formats supported by float(). The Language Spec (whatever it is) should not, but hopefully the Library Reference should.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread M.-A. Lemburg
Nick Coghlan wrote: On Mon, Nov 29, 2010 at 1:39 PM, Stephen J. Turnbull step...@xemacs.org wrote: I agree that Python should make it easy for the programmer to get numerical values of native numeric strings, but it's not at all clear to me that there is any point to having float() recognize

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Nick Coghlan
On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg m...@egenix.com wrote: If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII case mappings, what to consider whitespace, etc. We don't do that for a good reason:

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Antoine Pitrou
On Mon, 29 Nov 2010 13:58:05 +1000 Nick Coghlan ncogh...@gmail.com wrote: On Mon, Nov 29, 2010 at 1:39 PM, Stephen J. Turnbull step...@xemacs.org wrote: I agree that Python should make it easy for the programmer to get numerical values of native numeric strings, but it's not at all clear

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Antoine Pitrou
On Sun, 28 Nov 2010 21:32:15 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano st...@pearwood.info wrote: .. is more important than to assure users that once their program accepted some text as a number, they can assume that the

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread M.-A. Lemburg
Nick Coghlan wrote: On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg m...@egenix.com wrote: If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII case mappings, what to consider whitespace, etc. We don't do that for

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 2:22 AM, Martin v. Löwis mar...@v.loewis.de wrote: The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could that be a bad thing? It's YAGNI, feature bloat. It gives the illusion of

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Antoine Pitrou
On Mon, 29 Nov 2010 08:22:46 +0100 Martin v. Löwis mar...@v.loewis.de wrote: The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could that be a bad thing? It's YAGNI, feature bloat. It gives the illusion of

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread M.-A. Lemburg
Alexander Belopolsky wrote: On Mon, Nov 29, 2010 at 2:22 AM, Martin v. Löwis mar...@v.loewis.de wrote: The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could that be a bad thing? It's YAGNI, feature bloat.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Terry Reedy
On 11/29/2010 10:19 AM, M.-A. Lemburg wrote: Nick Coghlan wrote: On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg m...@egenix.com wrote: If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII case mappings, what to

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 1:33 PM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Nov 2010 08:22:46 +0100 Martin v. Löwis mar...@v.loewis.de wrote: The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 2:23 PM, Terry Reedy tjre...@udel.edu wrote: .. Since you are the knowledgeable advocate of the current behavior, perhaps you could open an issue and propose a doc patch, even if not .rst formatted. I am not an advocate of the current behavior, but an issue for doc

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Martin v. Löwis
On 29.11.2010 19:33, Antoine Pitrou wrote: On Mon, 29 Nov 2010 08:22:46 +0100 Martin v. Löwis mar...@v.loewis.de wrote: The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could that be a bad thing? It's

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Martin v. Löwis
- Should Python documentation refer to the specific version of Unicode that it supports? You mean, mention it somewhere? Sure (although it would be nice if the documentation generator would automatically extract it from the source, just as it extracts the Python version number). Of course,
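A minimal sketch of how the Unicode version could be read programmatically rather than hard-coded, assuming the standard library's `unicodedata.unidata_version` attribute is an acceptable source. The variable name `ucd_version` is only illustrative.

```python
import sys
import unicodedata

# Version of the Unicode Character Database compiled into this interpreter.
ucd_version = unicodedata.unidata_version

# Report it next to the Python version, as a doc generator might.
print("Python", sys.version.split()[0], "uses UCD", ucd_version)
```

The value differs between Python releases, so the printed line is not reproduced here.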

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Steven D'Aprano
Alexander Belopolsky wrote: Speaking of YAGNI, does anyone want to defend complex('١٢٣٤.٥٦j') 1234.56j *If* we allow float('١٢٣٤.٥٦') (as we currently do, but is being disputed by some), then we should allow complex('١٢٣٤.٥٦j'). It would be silly for complex to be more restrictive than
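The consistency point can be checked directly. This is a small sketch, assuming a recent CPython 3 interpreter. The digits are written as escapes so that bidirectional rendering cannot reorder them, and the names `text`, `as_float` and `as_complex` are illustrative.

```python
# Arabic-Indic digits for "1234.56", spelled as escapes to keep the
# logical character order unambiguous regardless of how text is displayed.
text = "\u0661\u0662\u0663\u0664.\u0665\u0666"

as_float = float(text)            # real-valued constructor
as_complex = complex(text + "j")  # same digits with an imaginary suffix

print(as_float, as_complex)
```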

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 5:09 PM, Steven D'Aprano st...@pearwood.info wrote: .. But in any case, please don't conflate the question of whether Python should accept j and/or i for complex numbers with the question of supporting non-arabic numerals. The two issues are unrelated. The two issues

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Stephen J. Turnbull
M.-A. Lemburg writes: Just because ASCII-proponents may have a hard time reading such literals, That's not the point. doesn't mean that script users have the same trouble. The script users may have no trouble reading them, but that doesn't mean it's not a YAGNI. In Japanese, it's a

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-29 Thread Stephen J. Turnbull
Steven D'Aprano writes: But in any case, please don't conflate the question of whether Python should accept j and/or i for complex numbers with the question of supporting non-arabic numerals. The two issues are unrelated. Different, yes, unrelated, no. They're both about whether variant

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Antoine Pitrou
On Sun, 28 Nov 2010 15:24:37 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: While we have little choice but to follow UCD in defining str.isidentifier(), I think Python can promise users more stability in what it treats as space or as a digit in its builtins. Well, if unicode

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou solip...@pitrou.net wrote: .. For example, I don't think that supporting float('١٢٣٤.٥٦') 1234.56 is more important than to assure users that once their program accepted some text as a number, they can assume that the text is ASCII. Why

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Antoine Pitrou
On Sun, 28 Nov 2010 15:58:33 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou solip...@pitrou.net wrote: .. For example, I don't think that supporting float('١٢٣٤.٥٦') 1234.56 is more important than to assure users

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Joao S. O. Bueno
On Sun, Nov 28, 2010 at 7:04 PM, Antoine Pitrou solip...@pitrou.net wrote: On Sun, 28 Nov 2010 15:58:33 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou solip...@pitrou.net wrote: .. For example, I don't think that supporting

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 4:12 PM, Joao S. O. Bueno jsbu...@python.org.br wrote: .. Let novice C programmers in English speaking countries deal with the fact that 1 character is not 1 byte anymore. We are past this point. If you are, please contribute your expertise here:

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
float('١٢٣٤.٥٦') 1234.56 I think it's a bug that this works. The definition of the float builtin says Convert a string or a number to floating point. If the argument is a string, it must contain a possibly signed decimal or floating point number, possibly embedded in whitespace. The argument
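To see what the behaviour under discussion rests on, the following sketch lists the decimal-digit property that the Unicode Character Database assigns to each character of the example string. It assumes only the standard `unicodedata` module, and it illustrates the observable behaviour rather than how `float()` is implemented internally.

```python
import unicodedata

# The example string from the thread, written with escapes.
text = "\u0661\u0662\u0663\u0664.\u0665\u0666"

for ch in text:
    # decimal() returns the digit value, or the given default (None)
    # for characters that are not decimal digits, such as the full stop.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decimal(ch, None))

value = float(text)
print(value)
```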

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 5:17 PM, Martin v. Löwis mar...@v.loewis.de wrote: float('١٢٣٤.٥٦') 1234.56 I think it's a bug that this works. The definition of the float builtin says Convert a string or a number to floating point. If the argument is a string, it must contain a possibly signed

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread M.-A. Lemburg
Martin v. Löwis wrote: float('١٢٣٤.٥٦') 1234.56 I think it's a bug that this works. The definition of the float builtin says Convert a string or a number to floating point. If the argument is a string, it must contain a possibly signed decimal or floating point number, possibly

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread M.-A. Lemburg
Alexander Belopolsky wrote: Two recently reported issues brought to light the fact that the Python language definition is closely tied to character properties maintained by the Unicode Consortium. [1,2] For example, when Python switches to Unicode 6.0.0 (planned for the upcoming 3.2 release),

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 5:42 PM, M.-A. Lemburg m...@egenix.com wrote: .. I don't see why the language spec should limit the wealth of number formats supported by float(). The Language Spec (whatever it is) should not, but hopefully the Library Reference should. If you follow

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
On 28.11.2010 23:31, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 5:17 PM, Martin v. Löwis mar...@v.loewis.de wrote: float('١٢٣٤.٥٦') 1234.56 I think it's a bug that this works. The definition of the float builtin says Convert a string or a number to floating point. If the argument

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Terry Reedy
On 11/28/2010 3:58 PM, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou solip...@pitrou.net wrote: .. For example, I don't think that supporting float('١٢٣٤.٥٦') 1234.56 Even if this is somehow an accident or something that someone snuck in, I think it a good

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 5:56 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. This definition fails long before we get beyond 127-th code point: float('infinity') inf What do you infer from that? That the definition is wrong, or the code is wrong? The development version of the reference manual
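The ASCII-only counterexample is easy to reproduce. This short sketch shows a few pure-ASCII strings that `float()` accepts even though they are not floating point literals in Python source code. The choice of sample strings is illustrative.

```python
import math

# None of these is a valid float literal in source code,
# yet float() accepts every one of them as a string argument.
samples = ("infinity", "-inf", "nan", "  1e3  ")

for text in samples:
    print(repr(text), "->", float(text))
```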

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
Now, one may wonder what precisely a possibly signed floating point number is, but most likely, this refers to
floatnumber   ::= pointfloat | exponentfloat
pointfloat    ::= [intpart] fraction | intpart "."
exponentfloat ::= (intpart | pointfloat) exponent
intpart       ::= digit+

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Terry Reedy
On 11/28/2010 5:51 PM, Alexander Belopolsky wrote: The Language Spec (whatever it is) should not, but hopefully the Library Reference should. If you follow http://docs.python.org/dev/py3k/library/functions.html#float link and the references therein, you'll end up with digit ::=

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
+1 on all points below. On Sun, Nov 28, 2010 at 6:03 PM, Martin v. Löwis mar...@v.loewis.de wrote: Now, one may wonder what precisely a possibly signed floating point number is, but most likely, this refers to floatnumber   ::=  pointfloat | exponentfloat pointfloat    ::=  [intpart] fraction

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
On 29.11.2010 00:01, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 5:56 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. This definition fails long before we get beyond 127-th code point: float('infinity') inf What do you infer from that? That the definition is wrong, or the code is

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 6:03 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. Note that the support in float() (and the other numeric constructors) to work with Unicode code points was explicitly added when Unicode support was added to Python and has been available since Python 1.6. That

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 6:08 PM, Martin v. Löwis mar...@v.loewis.de wrote: On 29.11.2010 00:01, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 5:56 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. This definition fails long before we get beyond 127-th code point: float('infinity')

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 6:19 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. You can see the Unicode Consortium's stability policy at http://unicode.org/policies/stability_policy.html From the link above: As more experience is gathered in implementing the characters, adjustments in the

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
float('١٢٣٤.٥٦') 1234.56 Even if this is somehow an accident or something that someone snuck in, I think it a good idea that *users* be able to input amounts with their native digits. That is different from requiring *programmers* to write literals with euro-ascii-digits So one question

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
What makes it worse, is that while superficially, Unicode versions follow the same X.Y.Z format as Python versions, the stability promises are completely different. For example, it appears that the general category for the ZERO WIDTH SPACE was changed in Unicode 4.0.1. I don't think a
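The ZERO WIDTH SPACE example can be inspected from Python. This sketch assumes the `unicodedata.ucd_3_2_0` object (the frozen Unicode 3.2.0 view that the standard library keeps for IDNA) is a fair stand-in for pre-4.0.1 data. The variable names are illustrative.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# General category according to the interpreter's current UCD ...
current = unicodedata.category(ZWSP)
# ... and according to the frozen Unicode 3.2.0 snapshot.
legacy = unicodedata.ucd_3_2_0.category(ZWSP)

print("UCD", unicodedata.unidata_version, "category:", current)
print("UCD", unicodedata.ucd_3_2_0.unidata_version, "category:", legacy)
print("str.isspace():", ZWSP.isspace())
```

Comparing the two printed categories shows whether a builtin such as `str.isspace()` would have classified the character differently under the older database.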

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Benjamin Peterson
2010/11/28 M.-A. Lemburg m...@egenix.com: Martin v. Löwis wrote: float('١٢٣٤.٥٦') 1234.56 I think it's a bug that this works. The definition of the float builtin says Convert a string or a number to floating point. If the argument is a string, it must contain a possibly signed decimal or

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Michael Foord
On 28/11/2010 23:33, Martin v. Löwis wrote: float('١٢٣٤.٥٦') 1234.56 Even if this is somehow an accident or something that someone snuck in, I think it a good idea that *users* be able to input amounts with their native digits. That is different from requiring *programmers* to write literals

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 6:03 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. No no no. Addition of Unicode identifiers has a well-designed, deliberate specification, with a PEP and all. The support for non-ASCII digits in float appears to be ad-hoc, and not founded on actual needs of actual

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
FWIW the C# equivalent is locale aware *unless* you pass in a specific culture. (System.Double.Parse): That's not quite the equivalent of float(), I would say: this one apparently is locale-aware, so it is more the equivalent of locale.atof. The next question then is if it supports
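For comparison, here is a hedged sketch of the locale-aware route in Python. It assumes a `de_DE.UTF-8` locale may or may not be installed, so that step is guarded and the numeric locale is restored afterwards. The locale name is only an example.

```python
import locale

# Under the default "C" numeric locale, atof() behaves like float().
print(locale.atof("1234.56"))

try:
    # Assumption: this locale exists on the machine; many systems lack it.
    locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
except locale.Error:
    print("de_DE.UTF-8 is not installed; skipping the localized example")
else:
    # German conventions: "." groups thousands, "," marks the decimal point.
    print(locale.atof("1.234,56"))
    # Restore the default so later parsing is unaffected.
    locale.setlocale(locale.LC_NUMERIC, "C")
```

Note that `locale.atof` deals with separators. Whether it should also accept native digits is the open question raised in this message.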

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Antoine Pitrou
On Sun, 28 Nov 2010 17:23:01 -0600 Benjamin Peterson benja...@python.org wrote: 2010/11/28 M.-A. Lemburg m...@egenix.com: Martin v. Löwis wrote: float('١٢٣٤.٥٦') 1234.56 I think it's a bug that this works. The definition of the float builtin says Convert a string or a number

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
On 29.11.2010 00:56, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 6:03 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. No no no. Addition of Unicode identifiers has a well-designed, deliberate specification, with a PEP and all. The support for non-ASCII digits in float appears to be

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 6:59 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. The next question then is if it supports indo-arabic digits in any locale (or more specifically in an arabic locale). And once you answered that question, does it support Devanagari or Bengali digits? And if so, an

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou solip...@pitrou.net wrote: .. That's different. Python doesn't assign any semantic meaning to the characters in identifiers. The non-latin support for numerals, though, could change the meaning of a program dramatically and needs to be

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Michael Foord
On 28/11/2010 23:59, Martin v. Löwis wrote: FWIW the C# equivalent is locale aware *unless* you pass in a specific culture. (System.Double.Parse): That's not quite the equivalent of float(), I would say: this one apparently is locale-aware, so it is more the equivalent of locale.atof. Right.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Michael Foord
On 29/11/2010 00:04, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 6:59 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. The next question then is if it supports indo-arabic digits in any locale (or more specifically in an arabic locale). And once you answered that question, does it

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Ben Finney
Alexander Belopolsky alexander.belopol...@gmail.com writes: On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou solip...@pitrou.net wrote: Perhaps int(), float(), Decimal() and friends could take an optional parameter indicating whether non-ascii digits are considered. It would then satisfy
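The optional-parameter idea can be approximated today with a wrapper. This is only a sketch: `parse_float` and its `allow_non_ascii_digits` flag are hypothetical names, not an existing or proposed Python API, and `str.isascii()` requires Python 3.7 or later.

```python
def parse_float(text, allow_non_ascii_digits=False):
    """Hypothetical helper: float() with an opt-in for non-ASCII input."""
    if not allow_non_ascii_digits and not text.isascii():
        # Reject anything outside ASCII before float() ever sees it.
        raise ValueError(f"non-ASCII characters in numeric string: {text!r}")
    return float(text)


arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"

print(parse_float("1234.56"))
print(parse_float(arabic_indic, allow_non_ascii_digits=True))
try:
    parse_float(arabic_indic)
except ValueError as exc:
    print("rejected:", exc)
```

Defaulting the flag to `False` gives callers the "accepted text is ASCII" guarantee discussed in the thread while leaving native-digit input one keyword away.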

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Steven D'Aprano
Martin v. Löwis wrote: float('١٢٣٤.٥٦') 1234.56 I think it's a bug that this works. The definition of the float builtin says [...] I think that's a documentation bug rather than a coding bug. If Python wishes to limit the digits allowed in numeric *literals* to ASCII 0...9, that's one

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 7:55 PM, Ben Finney ben+pyt...@benfinney.id.au wrote: .. Of course it is fun that Python can process Bengali numerals, but so would be allowing Roman numerals. There is a reason why after careful consideration, PEP 313 was ultimately rejected. Rejecting a proposed

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Ben Finney
Steven D'Aprano st...@pearwood.info writes: If Python wishes to limit the digits allowed in numeric *literals* to ASCII 0...9, that's one thing, but I think that the digits allowed in numeric *strings* should allow the full range of digits supported by the Unicode standard. I assume you

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Steven D'Aprano
Alexander Belopolsky wrote: Two recently reported issues brought to light the fact that the Python language definition is closely tied to character properties maintained by the Unicode Consortium. [1,2] For example, when Python switches to Unicode 6.0.0 (planned for the upcoming 3.2 release), we

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano st...@pearwood.info wrote: .. is more important than to assure users that once their program accepted some text as a number, they can assume that the text is ASCII. Seems like a pretty foolish assumption, if you ask me, pretty much akin to

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Stephen J. Turnbull
M.-A. Lemburg writes: It is not uncommon for Asians and other non-Latin script users to use their own native script symbols for numbers. Japanese don't, in computational or scientific work where float() would be used. Japanese numerals are used for dates and for certain felicitous ages (and

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Nick Coghlan
On Mon, Nov 29, 2010 at 1:39 PM, Stephen J. Turnbull step...@xemacs.org wrote: I agree that Python should make it easy for the programmer to get numerical values of native numeric strings, but it's not at all clear to me that there is any point to having float() recognize them by default.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
Perhaps int(), float(), Decimal() and friends could take an optional parameter indicating whether non-ascii digits are considered. It would then satisfy all parties. Not really. I still would want to see what the actual requirement is: i.e. do any users actually have the desire to have these

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could that be a bad thing? It's YAGNI, feature bloat. It gives the illusion of supporting something that actually isn't supported very well (namely, parsing

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Martin v. Löwis
That's mostly irrelevant. This feature exists and someone, somewhere, may be using it. We normally don't remove stuff without deprecation. Sure: it should be deprecated before being removed. Regards, Martin
