[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sun, Nov 28, 2010 at 2:40 PM, Marc-Andre Lemburg wrote:
..
> Going back further shows the change:
>
> 3.0.1: 200B;ZERO WIDTH SPACE;Zs;0;BN;N;
> 3.2.0: 200B;ZERO WIDTH SPACE;Zs;0;BN;N;
> 4.0.1: 200B;ZERO WIDTH SPACE;Cf;0;BN;N;
Y

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Going back further shows the change:
3.0.1: 200B;ZERO WIDTH SPACE;Zs;0;BN;N;
3.2.0: 200B;ZERO WIDTH SPACE;Zs;0;BN;N;
4.0.1: 200B;ZERO WIDTH SPACE;Cf;0;BN;N;
4.1.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;N;
5.1.0: 200B;ZERO WIDTH SPACE

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: It is still strange that the .isspace() property value changed, since the code point has not changed in the recent Unicode versions:
4.1.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;N;
5.1.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;N;
5.2.0: 200B;ZERO WIDTH SPACE

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Martin v. Löwis
Martin v. Löwis added the comment:
>> In 2.6, there was a manually maintained list, probably dating back to before
>> Unicode 4.0.
>
> That's not quite correct: Python 1.6.x - 2.5.x used tables for the
> PyUnicode_ISSPACE() function that were created from the Unicode database.
That used to b

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sun, Nov 28, 2010 at 2:07 PM, Marc-Andre Lemburg wrote:
..
> The tables were never manually maintained, but we also did not update
> Python for each new Unicode version:
>
> Python 1.6: Unicode 3.0
> Python 2.0: Unicode 3.0
> Python 2.1: Unicode 3.0
>

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Martin v. Löwis
Martin v. Löwis added the comment:
> I'm not quoting anything. Thank you very much.
Oops, sorry - I confused you with the OP.

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread SilentGhost
Changes by SilentGhost: -- nosy: -SilentGhost

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Martin v. Löwis wrote:
>
> Martin v. Löwis added the comment:
>
> In 2.6, there was a manually maintained list, probably dating back to before
> Unicode 4.0.
That's not quite correct: Python 1.6.x - 2.5.x used tables for the PyUnicode_ISSPACE() functio

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread SilentGhost
SilentGhost added the comment: I'm not quoting anything. Thank you very much.

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Martin v. Löwis
Martin v. Löwis added the comment:
> It's not just this character. isspace() is also False for \u200c and \u200d
> (from the same category), and for \u2060, \u2800 and \ufeff.
What reason do you have to believe that they should be classified as whitespace, other than the web page you are quoting (w

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread SilentGhost
SilentGhost added the comment: It's not just this character. isspace() is also False for \u200c and \u200d (from the same category), and for \u2060, \u2800 and \ufeff.
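
The observation above can be checked with the standard `unicodedata` module. The following is a minimal sketch for Python 3, not part of the original thread; the names `CODE_POINTS` and `report` are illustrative only. It prints the name and general category of each code point mentioned next to the result of `isspace()`, so the claim about categories can be read off the output for whatever Unicode database the running interpreter ships.

```python
import unicodedata

# Code points mentioned in the comment, plus U+200B from the original report.
CODE_POINTS = ["\u200b", "\u200c", "\u200d", "\u2060", "\u2800", "\ufeff"]

# One row per character: code point, name, general category, isspace() result.
report = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), ch.isspace())
    for ch in CODE_POINTS
]

for row in report:
    print(*row, sep="  ")
```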

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Martin v. Löwis
Martin v. Löwis added the comment: In 2.6, there was a manually maintained list, probably dating back to before Unicode 4.0. Python uses the following criterion for determining white space characters:
/* Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or the
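
The quoted C comment is cut off in this archive. The Python documentation for `str.isspace()` states the rule as: a character is whitespace if its general category is `Zs` or its bidirectional class is one of `WS`, `B` or `S`. The sketch below restates that rule in Python 3 under that assumption. `is_space_by_ucd` is a hypothetical helper written for illustration, not CPython's implementation, which uses pregenerated tables.

```python
import unicodedata

def is_space_by_ucd(ch):
    """Hypothetical helper: apply the documented isspace() rule to one character."""
    return (
        unicodedata.bidirectional(ch) in ("WS", "B", "S")
        or unicodedata.category(ch) == "Zs"
    )

# Compare the helper with the built-in method on a few sample characters.
samples = [" ", "\t", "\n", "\u00a0", "\u200b"]
for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.bidirectional(ch) or "-",
        is_space_by_ucd(ch),
        ch.isspace(),
    )
```

Under this rule U+200B fails both tests once its category is `Cf` and its bidirectional class is `BN`, which matches the `False` result in the original report.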

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: The category of U+200B was changed in Unicode 4.0.1:
"""
The main new features in Unicode 4.0.1 are the following:
...
* Changed: general category of U+200B ZERO WIDTH SPACE
"""
http://unicode.org/versions/Unicode4.0.1/
-- resolution: -> inval
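
One way to observe this change from Python 3 is to compare the current database with the frozen Unicode 3.2.0 snapshot that the standard library exposes as `unicodedata.ucd_3_2_0`. This is a sketch added for illustration, not something from the thread; the variable names are arbitrary, and the value printed for the legacy database is left for the reader to confirm rather than asserted here.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# General category according to the database bundled with this interpreter.
current = unicodedata.category(ZWSP)

# General category according to the Unicode 3.2.0 snapshot kept for IDNA support.
legacy = unicodedata.ucd_3_2_0.category(ZWSP)

print("current:", unicodedata.unidata_version, current)
print("legacy: ", unicodedata.ucd_3_2_0.unidata_version, legacy)
```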

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread Alexander Belopolsky
Changes by Alexander Belopolsky: -- nosy: +belopolsky

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread SilentGhost
Changes by SilentGhost: -- versions: +Python 3.2

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread SilentGhost
SilentGhost added the comment: It returns False on the latest py3k checkout as well. -- nosy: +SilentGhost

[issue10567] Unicode space character \u200b unrecognised a space

2010-11-28 Thread pbnan
New submission from pbnan:
Python: Python 2.7 (r27:82500, Oct 20 2010, 03:21:03) [GCC 4.5.1] on linux2
Code:
>>> c = u'\u200b'
>>> c.isspace()
False
It works in both 2.6 and 3.1.
http://www.cs.tut.fi/~jkorpela/chars/spaces.html
-- components: Unicode messages: 122690 nosy: pbnan priori
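
For readers on Python 3, the report can be reproduced roughly as follows. This is a sketch rather than the submitter's exact session: the `u''` prefix is unnecessary there, and the variable names are illustrative. It also shows one practical consequence, namely that `str.split()` with no arguments does not treat U+200B as a separator.

```python
import sys
import unicodedata

ZWSP = "\u200b"  # the code point from the report

# Interpreter version and the Unicode database version it was built with.
print(sys.version_info[:2], unicodedata.unidata_version)

name = unicodedata.name(ZWSP)
is_space = ZWSP.isspace()
parts = "a\u200bb".split()  # whitespace splitting ignores U+200B

print(name, is_space, parts)
```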