[issue13391] string.strip Does Not Remove Zero-Width-Space (ZWSP)

2011-11-15 Thread Martin v. Löwis
Martin v. Löwis added the comment:

> Thus strip and isspace are now unusable methods in Python for common use
> cases.

Please recognize that you haven't demonstrated this at all. U+200B is *not* a character that is common, not even remotely. It's a rare, infrequent, unused character. In additi

2011-11-15 Thread Ezio Melotti
Ezio Melotti added the comment:

> So I guess this brings me back to my original issue. I'm not looking
> for particularly advanced stripping. I just want to remove all
> whitespace and other non-printing characters.

.strip only strips whitespace. Stripping non-printing characters and additi
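With no argument, str.strip() removes only characters for which str.isspace() is true. It also accepts an explicit string of characters to remove, so one workaround is to pass Python's own whitespace set plus a few extra invisible characters. The sketch below builds that set once by scanning every code point. The names WHITESPACE, EXTRA_INVISIBLES, STRIP_CHARS and strip_with_extras are hypothetical, and treating U+200B and U+FEFF as the "extra" characters is an assumption, not something the thread settles on.

```python
import sys

# Every code point that str.isspace() treats as whitespace in this Python build.
WHITESPACE = "".join(
    chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()
)

# Hypothetical choice of extra invisible characters to strip (an assumption):
# U+200B ZERO WIDTH SPACE and U+FEFF ZERO WIDTH NO-BREAK SPACE.
EXTRA_INVISIBLES = "\u200b\ufeff"

STRIP_CHARS = WHITESPACE + EXTRA_INVISIBLES


def strip_with_extras(s: str) -> str:
    """Strip leading/trailing whitespace plus the extra invisible characters."""
    return s.strip(STRIP_CHARS)


print(len(strip_with_extras(" \t\r\n\u200b")))  # 0
```

Only the ends of the string are affected; a ZWSP between two words survives, which matches how strip() treats ordinary spaces.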

2011-11-15 Thread Dave Mankoff
Dave Mankoff added the comment: "Use regular expressions for more advanced stripping than what the .strip method provides." So I guess this brings me back to my original issue. I'm not looking for particularly advanced stripping. I just want to remove all whitespace and other non-printing ch
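One reading of "whitespace and other non-printing characters" maps directly onto two existing str methods: str.isspace() and str.isprintable(). isprintable() is false for characters in the Unicode "Other" and "Separator" categories (except the ASCII space), so U+200B, whose category is Cf, counts as non-printing. The helper strip_nonprinting below is hypothetical, a minimal sketch under that interpretation. Note that it would also strip unassigned and private-use code points at the edges, which may or may not be wanted.

```python
def strip_nonprinting(s: str) -> str:
    """Hypothetical helper: strip characters from both ends of s while they
    are whitespace (str.isspace) or non-printable (not str.isprintable)."""

    def removable(ch: str) -> bool:
        return ch.isspace() or not ch.isprintable()

    start, end = 0, len(s)
    # Advance past removable characters at the front.
    while start < end and removable(s[start]):
        start += 1
    # Back up past removable characters at the end.
    while end > start and removable(s[end - 1]):
        end -= 1
    return s[start:end]


print(repr(strip_nonprinting("\u200bhello world\u2060\n")))  # 'hello world'
```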

2011-11-14 Thread Raymond Hettinger
Raymond Hettinger added the comment: I would also object to the feature creep. -- nosy: +rhettinger

2011-11-14 Thread Martin v. Löwis
Martin v. Löwis added the comment: Making it a feature request would procedurally be ok. However, I'd immediately refuse that as feature creep. Use regular expressions for more advanced stripping than what the .strip method provides.
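The regular-expression route suggested above might look like the following sketch. In a str pattern, \s matches Unicode whitespace but not U+200B, so the extra code points are listed explicitly; including U+FEFF alongside U+200B is an assumed choice, and the names _EDGE_INVISIBLES and regex_strip are hypothetical.

```python
import re

# \s matches Unicode whitespace in str patterns, but U+200B is not among it,
# so ZWSP (U+200B) and ZWNBSP (U+FEFF) are added by hand (an assumed choice).
# \A and \Z anchor to the true start and end of the string.
_EDGE_INVISIBLES = re.compile(r"\A[\s\u200b\ufeff]+|[\s\u200b\ufeff]+\Z")


def regex_strip(s: str) -> str:
    """Remove whitespace, ZWSP and ZWNBSP from both ends of s."""
    return _EDGE_INVISIBLES.sub("", s)


print(len(regex_strip(" \t\r\n\u200b")))  # 0
```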

2011-11-14 Thread Dave Mankoff
Dave Mankoff added the comment: So I contacted the Unicode Technical Committee about the issue and promptly received a response back. They pointed out that the ZWSP was, once upon a time, considered white space, but that was changed in Unicode 4.0.1 http://www.unicode.org/review/resolved
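The current classification can be checked from Python with the standard unicodedata module, which reflects the Unicode version bundled with the interpreter (unicodedata.unidata_version). This small sketch compares an ordinary space with U+200B: name, general category, bidirectional class, and what str.isspace() reports.

```python
import unicodedata

print("Unicode data version:", unicodedata.unidata_version)

for ch in (" ", "\u200b"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),           # character name
        unicodedata.category(ch),       # general category, e.g. Zs or Cf
        unicodedata.bidirectional(ch),  # bidirectional class, e.g. WS or BN
        ch.isspace(),
    )
# U+0020 SPACE Zs WS True
# U+200B ZERO WIDTH SPACE Cf BN False
```

ZWSP is a format character (Cf) with bidirectional class BN, so neither of the properties that make a character whitespace for str.isspace() applies to it.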

2011-11-14 Thread Martin v. Löwis
Martin v. Löwis added the comment:

> But why are they not a space?

Because the Unicode standard says they are not. We have a good tradition in Python to follow standards where they apply, and it appears that the Unicode standard is crystal clear that the characters in question are *not* white

2011-11-14 Thread Dave Mankoff
Dave Mankoff added the comment: But why are they not a space? I mean, they literally have the word space in their name and are used as separators between words. I can't really see any reason why you wouldn't want this behavior - there's no time when I would be thankful that strip removed all
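Having "SPACE" in a character's name does not make it whitespace under Unicode or under str.isspace(). A quick way to see this is to list every code point whose name contains the word SPACE but for which isspace() is false. The function name space_named_non_whitespace is hypothetical, and the exact list printed depends on the Unicode version of the running interpreter, so no output is shown here.

```python
import sys
import unicodedata


def space_named_non_whitespace():
    """Hypothetical helper: code points whose name contains the word SPACE
    but which str.isspace() does not consider whitespace."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if "SPACE" in name.split() and not ch.isspace():
            found.append((f"U+{cp:04X}", name, unicodedata.category(ch)))
    return found


for code, name, category in space_named_non_whitespace():
    print(code, name, category)
```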

2011-11-14 Thread Ezio Melotti
Ezio Melotti added the comment: I think those shouldn't be considered whitespace, so they shouldn't be stripped either. Even if _PyUnicode_IsWhitespace doesn't match exactly the Unicode definition of White_Space, they both agree that ZWSP and ZWNBSP are not whitespace. ZWNBSP is also used as
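U+FEFF (ZWNBSP) is best known as the byte order mark. When it appears at the start of UTF-8 input, Python's standard "utf-8-sig" codec removes it during decoding, which is often a more targeted fix than widening strip(). A minimal sketch:

```python
# U+FEFF encoded in UTF-8 is the three bytes EF BB BF.
data = "\ufeffhello".encode("utf-8")

plain = data.decode("utf-8")    # the BOM survives as U+FEFF
sig = data.decode("utf-8-sig")  # a leading BOM is removed by the codec

print(repr(plain), repr(sig))  # '\ufeffhello' 'hello'
```

str.strip() would leave plain unchanged, because U+FEFF is not whitespace for str.isspace().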

2011-11-14 Thread Dave Mankoff
Dave Mankoff added the comment: I appreciated the quick turnaround on this. Perhaps I am misunderstanding the resolution. I understand that strip uses _PyUnicode_IsWhitespace, and that _PyUnicode_IsWhitespace "Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or

2011-11-12 Thread Ezio Melotti
Ezio Melotti added the comment: str.strip uses Py_UNICODE_ISSPACE that in turn uses _PyUnicode_IsWhitespace (see Objects/unicodetype_db.h#l3347), and according to the comment there it "Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or the category 'Zs', 0 othe
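The rule quoted from the comment on _PyUnicode_IsWhitespace (bidirectional type WS, B or S, or category Zs) can be restated in pure Python with the unicodedata module and compared against str.isspace() for every code point. This is a sketch for checking the rule, not CPython's implementation; is_space_per_rule is a hypothetical name. On Python 3, whose documentation defines str.isspace() with this same rule, we would expect the mismatch list to be empty.

```python
import sys
import unicodedata


def is_space_per_rule(ch: str) -> bool:
    """Hypothetical restatement of the quoted rule: bidirectional type
    'WS', 'B' or 'S', or general category 'Zs'."""
    return (
        unicodedata.bidirectional(ch) in ("WS", "B", "S")
        or unicodedata.category(ch) == "Zs"
    )


# Code points where the restated rule and str.isspace() disagree.
mismatches = [
    cp
    for cp in range(sys.maxunicode + 1)
    if is_space_per_rule(chr(cp)) != chr(cp).isspace()
]
print(len(mismatches))
```

Under this rule, U+200B (bidi BN, category Cf) fails both tests, which is why strip() leaves it alone.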

2011-11-12 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- versions: +Python 3.3

2011-11-12 Thread Dave Mankoff
New submission from Dave Mankoff: Title pretty much says it all. Simple test case:

    >>> len(u' \t\r\n\u200B'.strip())
    1

Should be zero. Same problem in Python 3:

    >>> len(' \t\r\n\u200B'.strip())
    1

-- components: Unicode; messages: 147538; nosy: ezio.melotti, mankyd; priority: normal; seve
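The Python 3 test case from the report can be made more explicit by printing what strip() leaves behind. This sketch shows that the single remaining character is U+200B itself.

```python
s = " \t\r\n\u200b"
stripped = s.strip()

# The ASCII whitespace is removed; U+200B ZERO WIDTH SPACE is not whitespace
# for str.isspace(), so strip() leaves it in place.
print(len(stripped), repr(stripped))  # 1 '\u200b'
```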