[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread Ethan Furman
On 6/1/23 11:06, David Mertz, Ph.D. wrote: > I guess this is pretty general for the described need: > > >>> unicode_whitespace = [chr(c) for c in range(0x) if unicodedata.category(chr(c)) == "Zs"] Using the module-level `__getattr__` that could be a lazy attribute. -- ~Ethan~

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread Paul Moore
On Thu, 1 Jun 2023 at 18:16, David Mertz, Ph.D. wrote: > OK, fair enough. What about "has whitespace (including Unicode beyond > ASCII)"? > >>> import re >>> r = re.compile(r'\s', re.U) >>> r.search('ab\u2002cd') ❯ py -m timeit -s "import re; r = re.compile(r'\s', re.U)" "r.search('ab\u2002cd'
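Paul's snippet uses a compiled `\s` pattern. For `str` patterns in Python 3, Unicode matching is already the default, so `re.U` is redundant. A minimal sketch of a "has whitespace" check built on that pattern; `has_whitespace` is a hypothetical helper name.

```python
import re
import timeit

# For str patterns, \s already matches Unicode whitespace (re.U is the default).
_WS = re.compile(r"\s")


def has_whitespace(s: str) -> bool:
    """Return True if s contains at least one Unicode whitespace character."""
    return _WS.search(s) is not None


# U+2002 EN SPACE is not in string.whitespace, but \s matches it.
print(has_whitespace("ab\u2002cd"))  # True
print(has_whitespace("abcd"))        # False

# Rough timing, in the spirit of the timeit command quoted above.
print(timeit.timeit(lambda: has_whitespace("ab\u2002cd"), number=10_000))
```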

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread Richard Damon
On 6/1/23 2:06 PM, David Mertz, Ph.D. wrote: I'm not sure why U+FEFF isn't included, but that seems to match the current standards, so all good. I think that's because Zero Width No-Break Space (aka the BOM) doesn't act like a "Space" character. If used as the BOM, it is intended that it
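Richard's explanation can be checked directly against `unicodedata`: U+FEFF is a format character (category "Cf"), not a space separator, so neither the "Zs" filter nor `str.isspace()` picks it up. A small sketch; the `utf-8-sig` codec shown is one standard way to drop a leading BOM, not something proposed in the thread.

```python
import unicodedata

BOM = "\ufeff"

# U+FEFF is a format character (Cf), not a space separator (Zs).
print(unicodedata.name(BOM))      # ZERO WIDTH NO-BREAK SPACE
print(unicodedata.category(BOM))  # Cf
print(BOM.isspace())              # False

# Because it is not whitespace, str.split() leaves it attached as a token.
print("\ufeff hello".split())     # ['\ufeff', 'hello']

# A leading BOM is usually removed at decode time instead.
raw = "\ufeffhello".encode("utf-8")
print(raw.decode("utf-8-sig"))    # hello
```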

[Python-ideas] Re: extend method of the list class could return a reference to the list so that we can chain method calls

2023-06-01 Thread Tal Einat
On Mon, May 29, 2023 at 2:52 AM Richard Damon wrote: > > On 5/28/23 7:32 PM, Samuel Muldoon wrote: > > Currently, list.extend does not allow method chaining. > > parameters = [ > > "zero or more", > > "zero or more".upper(), > > "zero or more".lower(), > > "Zero or M
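The thread turns on the fact that `list.extend` mutates the list in place and returns `None`, so calls cannot be chained. A minimal sketch of the usual expression-style alternatives; `extended` is a hypothetical helper, not a built-in or a proposal from the thread.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")

s = "zero or more"
params = [s]

# list.extend mutates the list in place and returns None.
print(params.extend([s.upper()]))  # None
print(params)                      # ['zero or more', 'ZERO OR MORE']

# Unpacking into a new list is a single expression and needs no chaining.
parameters = [*params, s.lower()]
print(parameters)  # ['zero or more', 'ZERO OR MORE', 'zero or more']


def extended(lst: List[T], items: Iterable[T]) -> List[T]:
    """Hypothetical helper: extend lst in place and return it for chaining."""
    lst.extend(items)
    return lst


print(extended([1, 2], [3]).count(3))  # 1
```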

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread David Mertz, Ph.D.
I guess this is pretty general for the described need: >>> %time unicode_whitespace = [chr(c) for c in range(0x) if unicodedata.category(chr(c)) == "Zs"] CPU times: user 19.2 ms, sys: 0 ns, total: 19.2 ms Wall time: 18.7 ms >>> unicode_whitespace [' ', '\xa0', '\u1680', '\u2000', '\u2001'
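Filtering on category "Zs" is narrower than what `str.isspace()` accepts: per the Python documentation, a character counts as whitespace if its general category is "Zs" or its bidirectional class is WS, B, or S. A small sketch comparing the two sets over every code point (`range(sys.maxunicode + 1)`); the counts in the comments assume a recent Unicode database.

```python
import sys
import unicodedata

zs, isspace = set(), set()
for cp in range(sys.maxunicode + 1):
    c = chr(cp)
    if unicodedata.category(c) == "Zs":  # space separators only
        zs.add(c)
    if c.isspace():                      # Zs plus bidi classes WS, B, S
        isspace.add(c)

print(len(zs), len(isspace))  # 17 29

# What str.isspace() adds beyond Zs: ASCII controls such as \t and \n,
# U+001C..U+001F, U+0085, and the line/paragraph separators U+2028/U+2029.
for c in sorted(isspace - zs):
    print(f"U+{ord(c):04X} {unicodedata.category(c)}")
```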

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread Marc-Andre Lemburg
On 01.06.2023 18:18, Paul Moore wrote: On Thu, 1 Jun 2023 at 15:09, Antonio Carlos Jorge Patricio <antonio...@gmail.com> wrote: I suggest including a simple str variable in unicodedata module to mirror string.whitespace, so it would contain all characters defined in CPython f

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread David Mertz, Ph.D.
OK, fair enough. What about "has whitespace (including Unicode beyond ASCII)"? On Thu, Jun 1, 2023 at 1:08 PM Chris Angelico wrote: > > On Fri, 2 Jun 2023 at 02:27, David Mertz, Ph.D. wrote: > > > > It feels to me like "split on whitespace" or "remove whitespace" are > > quite common operations.
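David's "has whitespace (including Unicode beyond ASCII)" distinction can be shown by contrasting a membership test against `string.whitespace` with `str.isspace()`. A minimal sketch; both function names are hypothetical.

```python
import string


def has_ascii_whitespace(s: str) -> bool:
    # Only the six ASCII characters in string.whitespace.
    return any(ch in string.whitespace for ch in s)


def has_unicode_whitespace(s: str) -> bool:
    # str.isspace() also accepts non-ASCII whitespace such as U+2002 EN SPACE.
    return any(ch.isspace() for ch in s)


sample = "ab\u2002cd"
print(has_ascii_whitespace(sample))    # False
print(has_unicode_whitespace(sample))  # True
```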

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread Chris Angelico
On Fri, 2 Jun 2023 at 02:27, David Mertz, Ph.D. wrote: > > It feels to me like "split on whitespace" or "remove whitespace" are > quite common operations. I've been frustrated a number of times by > settling for the ASCII whitespace class when I really wanted the > Unicode whitespace class. > Th

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread David Mertz, Ph.D.
It feels to me like "split on whitespace" or "remove whitespace" are quite common operations. I've been frustrated a number of times by settling for the ASCII whitespace class when I really wanted the Unicode whitespace class. On Thu, Jun 1, 2023 at 12:20 PM Paul Moore wrote: > > On Thu, 1 Jun 2
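For "split on whitespace" and "remove whitespace", note that `str.split()` and `str.strip()` called with no argument already use the same Unicode-aware test as `str.isspace()`; the ASCII-only behavior appears when `string.whitespace` is passed explicitly. A small sketch with an arbitrary sample string.

```python
import re
import string

s = "\u3000alpha\u2002beta\u00a0gamma\n"

# With no argument, split() and strip() treat Unicode whitespace as whitespace.
print(s.split())       # ['alpha', 'beta', 'gamma']
print(repr(s.strip()))

# Passing string.whitespace restricts stripping to ASCII, so U+3000 survives.
print(repr(s.strip(string.whitespace)))

# "Remove whitespace": drop every Unicode whitespace character.
print("".join(s.split()))     # alphabetagamma
print(re.sub(r"\s+", "", s))  # alphabetagamma
```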

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread Paul Moore
On Thu, 1 Jun 2023 at 15:09, Antonio Carlos Jorge Patricio <antonio...@gmail.com> wrote: > I suggest including a simple str variable in unicodedata module to mirror > string.whitespace, so it would contain all characters defined in CPython > function [_PyUnicode_IsWhitespace()](https://github.

[Python-ideas] Add a .whitespace property to module unicodedata

2023-06-01 Thread Antonio Carlos Jorge Patricio
I suggest including a simple str variable in unicodedata module to mirror string.whitespace, so it would contain all characters defined in CPython function [_PyUnicode_IsWhitespace()](https://github.com/python/cpython/blob/main/Objects/unicodetype_db.h#L6314) so that: # existent string.whites
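In CPython, `str.isspace()` is backed by the same `_PyUnicode_IsWhitespace()` check the proposal links to, so the proposed constant can be approximated today by scanning every code point. A sketch under that assumption; the name `unicode_whitespace` is hypothetical and no such attribute exists in `unicodedata`.

```python
import string
import sys

# Hypothetical stand-in for the proposed constant: every code point for
# which str.isspace() is true.
unicode_whitespace = "".join(
    chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()
)

# The ASCII-only string.whitespace is a subset of the Unicode set.
print(set(string.whitespace) <= set(unicode_whitespace))  # True
print(len(string.whitespace), len(unicode_whitespace))     # 6 29

# It can be used wherever string.whitespace is, e.g. as strip()'s chars.
print("\u2003padded\u3000".strip(unicode_whitespace))      # padded
```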