On 6/1/23 11:06, David Mertz, Ph.D. wrote:
> I guess this is pretty general for the described need:
>
> >>> unicode_whitespace = [chr(c) for c in range(0x) if
> ...     unicodedata.category(chr(c)) == "Zs"]
Using a module-level `__getattr__`, that could be a lazy attribute.
--
~Ethan~
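
A minimal sketch of the lazy-attribute idea, assuming the module-level `__getattr__` hook from PEP 562. The module name `uws_demo`, the attribute name `unicode_whitespace`, and the use of `sys.maxunicode + 1` as the upper bound are illustrative assumptions, not something proposed in the thread:

```python
import sys
import types
import unicodedata

# Hypothetical stand-in module, created in memory so the sketch is self-contained.
demo = types.ModuleType("uws_demo")


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    if name == "unicode_whitespace":
        value = "".join(
            chr(c)
            for c in range(sys.maxunicode + 1)  # assumed bound: every code point
            if unicodedata.category(chr(c)) == "Zs"
        )
        # Cache the result in the module namespace so the scan runs only once.
        setattr(demo, name, value)
        return value
    raise AttributeError(f"module {demo.__name__!r} has no attribute {name!r}")


# Installing the hook in the module's namespace enables lazy lookup.
demo.__getattr__ = _lazy_getattr

first = demo.unicode_whitespace   # triggers the scan
second = demo.unicode_whitespace  # served from the cached attribute
```

In a real module the function would simply be defined at top level as `__getattr__`; the cost of the scan is then paid only by code that actually touches the attribute.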
On Thu, 1 Jun 2023 at 18:16, David Mertz, Ph.D. wrote:
> OK, fair enough. What about "has whitespace (including Unicode beyond
> ASCII)"?
>
>>> import re
>>> r = re.compile(r'\s', re.U)
>>> r.search('ab\u2002cd')
❯ py -m timeit -s "import re; r = re.compile(r'\s', re.U)"
"r.search('ab\u2002cd'
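
For reference, a small sketch of the "has whitespace" check discussed above. In Python 3, `str` patterns are Unicode-aware by default, so the `re.U` flag is redundant; the helper name `has_whitespace` is an assumption for illustration:

```python
import re

# For str patterns, \s already covers Unicode whitespace in Python 3.
_WHITESPACE = re.compile(r"\s")


def has_whitespace(text: str) -> bool:
    """Return True if text contains any whitespace character, ASCII or not."""
    return _WHITESPACE.search(text) is not None


def has_whitespace_no_re(text: str) -> bool:
    """Same check without the re module, using str.isspace() per character."""
    return any(ch.isspace() for ch in text)
```

Both helpers should agree on inputs such as `'ab\u2002cd'` (EN SPACE); which one is faster depends on the input and is not measured here.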
On 6/1/23 2:06 PM, David Mertz, Ph.D. wrote:
> I'm not sure why U+FEFF isn't included, but that seems to match the
> current standards, so all good.

I think that is because Zero Width No-Break Space (aka the byte order
mark, BOM) doesn't act like a "Space" character.
If used as the BOM mark, it is intended that it
On Mon, May 29, 2023 at 2:52 AM Richard Damon wrote:
>
> On 5/28/23 7:32 PM, Samuel Muldoon wrote:
> > *Currently, list.extend does not allow method chaining.*
> > *
> > *
> > *parameters = [
> > "zero or more",
> > "zero or more".upper(),
> > "zero or more".lower(),
> > "Zero or M
I guess this is pretty general for the described need:
>>> %time unicode_whitespace = [chr(c) for c in range(0x) if
...     unicodedata.category(chr(c)) == "Zs"]
CPU times: user 19.2 ms, sys: 0 ns, total: 19.2 ms
Wall time: 18.7 ms
>>> unicode_whitespace
[' ', '\xa0', '\u1680', '\u2000', '\u2001'
On 01.06.2023 18:18, Paul Moore wrote:
On Thu, 1 Jun 2023 at 15:09, Antonio Carlos Jorge Patricio
<antonio...@gmail.com> wrote:
I suggest including a simple str variable in unicodedata module to
mirror string.whitespace, so it would contain all characters defined
in CPython f
OK, fair enough. What about "has whitespace (including Unicode beyond ASCII)"?
On Thu, Jun 1, 2023 at 1:08 PM Chris Angelico wrote:
>
> On Fri, 2 Jun 2023 at 02:27, David Mertz, Ph.D. wrote:
> >
> > It feels to me like "split on whitespace" or "remove whitespace" are
> > quite common operations.
On Fri, 2 Jun 2023 at 02:27, David Mertz, Ph.D. wrote:
>
> It feels to me like "split on whitespace" or "remove whitespace" are
> quite common operations. I've been frustrated a number of times by
> settling for the ASCII whitespace class when I really wanted the
> Unicode whitespace class.
>
Th
It feels to me like "split on whitespace" or "remove whitespace" are
quite common operations. I've been frustrated a number of times by
settling for the ASCII whitespace class when I really wanted the
Unicode whitespace class.
On Thu, Jun 1, 2023 at 12:20 PM Paul Moore wrote:
>
> On Thu, 1 Jun 2
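
As a rough illustration of the "split on whitespace" and "remove whitespace" operations mentioned above: `str.split()` with no arguments already splits on Unicode whitespace, while a regex compiled with `re.ASCII` only sees the ASCII class. The sample string is made up for the sketch:

```python
import re

# Made-up sample: EN SPACE (U+2002), IDEOGRAPHIC SPACE (U+3000), and a tab.
text = "alpha\u2002beta\u3000gamma\tdelta"

# str.split() with no separator splits on any Unicode whitespace.
unicode_parts = text.split()

# Joining the pieces back together removes all whitespace.
compact = "".join(text.split())

# With re.ASCII, \s is restricted to [ \t\n\r\f\v], so only the tab splits.
ascii_parts = re.split(r"\s+", text, flags=re.ASCII)
```

The difference between `unicode_parts` and `ascii_parts` is the kind of gap that the ASCII-only `string.whitespace` constant leaves open.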
On Thu, 1 Jun 2023 at 15:09, Antonio Carlos Jorge Patricio <
antonio...@gmail.com> wrote:
> I suggest including a simple str variable in unicodedata module to mirror
> string.whitespace, so it would contain all characters defined in CPython
> function [_PyUnicode_IsWhitespace()](
> https://github.
I suggest including a simple str variable in unicodedata module to mirror
string.whitespace, so it would contain all characters defined in CPython
function
[_PyUnicode_IsWhitespace()](https://github.com/python/cpython/blob/main/Objects/unicodetype_db.h#L6314)
so that:
# existent
string.whites
I suggest including a simple str variable in unicodedata module to mirror
string.whitespace, so it would contain all characters defined in CPython
function
[_PyUnicode_IsWhitespace()](https://github.com/python/cpython/blob/main/Objects/unicodetype_db.h#L6314)
so that:
# existent
string.whites
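
A hedged sketch of what such a constant might contain, assuming `str.isspace()` reflects the same whitespace table as `_PyUnicode_IsWhitespace()`. The variable names are illustrative and not part of the proposal:

```python
import string
import sys
import unicodedata

# Every code point that str.isspace() accepts (assumed to mirror the C table).
all_space = "".join(
    chr(c) for c in range(sys.maxunicode + 1) if chr(c).isspace()
)

# The narrower subset whose general category is "Zs" (space separator).
zs_only = "".join(ch for ch in all_space if unicodedata.category(ch) == "Zs")

# Characters accepted by isspace() but missing from the ASCII-only constant.
beyond_ascii = [ch for ch in all_space if ch not in string.whitespace]
```

Note that control characters such as `'\n'` are whitespace according to `str.isspace()` but are not in category "Zs", so the two definitions used in this thread are not interchangeable.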
12 matches