[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Chris Angelico
On Sat, 3 Jun 2023 at 10:12, David Mertz, Ph.D. wrote:
>
> Let's call the styles a tie. Using the SOWPODS scrabble wordlist (no
> currency symbols, so False answer):
>
> >>> unicode_currency = {chr(c) for c in range(0x) if
> >>> unicodedata.category(chr(c)) == "Sc"}
> >>> wordlist = open('/u

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread David Mertz, Ph.D.
Let's call the styles a tie. Using the SOWPODS scrabble wordlist (no currency symbols, so False answer):
>>> unicode_currency = {chr(c) for c in range(0x) if
>>> unicodedata.category(chr(c)) == "Sc"}
>>> wordlist = open('/usr/local/share/sowpods').read()
>>> len(wordlist)
2707021
>>> %timeit

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Chris Angelico
On Sat, 3 Jun 2023 at 09:42, David Mertz, Ph.D. wrote:
>
> Yeah... oops. Obviously I typed the version in email. Should have done it in
> the shell. But you got the intention of set-ifying the characters in the
> large string.
Yep. I thought of that as I was originally writing, but absent bench

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread David Mertz, Ph.D.
Yeah... oops. Obviously I typed the version in email. Should have done it in the shell. But you got the intention of set-ifying the characters in the large string. Yes on lies, damn lies, and benchmarks.
On Fri, Jun 2, 2023, 7:29 PM Chris Angelico wrote:
> On Sat, 3 Jun 2023 at 08:28, David Mer

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Chris Angelico
On Sat, 3 Jun 2023 at 08:28, David Mertz, Ph.D. wrote:
>
> This is just bar talk at this point. I think we've shown that this is
> easy enough to do that programmers can roll their own.
>
> But as idle chat goes, note that in your code:
>
>     set(unicodedata.category(ch) for ch in s)
>
> If `s`

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread David Mertz, Ph.D.
This is just bar talk at this point. I think we've shown that this is easy enough to do that programmers can roll their own. But as idle chat goes, note that in your code:
    set(unicodedata.category(ch) for ch in s)
If `s` is a billion characters long, then we make a billion calls to the `.cat
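The cost argument in this message can be sketched as follows. This is a minimal illustration, not code from the thread: the two function names and the sample string are hypothetical. It contrasts one `unicodedata.category` lookup per character with one lookup per distinct character, which is the "set-ifying" idea mentioned later in the thread.

```python
import unicodedata


def categories_per_character(s):
    # One category lookup for every character in the string.
    return {unicodedata.category(ch) for ch in s}


def categories_per_distinct_character(s):
    # Deduplicate the characters first, then do one lookup per distinct character.
    return {unicodedata.category(ch) for ch in set(s)}


# Hypothetical sample text; both approaches yield the same set of categories.
text = "price: $5, or €4 " * 1000
assert categories_per_character(text) == categories_per_distinct_character(text)
print("Sc" in categories_per_distinct_character(text))  # True
```

Which variant is faster depends on the string's length and on how many distinct characters it holds, so it is worth timing on real data.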

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Chris Angelico
On Sat, 3 Jun 2023 at 07:28, David Mertz, Ph.D. wrote:
>
> Sure. That's fine. With sufficiently long strings my code is faster, but
> for "typical" strings yours will be.
Really? How? Your code has to build a set of every character in the string; mine builds a set of every category in the stri

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread David Mertz, Ph.D.
Sure. That's fine. With sufficiently long strings my code is faster, but for "typical" strings yours will be.
On Fri, Jun 2, 2023, 5:20 PM Chris Angelico wrote:
> On Sat, 3 Jun 2023 at 07:08, David Mertz, Ph.D. wrote:
> >
> > def does_string_have_currency_mark(s):
> >     return bool(set(s)

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Chris Angelico
On Sat, 3 Jun 2023 at 07:08, David Mertz, Ph.D. wrote:
>
> def does_string_have_currency_mark(s):
>     return bool(set(s) & set(unicode_categories['Sc']))
>
> def does_string_have_numeric_digit(s): ...
>
> ... and so on. Those seem like questions one asks often enough. Not
> every day, but more t

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread David Mertz, Ph.D.
def does_string_have_currency_mark(s):
    return bool(set(s) & set(unicode_categories['Sc']))
def does_string_have_numeric_digit(s): ...
... and so on. Those seem like questions one asks often enough. Not every day, but more than never.
On Fri, Jun 2, 2023 at 4:59 PM Chris Angelico wrote:
> >
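A self-contained sketch of these helpers follows, under stated assumptions. The helper `chars_in_category` and the two constants are hypothetical names. The message elides the body of `does_string_have_numeric_digit`, so using the "Nd" (decimal digit) category there is a guess at the intent. The sketch also swaps the set intersection for `frozenset.isdisjoint`, which takes the string directly and avoids building `set(s)`.

```python
import sys
import unicodedata


def chars_in_category(category):
    # Hypothetical helper: collect every character whose general category matches.
    return frozenset(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    )


# Built once at import time, so later checks only cost a set membership scan.
CURRENCY_SYMBOLS = chars_in_category("Sc")
DECIMAL_DIGITS = chars_in_category("Nd")  # assumption: "numeric digit" means Nd


def does_string_have_currency_mark(s):
    # isdisjoint accepts any iterable, so the string is scanned directly.
    return not CURRENCY_SYMBOLS.isdisjoint(s)


def does_string_have_numeric_digit(s):
    return not DECIMAL_DIGITS.isdisjoint(s)
```

For example, `does_string_have_currency_mark("cost: €5")` is true, while `does_string_have_currency_mark("hello")` is false.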

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Chris Angelico
On Sat, 3 Jun 2023 at 06:54, David Mertz, Ph.D. wrote:
>
> If we're talking PyPI, it would be nice to have:
>
> unicode_categories = {"Zs": [...], "Ll": [...], ...}
>
> For all the various categories. It would just take one pass through
> all the characters to generate it, but then every category

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread David Mertz, Ph.D.
If we're talking PyPI, it would be nice to have:
    unicode_categories = {"Zs": [...], "Ll": [...], ...}
For all the various categories. It would just take one pass through all the characters to generate it, but then every category would be fast to access later. On the other hand, it's a few lines
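The one-pass construction described here might look like the sketch below. It is an illustration only: the function name `build_unicode_categories` is hypothetical, and only the dictionary name `unicode_categories` comes from the message.

```python
import sys
import unicodedata
from collections import defaultdict


def build_unicode_categories():
    """Map each general category (e.g. "Zs", "Ll") to a list of its characters."""
    categories = defaultdict(list)
    # sys.maxunicode is the largest valid code point, hence the + 1.
    for codepoint in range(sys.maxunicode + 1):
        ch = chr(codepoint)
        categories[unicodedata.category(ch)].append(ch)
    return dict(categories)


# A single pass over all code points; afterwards each category is a plain dict lookup.
unicode_categories = build_unicode_categories()
```

The exact contents depend on the Unicode version bundled with the running Python, available as `unicodedata.unidata_version`.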

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Marc-Andre Lemburg
On 01.06.2023 20:06, David Mertz, Ph.D. wrote:
> I guess this is pretty general for the described need:
> %time unicode_whitespace = [chr(c) for c in range(0x) if unicodedata.category(chr(c)) == "Zs"]
Use sys.maxunicode instead of 0x
> CPU times: user 19.2 ms, sys: 0 ns, total: 19.2 ms
> W
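Applying the `sys.maxunicode` suggestion to the quoted comprehension could look like this. It is a sketch rather than the poster's final code. The `+ 1` is an addition here, needed because `range` excludes its upper bound and `sys.maxunicode` is itself a valid code point.

```python
import sys
import unicodedata

# Every character in the "Zs" (space separator) general category.
unicode_whitespace = [
    chr(c)
    for c in range(sys.maxunicode + 1)
    if unicodedata.category(chr(c)) == "Zs"
]
```

Note that "Zs" covers space separators only. Tab and newline are in category "Cc", so they do not appear in this list.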

[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Barry
> On 1 Jun 2023, at 19:10, David Mertz, Ph.D. wrote: > > %time unicode_whitespace = [chr(c) for c in range(0x) if > unicodedata.category(chr(c)) == "Zs"] Try 0x10 to get all of unicode. Barry ___ Python-ideas mailing list -- python-ideas@