On Sat, 3 Jun 2023 at 10:12, David Mertz, Ph.D. wrote:
Let's call the styles a tie. Using the SOWPODS scrabble wordlist (no
currency symbols, so False answer):
>>> import unicodedata
>>> unicode_currency = {chr(c) for c in range(0xFFFF) if
...     unicodedata.category(chr(c)) == "Sc"}
>>> wordlist = open('/usr/local/share/sowpods').read()
>>> len(wordlist)
2707021
>>> %timeit
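The `%timeit` result above is cut off, and the SOWPODS file lives on the poster's machine. Below is a rough, self-contained sketch for comparing the two styles from this thread. The sample strings and the function names `by_charset` and `by_category` are hypothetical stand-ins. Timings vary by machine and Python version, so none are shown.

```python
import sys
import timeit
import unicodedata
from functools import partial

# Precomputed once: every code point whose general category is "Sc"
# (Symbol, currency). range() excludes its end, so sys.maxunicode + 1
# covers all of Unicode.
unicode_currency = frozenset(
    chr(c) for c in range(sys.maxunicode + 1)
    if unicodedata.category(chr(c)) == "Sc"
)


def by_charset(s):
    """Set-ify the string's characters and intersect with the table."""
    return bool(unicode_currency & set(s))


def by_category(s):
    """Collect the general category of every character in the string."""
    return "Sc" in set(unicodedata.category(ch) for ch in s)


# Hypothetical stand-ins: a short everyday string and a long
# word-list-like string with no currency symbols (both answers False).
samples = {
    "short": "Hello, world!",
    "long": "aardvark\nabacus\nzymurgy\n" * 20_000,
}


def compare(number=3):
    for label, text in samples.items():
        for fn in (by_charset, by_category):
            seconds = timeit.timeit(partial(fn, text), number=number)
            print(f"{label:5} {fn.__name__:10} -> {fn(text)!s:5} {seconds:.4f}s")


if __name__ == "__main__":
    compare()
```

The currency table is built once, outside the timed region. This matches the thread's assumption that `unicode_currency` is precomputed.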
On Sat, 3 Jun 2023 at 09:42, David Mertz, Ph.D. wrote:
>
> Yeah... oops. Obviously I typed the version in email. Should have done it in
> the shell. But you got the intention of set-ifying the characters in the
> large string.
Yep. I thought of that as I was originally writing, but absent
bench
Yeah... oops. Obviously I typed the version in email. Should have done it
in the shell. But you got the intention of set-ifying the characters in the
large string.
Yes on lies, damn lies, and benchmarks.
On Fri, Jun 2, 2023, 7:29 PM Chris Angelico wrote:
On Sat, 3 Jun 2023 at 08:28, David Mertz, Ph.D. wrote:
This is just bar talk at this point. I think we've shown that this is
easy enough to do that programmers can roll their own.
But as idle chat goes, note that in your code:
set(unicodedata.category(ch) for ch in s)
If `s` is a billion characters long, then we make a billion calls to
the `.cat
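The cost being described is one `unicodedata.category()` call per character. Set-ifying the string first caps the number of calls at the number of distinct characters. Below is a minimal sketch of both variants. The function names and the sample string are hypothetical.

```python
import unicodedata


def categories_per_char(s):
    # One unicodedata.category() call for every character in s.
    return set(unicodedata.category(ch) for ch in s)


def categories_per_distinct_char(s):
    # Set-ify first: one category() call per *distinct* character.
    return {unicodedata.category(ch) for ch in set(s)}


text = "abc $1 " * 100_000  # 700,000 characters, 6 distinct
print(len(text), "calls vs", len(set(text)), "calls")  # 700000 calls vs 6 calls
print(categories_per_char(text) == categories_per_distinct_char(text))  # True
```

Both functions return the same set. Here, the second makes 6 `category()` calls instead of 700,000. Building `set(s)` still visits every character once, but a hash insertion is typically cheaper than a category lookup.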
On Sat, 3 Jun 2023 at 07:28, David Mertz, Ph.D. wrote:
>
> Sure. That's fine. With sufficiently long strings my code is faster, but
> for "typical" strings yours will be.
Really? How? Your code has to build a set of every character in the
string; mine builds a set of every category in the stri
Sure. That's fine. With sufficiently long strings my code is faster, but
for "typical" strings yours will be.
On Fri, Jun 2, 2023, 5:20 PM Chris Angelico wrote:
On Sat, 3 Jun 2023 at 07:08, David Mertz, Ph.D. wrote:
def does_string_have_currency_mark(s):
    return bool(set(s) & set(unicode_categories['Sc']))
def does_string_have_numeric_digit(s): ...
... and so on. Those seem like questions one asks often enough. Not
every day, but more than never.
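The body of `does_string_have_numeric_digit` is left as `...` in the email. Below is one way to fill in the whole family. It assumes "numeric digit" means the Unicode general category `Nd` (Number, decimal digit). The helpers `chars_in_category` and `string_has_category` are hypothetical. Each category's character set is computed on first use and then cached.

```python
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def chars_in_category(category):
    # Scan every code point once per category; later calls hit the cache.
    return frozenset(
        chr(c) for c in range(sys.maxunicode + 1)
        if unicodedata.category(chr(c)) == category
    )


def string_has_category(s, category):
    # True if any character of s belongs to the given general category.
    return not chars_in_category(category).isdisjoint(s)


def does_string_have_currency_mark(s):
    return string_has_category(s, "Sc")  # Symbol, currency


def does_string_have_numeric_digit(s):
    return string_has_category(s, "Nd")  # Number, decimal digit (assumed)
```

`frozenset.isdisjoint` accepts any iterable, so the string does not need to be converted to a set first.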
On Fri, Jun 2, 2023 at 4:59 PM Chris Angelico wrote:
On Sat, 3 Jun 2023 at 06:54, David Mertz, Ph.D. wrote:
If we're talking PyPI, it would be nice to have:
unicode_categories = {"Zs": [...], "Ll": [...], ...}
For all the various categories. It would just take one pass through
all the characters to generate it, but then every category would be
fast to access later. On the other hand, it's a few lines
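Below is a minimal sketch of the one-pass table described above. It assumes lists of single-character strings, as in the `{"Zs": [...], ...}` outline. `build_unicode_categories` is a hypothetical name.

```python
import sys
import unicodedata
from collections import defaultdict


def build_unicode_categories():
    # One pass over every code point, filing each character under its
    # two-letter general category ("Zs", "Ll", "Sc", ...).
    categories = defaultdict(list)
    for c in range(sys.maxunicode + 1):
        ch = chr(c)
        categories[unicodedata.category(ch)].append(ch)
    return dict(categories)


unicode_categories = build_unicode_categories()
space_separators = unicode_categories["Zs"]
```

After one pass, `unicode_categories["Zs"]` and the other categories are plain dictionary lookups. Every code point lands in exactly one list, including the large unassigned (`Cn`) and private-use (`Co`) ranges. The table therefore holds over a million strings, and its memory cost is noticeable.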
On 01.06.2023 20:06, David Mertz, Ph.D. wrote:
> I guess this is pretty general for the described need:
> %time unicode_whitespace = [chr(c) for c in range(0xFFFF) if unicodedata.category(chr(c)) == "Zs"]
Use sys.maxunicode instead of 0xFFFF
> CPU times: user 19.2 ms, sys: 0 ns, total: 19.2 ms
W
> On 1 Jun 2023, at 19:10, David Mertz, Ph.D. wrote:
>
> %time unicode_whitespace = [chr(c) for c in range(0xFFFF) if
> unicodedata.category(chr(c)) == "Zs"]
Try 0x10FFFF to get all of Unicode.
Barry
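The upper bound of the scan matters. `0xFFFF` stops at the end of the Basic Multilingual Plane, while `sys.maxunicode` is 0x10FFFF, the last code point, on Python 3.3 and later. Because `range()` excludes its end, covering every code point needs `range(sys.maxunicode + 1)`. Below is a small sketch using the hypothetical helper `chars_with_category`.

```python
import sys
import unicodedata

# On Python 3.3+ sys.maxunicode is always 0x10FFFF, the last code point.
assert sys.maxunicode == 0x10FFFF


def chars_with_category(category, last=sys.maxunicode):
    # range() excludes its end argument, so add 1 to include `last` itself.
    return [chr(c) for c in range(last + 1)
            if unicodedata.category(chr(c)) == category]


bmp_whitespace = chars_with_category("Zs", last=0xFFFF)
all_whitespace = chars_with_category("Zs")
```

In current Unicode versions every `Zs` character lies in the BMP, so the two whitespace lists should match. The difference shows up in categories such as `Lo`, where CJK ideographs like U+20000 sit beyond U+FFFF. A bound of `range(0x10FFFF)` misses only U+10FFFF itself, which is a noncharacter in category `Cn`.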
___
Python-ideas mailing list -- python-ideas@