Regex and capture unicode text

2021-01-21 Thread stbalbach
That's great, thank you, I did not know this. Now I understand the issue and the possible solution.

2021-01-21 Thread treeform
You might want to perform some sort of Unicode normalization first, to map a Unicode "C" to the ASCII "C", etc. Maybe?
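
A rough sketch of what that could look like, in case it helps. This assumes the third-party normalize package (nimble install normalize) and its toNFKC proc, since the standard library has no normalization; the fullwidth "Ｃ" in the input is a made-up example, not something taken from the thread.

```nim
# Assumption: the third-party `normalize` package is installed
# (nimble install normalize); it is not part of the standard library.
import re
import normalize

# Hypothetical input: "Cite" spelled with a fullwidth "Ｃ" (U+FF23).
let raw = "{{Ｃite book|test=x}}"

# NFKC folds compatibility characters such as U+FF23 into plain "C".
let t = toNFKC(raw)

# After normalization the ASCII-only pattern can be applied as usual.
let bounds = findBounds(t, re"(*UTF8)[{]{2}Cite book[|][^}]+}}")
echo bounds
```

Note that NFKC only handles compatibility forms. Look-alike letters from other scripts, such as the Cyrillic "С", are left alone and would need separate handling.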

2021-01-21 Thread treeform
Use Unicode mode in the regex: re"(*UTF)..." See:
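
A minimal sketch of this suggestion, assuming the stdlib re module and a PCRE library built with UTF-8 support. The sample text is hypothetical; it reuses the template from the question with a non-ASCII value.

```nim
import re

# Hypothetical sample: the template from the question, with accented text.
let t = "{{Cite book|title=Été à Paris}}"

# The (*UTF8) verb must come first in the pattern. It asks PCRE to treat
# both the pattern and the subject as UTF-8 instead of raw bytes.
let pattern = re"(*UTF8)[{]{2}Cite book[|][^}]+}}"

# findBounds returns (-1, 0) when nothing matches.
let (first, last) = findBounds(t, pattern)
if first >= 0:
  echo t[first .. last]   # slice out the matched template
else:
  echo "no match"
```

The bounds are byte offsets into the string, not character counts, so they can be used directly for slicing but will not line up with rune positions once the text contains multi-byte characters.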

2021-01-21 Thread stbalbach
I'm working with various languages on Wikipedia and would like to capture text that is Unicode. For example, this works (plain ASCII):

    import re
    let t = "{{Cite book|test=}}"
    echo $(findBounds(t, re("(*UTF8)[{]{2}Cite book[|][^}]+}}", {})))

This does not work