Regex and capture unicode text

2021-01-21 Thread stbalbach

That's great, thank you, I did not know this. I now understand the issue and the possible solution.

Regex and capture unicode text

2021-01-21 Thread treeform

You might want to perform some sort of Unicode normalization first, to map Unicode "C" to ASCII "C", etc. Maybe?
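
A minimal sketch of that idea, assuming the Unicode "C" in question is a fullwidth form (U+FF01..U+FF5E), which is only a guess. The helper name foldFullwidth is hypothetical, and this covers only one small part of what full NFKC normalization does:

```nim
import unicode

# Hypothetical helper: fold fullwidth forms U+FF01..U+FF5E down to their
# ASCII counterparts U+0021..U+007E. All other code points pass through.
proc foldFullwidth(s: string): string =
  result = ""
  for r in s.runes:
    let cp = int32(r)
    if cp >= 0xFF01 and cp <= 0xFF5E:
      # The fullwidth block sits at a fixed offset of 0xFEE0 from ASCII.
      result.add(char(cp - 0xFEE0))
    else:
      result.add(r.toUTF8)

# "\uFF23" is FULLWIDTH LATIN CAPITAL LETTER C.
echo foldFullwidth("\uFF23ite book")  # prints "Cite book"
```

Running the text through something like this before the regex would let a plain-ASCII pattern match. Other look-alike characters (Cyrillic, for instance) would need their own mapping or a proper normalization library.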

Regex and capture unicode text

2021-01-21 Thread treeform

Use Unicode mode in the regex: re"(*UTF)..." See:
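
A small sketch of what the mode changes, assuming the PCRE library that Nim's re module loads was built with UTF-8 support (this uses the (*UTF8) spelling that appears elsewhere in the thread). Without the verb, "." consumes one byte; with it, "." consumes one whole code point:

```nim
import re

# "é" written as its two UTF-8 bytes, so the source file encoding does not matter.
let s = "\xC3\xA9"

# Byte mode: "." matches a single byte, so "^.$" cannot span both bytes.
let byteMode = match(s, re"^.$")

# UTF-8 mode: "." matches the whole two-byte character.
let utfMode = match(s, re"(*UTF8)^.$")

echo byteMode  # prints "false"
echo utfMode   # prints "true"
```

Note that findBounds and friends still report byte offsets into the string, not character counts, even in UTF-8 mode.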

Regex and capture unicode text

2021-01-21 Thread stbalbach

I'm working with various languages in Wikipedia and would like to capture text that is Unicode. For example, this works (plain ASCII):

    import re
    let t = "{{Cite book|test=}}"
    echo $(findBounds(t, re("(*UTF8)[{]{2}Cite book[|][^}]+}}", {})))

This does not work