Welcome to the wonderful world of multibyte encodings, and Unicode in
particular.

Unless you're looking for an ASCII character, you're looking for a
substring, not a character. And you're really looking for what's called a
codepoint (the entire concept of a character gets kind of fuzzy with
Unicode). If you're not careful, looking for 'a' (U+0061 LATIN SMALL LETTER
A) will match the start of á, which is actually a two codepoint grapheme
(U+0061 and U+0301 COMBINING ACUTE ACCENT) that renders as a single entity.
And if you're okay with matching that, what about á (U+00E1 LATIN SMALL
LETTER A WITH ACUTE), the single codepoint composed version?
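
To make that concrete, here is a minimal sketch in C of the substring
approach for UTF-8, assuming both strings are valid, NUL-terminated UTF-8.
The function names are made up for illustration; they are not part of
sqlite or any library. Because no valid UTF-8 sequence can begin in the
middle of another one, a plain byte-wise strstr() only matches on codepoint
boundaries. The second function skips matches followed by a codepoint in
the Combining Diacritical Marks block (U+0300 to U+036F). That is a
deliberate simplification, not real grapheme segmentation, which needs
something like ICU.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of the UTF-8 byte sequence `needle`
 * in `haystack`. Both are assumed to be valid, NUL-terminated UTF-8. */
size_t count_utf8_substring(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    size_t count = 0;
    const char *p;

    if (n == 0)
        return 0;
    for (p = strstr(haystack, needle); p != NULL; p = strstr(p + n, needle))
        count++;
    return count;
}

/* Nonzero if `s` starts with a codepoint in U+0300..U+036F, which UTF-8
 * encodes as CC 80..CC BF and CD 80..CD AF. Other combining blocks are
 * NOT covered; this is only an illustration. */
int starts_with_combining_mark(const char *s)
{
    unsigned char b0 = (unsigned char)s[0];
    unsigned char b1;

    if (b0 == 0)
        return 0;
    b1 = (unsigned char)s[1];
    return (b0 == 0xCC && b1 >= 0x80 && b1 <= 0xBF) ||
           (b0 == 0xCD && b1 >= 0x80 && b1 <= 0xAF);
}

/* Like count_utf8_substring(), but ignores matches that are immediately
 * followed by a combining mark from the block checked above. */
size_t count_standalone(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    size_t count = 0;
    const char *p;

    if (n == 0)
        return 0;
    for (p = strstr(haystack, needle); p != NULL; p = strstr(p + n, needle))
        if (!starts_with_combining_mark(p + n))
            count++;
    return count;
}
```

With these, searching for "a" in "a\xCC\x81" (U+0061 followed by U+0301)
is counted by count_utf8_substring() but not by count_standalone(), and
neither finds "a" in "\xC3\xA1" (U+00E1). Matching the decomposed and
composed forms against each other would need normalization first, which
this sketch does not attempt.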

Unicode is hard. There are libraries like ICU and libunistring which help a
bit. I have a bunch of SQLite extensions at
https://github.com/shawnw/useful_sqlite_extensions (which I really need to
polish up for an actual release), including a string library that expands a
lot on the built-in ICU extension to make working with graphemes and
Unicode in general in SQLite a lot easier.


On Fri, Apr 12, 2019 at 7:51 AM x <tam118...@hotmail.com> wrote:

> I’m still confused by utf strings. For simplicity, suppose I set up an
> sqlite function that takes a single string parameter and I want to scan the
> string to count the number of occurrences of a certain character . If I
> knew the string was made up entirely of ascii chars I’d do this
>
> char *c = &sqlite3_value_text(0)[0];
> int count=0;
> while (*c) if (*c++ == SearchChar) count++;
>
> How do I do the same thing if the string param is a utf-8 or utf-16 string
> and the SearchChar is a Unicode character?
>
> I’m confused by the fact that Unicode characters are not a fixed number of
> bytes so if I do this e.g.
>
> wchar_t *c = (wchar_t*) sqlite3_value_text(0);
>
> does this mean a complete temporary copy of the value of
> sqlite3_value_text(0) has to be constructed by the compiler such that all
> characters of the newly constructed string are fixed width? If so, I’m just
> wanting to check if there’s a way  of avoiding this overhead.
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
