Welcome to the wonderful world of multibyte encodings, and Unicode in particular.

Unless you're looking for an ASCII character, you're looking for a substring, not a character. And you're really looking for what's called a codepoint (the entire concept of "character" gets kind of fuzzy with Unicode).

If you're not careful, looking for 'a' (U+0061 LATIN SMALL LETTER A) will match the start of á, which is actually a two-codepoint grapheme (U+0061 and U+0301 COMBINING ACUTE ACCENT) that renders as a single entity. And if you're okay with matching that, what about á (U+00E1 LATIN SMALL LETTER A WITH ACUTE), the single-codepoint composed version?

Unicode is hard. There are libraries like ICU and libunistring which help a bit.

I have a bunch of sqlite extensions at https://github.com/shawnw/useful_sqlite_extensions (that I really need to polish up for an actual release), including a string library that expands a lot on the built-in ICU extension to make working with graphemes and Unicode in general in sqlite a lot easier.

On Fri, Apr 12, 2019 at 7:51 AM x <tam118...@hotmail.com> wrote:

> I’m still confused by utf strings. For simplicity, suppose I set up an
> sqlite function that takes a single string parameter and I want to scan the
> string to count the number of occurrences of a certain character. If I
> knew the string was made up entirely of ascii chars I’d do this
>
> char *c = &sqlite3_value_text(0)[0];
> int count=0;
> while (*c) if (*c++ == SearchChar) count++;
>
> How do I do the same thing if the string param is a utf-8 or utf-16 string
> and the SearchChar is a Unicode character?
>
> I’m confused by the fact that Unicode characters are not a fixed number of
> bytes so if I do this e.g.
>
> wchar_t *c = (wchar_t*) sqlite3_value_text(0);
>
> does this mean a complete temporary copy of the value of
> sqlite3_value_text(0) has to be constructed by the compiler such that all
> characters of the newly constructed string are fixed width? If so, I’m just
> wanting to check if there’s a way of avoiding this overhead.
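
For the narrow case in the quoted question (counting one codepoint in a UTF-8 string), here is a minimal sketch of the "look for a substring" idea: encode the codepoint as UTF-8 and count byte-level matches. Because UTF-8 is self-synchronizing, a correctly encoded codepoint can only match at a codepoint boundary in valid UTF-8 text, so no fixed-width copy of the string is needed. The names utf8_encode and count_codepoint_utf8 are made up for this example and are not part of sqlite or any library. It deliberately does no normalization or grapheme handling, so searching for U+0061 still counts the 'a' at the start of a decomposed á; handling that properly is where ICU or libunistring come in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode one Unicode codepoint as UTF-8 into buf
   (which must hold at least 4 bytes). Returns the number of bytes written,
   or 0 for an invalid codepoint (surrogates, or values above U+10FFFF). */
static size_t utf8_encode(unsigned long cp, unsigned char *buf) {
  if (cp < 0x80) {
    buf[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    buf[0] = (unsigned char)(0xC0 | (cp >> 6));
    buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate range */
    buf[0] = (unsigned char)(0xE0 | (cp >> 12));
    buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}

/* Hypothetical helper: count occurrences of codepoint cp in the
   NUL-terminated UTF-8 string s. Compares raw bytes only; it does not
   normalize and knows nothing about combining sequences. */
static int count_codepoint_utf8(const unsigned char *s, unsigned long cp) {
  unsigned char needle[4];
  size_t n = utf8_encode(cp, needle);
  int count = 0;

  if (s == NULL || n == 0 || cp == 0) return 0;
  while (*s) {
    /* strncmp stops at the haystack's NUL, so it never reads past the end. */
    if (strncmp((const char *)s, (const char *)needle, n) == 0) {
      count++;
      s += n; /* skip the whole encoded codepoint */
    } else {
      s++;
    }
  }
  return count;
}
```

Inside an sqlite function you would pass it the result of sqlite3_value_text(argv[0]), which takes the sqlite3_value pointer (not 0) and returns UTF-8 as a const unsigned char pointer. Casting that pointer to wchar_t* does not convert anything; it just reinterprets the same bytes.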

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users