> On Nov 11, 2019, at 7:49 AM, Jose Isaias Cabrera <jic...@outlook.com> wrote:
>
> if you want to count characters in languages such as Arabic, Hebrew, Chinese,
> Japanese, etc., the easiest way is to convert that string to UTF32, and do a
> string count of that UTF32 variable.

No, the easiest way is to ask your string class/library what the character count is, and let _it_ deal with the fiddly details.

Or consider why you need the character count in the first place; it’s usually not something that’s useful to know. Usually what you’re really asking is “how many pixels wide will this render?” or “how many bytes will this occupy?” or even “let me iterate over each character”.

At a low level, UTF-8 makes a lot more sense. It’s very compact, which is important for cache efficiency as well as storage space. It’s backward compatible with ASCII, which is extremely convenient for text-based protocols / file formats / languages, and for working with legacy APIs (like <string.h>!).

Modern libraries seem to be moving to UTF-8. For instance, Apple’s been migrating Swift’s String type from a legacy UTF-16 encoding to UTF-8, and playing up the consequent performance and space win. Go has been UTF-8 from the start. I don’t know of a single library that’s gone with UTF-32, except maybe as an option.

--Jens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
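
To illustrate the point about "how many bytes" versus "how many characters" versus "let me iterate", here is a minimal sketch in Go, whose strings are UTF-8 underneath. The helper names byteLen and codePoints are made up for this example; len, utf8.RuneCountInString and range over a string are standard Go. Note that a code point count (which is what a UTF-32 length would give you) is still not a count of user-perceived characters: the last sample string is an "e" followed by a combining acute accent, which renders as one character but is two code points.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLen answers "how many bytes will this occupy?" for a UTF-8 string.
// (Hypothetical helper name; len on a Go string counts bytes.)
func byteLen(s string) int { return len(s) }

// codePoints counts Unicode code points, which is the same number a
// UTF-32 length would report. (Hypothetical helper name.)
func codePoints(s string) int { return utf8.RuneCountInString(s) }

func main() {
	samples := []string{
		"hello",      // ASCII only: 5 bytes, 5 code points
		"h\u00e9llo", // precomposed e-acute: 6 bytes, 5 code points
		"日本語",        // 9 bytes, 3 code points
		"e\u0301",    // e + combining acute: 3 bytes, 2 code points
	}
	for _, s := range samples {
		fmt.Printf("%q: %d bytes, %d code points\n", s, byteLen(s), codePoints(s))
	}

	// "Let me iterate over each character": range decodes the UTF-8
	// on the fly and yields the byte offset and code point of each step.
	for offset, r := range "日本語" {
		fmt.Printf("byte offset %d: %c (U+%04X)\n", offset, r, r)
	}
}
```

None of this needs a conversion to UTF-32 first. Counting user-perceived characters (grapheme clusters) takes a segmentation library on top of this, which is exactly the kind of fiddly detail best left to the string library.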