-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/28/2010 08:58 AM, Drake Wilson wrote:
> Quoth "J. Bobby Lopez" <j...@jbldata.com>, on 2010-10-28 11:48:12 -0400:
>> Another think that crossed my mind is that maybe I haven't set up the
>> database properly to accept UTF8 or UTF16 data, but I figured this was a
>> default in SQLite3.
>
> You have to pick one when you create the database, usually UTF-8. If
> you want UTF-16 use « PRAGMA encoding = 'UTF-16' » (or 'UTF-16le' or
> 'UTF-16be') when you create the database.

Just to be clear, all the SQLite string APIs accept/produce UTF8. There are also some that accept/produce UTF16 and have a 16 suffix on the function name. The underlying encoding of the database has no effect on what happens at the API level - you will always get the same answers.

You can however specify the database encoding as an optimisation. For example, if you are predominantly using codepoints at or above 0x800 then UTF8 requires more bytes to encode the string than UTF16 (3 or more per codepoint versus 2). Choosing a UTF16 encoding in this example could potentially save you 33% of the text storage in the file.

Another optimisation may be that you have a user defined function (UDF) or a collation that is significantly more efficient on UTF16 than UTF8. Counting the number of codepoints is one example. When you register the UDF/collation with SQLite you can specify which encodings it can work with. SQLite will always make any needed conversions before calling the UDF/collation. For example, if you register the UDF/collation to only handle UTF16 then SQLite will automatically convert any bytes it is storing behind the scenes in UTF8 into UTF16 before calling. If you use the UDF/collation a lot then it would be more efficient to store the database in UTF16 format so you don't have these conversions going on behind the scenes.

TL;DR: The encoding of the database is irrelevant for what you see as a SQLite API user. You will always get the same answers no matter which combination of APIs and database encoding is used. It may be beneficial to explicitly set the encoding as a space or CPU optimisation, but this is *very* unlikely to be the space/CPU issue with your application.

Yes, I know about surrogate pairs and no, I won't mention how they could complicate matters.
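A minimal sketch of the two points above, using Python's standard sqlite3 module rather than the C API. The helper name `probe`, the table `t` and the sample string are made up for illustration. It creates one in-memory database per encoding, stores the same text in each, and reads back the text, its length in codepoints, and its length in bytes. The byte count relies on the assumption that CAST(... AS BLOB) exposes the text in the database encoding.

```python
import sqlite3


def probe(encoding, text):
    """Store `text` in a fresh in-memory database created with `encoding`.

    Returns a tuple: (text read back, length in codepoints, length in bytes).
    `probe` is a hypothetical helper for this illustration only.
    """
    con = sqlite3.connect(":memory:")
    try:
        # The encoding has to be chosen before any table is created.
        con.execute(f"PRAGMA encoding = '{encoding}'")
        con.execute("CREATE TABLE t (s TEXT)")
        con.execute("INSERT INTO t (s) VALUES (?)", (text,))
        # Assumption: casting TEXT to BLOB yields the bytes in the
        # database encoding, so length() of the blob is the byte count.
        return con.execute(
            "SELECT s, length(s), length(CAST(s AS BLOB)) FROM t"
        ).fetchone()
    finally:
        con.close()


if __name__ == "__main__":
    sample = "日本語"  # three codepoints, all in the range 0x800..0xFFFF
    for enc in ("UTF-8", "UTF-16"):
        value, codepoints, nbytes = probe(enc, sample)
        print(enc, value == sample, codepoints, nbytes)
```

The text and the codepoint count should come back identical for both databases; only the byte count should differ (3 bytes per codepoint for UTF8 versus 2 for UTF16 with this sample).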

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzJ3kkACgkQmOOfHg372QQnLgCfRYT8tDSi4HjJgPEVyAet3O4I
LI4An0Z7ovkEfb2xPK+clpXF/2hjCa/K
=fTye
-----END PGP SIGNATURE-----
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users