Re: [sqlite] Unicode Confusion and Database Size

Mohit Sindhwani Sat, 03 Sep 2011 08:27:36 -0700

Hi Igor,

Thanks for your advice and guidance.


On 1/9/2011 11:57 PM, Igor Tandetnik wrote:

On 9/1/2011 10:24 AM, Mohit Sindhwani wrote:
On the other hand, the other language that we are storing seems to
require 3 bytes in UTF-8. Given that, it would appear that using UTF-8
would be a better idea since it will store more "efficiently".
If you have lots of Chinese (or Japanese or Korean) text to store, then UTF-16 might be more compact. For these languages, one character takes three bytes in UTF-8 but only two in UTF-16. On the other hand, plain ASCII characters take one byte in UTF-8 but still two bytes in UTF-16. So if you have a mix of the two, the issue gets murky.
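To see where the crossover falls for a particular mix of text, one can simply count encoded bytes. The Python sketch below (the `encoded_sizes` helper is hypothetical, not part of SQLite) compares UTF-8 and UTF-16 sizes for pure ASCII, pure Chinese and mixed strings. It measures only the characters themselves, not SQLite's per-record or per-page overhead.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    # "utf-16-le" is used rather than "utf-16" so Python does not prepend a
    # 2-byte byte-order mark; only the characters themselves are counted.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Pure ASCII, pure Chinese, and a mix of the two.
for sample in ["hello", "你好世界", "hello 世界"]:
    utf8, utf16 = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

The ASCII sample doubles in size under UTF-16, the Chinese sample shrinks from 12 to 8 bytes, and the mixed string still comes out smaller in UTF-8, which is exactly the murky case.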

I already have a database that has a couple of tables that are in UTF-8. Is there an easy way for me to build a UTF-16 database from it?
Using the sqlite3 command line utility, run the .dump command on the old database. Create a new database and use "PRAGMA encoding" to set it to UTF-16 before creating anything in it. Then run the .read command on it, using the dump file from the old one.
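The same dump-and-reload can be scripted. This is a minimal sketch using Python's standard `sqlite3` module, assuming the old file is readable and the new path does not exist yet; `Connection.iterdump()` produces SQL similar to the shell's .dump, and the `convert_to_utf16` helper is hypothetical. The important detail is ordering: `PRAGMA encoding` only takes effect on a database that has no content yet, so it runs before any of the dumped statements.

```python
import sqlite3


def convert_to_utf16(src_path: str, dst_path: str) -> None:
    """Rebuild the database at src_path as a new UTF-16 database at dst_path.

    Hypothetical helper for illustration; dst_path should not exist yet.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Must run before anything is created: once the database file has
        # content, SQLite silently ignores attempts to change its encoding.
        # "UTF-16" means UTF-16 in the machine's native byte order.
        dst.execute("PRAGMA encoding = 'UTF-16'")
        # iterdump() yields SQL statements (schema plus INSERTs), much like
        # the shell's .dump; executescript() runs them all in order.
        dst.executescript("\n".join(src.iterdump()))
    finally:
        src.close()
        dst.close()
```

For example, after `convert_to_utf16("old.db", "new.db")`, running `PRAGMA encoding` on new.db should report UTF-16le or UTF-16be depending on the platform's byte order.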

I tried what you suggested, and for our data we get savings in the region of 25% to 33% when the strings are in a language that requires 3 bytes per character. Given that, we should explore UTF-16 in more detail. However, we also have a lot of text that is only in English, so it seems that we should separate the data in the two languages and use ATTACH to bring in the other language. That may be best for our needs.
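One thing worth checking before committing to this layout: SQLite's documentation for `PRAGMA encoding` states that an attached database must use the same text encoding as the main database, so a UTF-8 English file and a UTF-16 file for the other language may not combine through ATTACH. The Python sketch below is a quick way to test this on a given SQLite build; the `make_db` and `can_attach` helpers and the file names are hypothetical.

```python
import os
import sqlite3
import tempfile


def make_db(path: str, encoding: str, text: str) -> None:
    """Create a one-table database at path with the given text encoding."""
    conn = sqlite3.connect(path)
    # PRAGMA values cannot be bound as parameters; encoding is a fixed
    # literal here. The encoding is locked in when the first table is created.
    conn.execute(f"PRAGMA encoding = '{encoding}'")
    conn.execute("CREATE TABLE strings(id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO strings(body) VALUES (?)", (text,))
    conn.commit()
    conn.close()


def can_attach(main_path: str, other_path: str) -> bool:
    """Return True if other_path can be ATTACHed to a connection on main_path."""
    conn = sqlite3.connect(main_path)
    try:
        conn.execute("ATTACH DATABASE ? AS other", (other_path,))
        return True
    except sqlite3.Error as exc:
        print(f"ATTACH {other_path} failed: {exc}")
        return False
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    english = os.path.join(tmp, "english.db")    # UTF-8, English text
    english2 = os.path.join(tmp, "english2.db")  # UTF-8, more English text
    other = os.path.join(tmp, "other.db")        # UTF-16, Chinese text
    make_db(english, "UTF-8", "hello")
    make_db(english2, "UTF-8", "world")
    make_db(other, "UTF-16", "你好")
    same_encoding_ok = can_attach(english, english2)
    mixed_encoding_ok = can_attach(english, other)
```

If the mismatch is confirmed, the English and other-language files would need either a common encoding or separate connections rather than ATTACH.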


Thanks for the tips so far!

Best Regards,
Mohit.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
