Hi Igor,

Thanks for your advice and guidance.

On 1/9/2011 11:57 PM, Igor Tandetnik wrote:
On 9/1/2011 10:24 AM, Mohit Sindhwani wrote:

On the other hand, the other language that we are storing seems to
require 3 bytes in UTF-8. Given that, it would appear that using UTF-8
would be a better idea since it will store more "efficiently".

If you have lots of Chinese (or Japanese or Korean) text to store, then UTF-16 might be more compact. For these languages, one character takes three bytes in UTF-8 but only two in UTF-16. On the other hand, plain ASCII characters take one byte in UTF-8 but still two bytes in UTF-16. So if you have a mix of the two, the issue gets murky.
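
The byte counts described above can be checked directly. This is only a minimal sketch in Python: the sample strings and the helper name `encoded_sizes` are made up for illustration, and it measures raw encoded text only, not SQLite's per-row or per-page overhead.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample strings: pure ASCII, pure CJK, and a mix of both.
samples = {
    "ascii": "hello",
    "cjk": "你好世界",
    "mixed": "SQLite 数据库",
}

for label, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

Running this on a representative sample of the real data, rather than on toy strings, should give a better idea of which encoding wins for a mixed corpus.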



I already have a database that has a couple of tables that are in UTF-8
- is there an easy way for me to build a database from this that is UTF-16?

Using the sqlite3 command line utility, run the .dump command on the old database and save the output to a file. Create a new database. Before creating any tables, use "PRAGMA encoding" to set it to UTF-16. Then run the .read command on it using the dump file from the old one.
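
The dump-and-reload steps above can also be sketched with Python's standard `sqlite3` module instead of the command line shell. This is an illustration under assumptions, not the exact procedure from the thread: the function name `copy_as_utf16` and both paths are hypothetical, and the destination file is assumed not to exist yet, because "PRAGMA encoding" only takes effect before the database has been created.

```python
import sqlite3
from contextlib import closing


def copy_as_utf16(src_path: str, dst_path: str) -> str:
    """Copy src_path into a new database at dst_path that stores text as UTF-16.

    Returns the encoding reported by the new database.
    dst_path must not exist yet (assumption of this sketch).
    """
    with closing(sqlite3.connect(src_path)) as src, \
            closing(sqlite3.connect(dst_path)) as dst:
        # Must run before any table is created in the new database.
        dst.execute("PRAGMA encoding = 'UTF-16'")
        # iterdump() yields SQL text comparable to the shell's .dump output.
        dst.executescript("\n".join(src.iterdump()))
        return dst.execute("PRAGMA encoding").fetchone()[0]
```

Comparing the sizes of the two files afterwards gives the actual saving for a given data set.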

I tried what you suggested, and for our data we get savings of roughly 25% to 33% for strings stored in a language that requires 3 bytes per character in UTF-8. Given that, we should explore UTF-16 in more detail. However, we also have a lot of text that is only in English, so it seems we should separate the data for the two languages into different databases and use ATTACH to bring in the other language. That may be best for our needs.
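
One assumption worth checking before splitting the data this way: the SQLite documentation for "PRAGMA encoding" states that an attached database must use the same text encoding as the main database. The following is a minimal sketch for testing that with two existing database files. The function name `can_attach` and the schema alias `other` are hypothetical.

```python
import sqlite3
from contextlib import closing


def can_attach(main_path: str, other_path: str) -> bool:
    """Return True if other_path can be ATTACHed to a connection on main_path."""
    with closing(sqlite3.connect(main_path)) as conn:
        try:
            conn.execute("ATTACH DATABASE ? AS other", (other_path,))
            # Touch the attached schema so any load error surfaces here.
            conn.execute("SELECT count(*) FROM other.sqlite_master").fetchone()
        except sqlite3.OperationalError:
            return False
        return True
```

If this returns False for a UTF-8 main database and a UTF-16 second database, the two files would need to be opened on separate connections rather than combined with ATTACH.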

Thanks for the tips so far!

Best Regards,
Mohit.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
