At 02:11 PM 7/7/02 +0700, Paul Hastings wrote:
>is there a standard test that can determine whether a given
>database can handle utf-8 (ie as "native" utf-8 not converting
>to ucs-2 or whatever)?

Why is that of any interest?

The primary concern is whether a database is able to represent the entire 
repertoire of Unicode. Just create a string that contains the largest 
character, U+10FFFD (the highest code point that is not a noncharacter), 
convert it to whatever encoding form the APIs require, and see whether 
you get it back unmolested.
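For instance, a minimal sketch in Python; the built-in sqlite3 module 
stands in here for whatever database and driver you are actually 
testing, and the table and column names are placeholders:

    import sqlite3

    # The largest character that may be interchanged: U+10FFFD
    probe = "test-\U0010FFFD-test"

    con = sqlite3.connect(":memory:")      # stand-in for the real database
    con.execute("CREATE TABLE t (s TEXT)")
    con.execute("INSERT INTO t (s) VALUES (?)", (probe,))
    (result,) = con.execute("SELECT s FROM t").fetchone()

    # The round trip must return the string unmolested.
    assert result == probe, "database mangled the supplementary character"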

A more sophisticated test would take a longer string and attempt to sniff 
out incorrect truncation of characters: a database that truncates by byte 
count rather than by character can split a multi-byte UTF-8 sequence in 
the middle.
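A sketch of that idea, under the same assumptions (note that SQLite 
itself ignores the declared column length, so the interesting target is 
a database that enforces it):

    import sqlite3

    # A longer string of characters that take four bytes in UTF-8; a
    # database that truncates by bytes rather than by characters may
    # split one of them in the middle.
    long_probe = "\U0010FFFD" * 100

    con = sqlite3.connect(":memory:")
    # SQLite does not enforce the length; a database that does is
    # exactly what this test is meant to probe.
    con.execute("CREATE TABLE t (s VARCHAR(50))")
    con.execute("INSERT INTO t (s) VALUES (?)", (long_probe,))
    (result,) = con.execute("SELECT s FROM t").fetchone()

    # Correct truncation yields a clean prefix of the original; U+FFFD
    # (or a decode error in the driver) indicates a split UTF-8 sequence.
    assert long_probe.startswith(result)
    assert "\uFFFD" not in result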

A secondary concern is performance. If the choice of encoding form is a 
poor match for the actual data encountered, and if entering and retrieving 
the data requires too many transcoding steps, it's conceivable that this 
could be detected in the overall performance of the database.

However, there's no reason to assume that a theoretical match in encoding 
efficiency translates automatically into a more efficient database 
implementation.
Therefore, regular benchmarking tools should suffice to determine database 
performance, as long as the test data is representative of the installation.
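A minimal timing harness along those lines might look like this (the row 
count and sample text are stand-ins for a representative workload):

    import sqlite3
    import time

    # Placeholder: in a real test this would be data representative of
    # the installation's actual content and character distribution.
    rows = [("sample text %d" % i,) for i in range(100000)]

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (s TEXT)")

    start = time.perf_counter()
    con.executemany("INSERT INTO t (s) VALUES (?)", rows)
    fetched = con.execute("SELECT s FROM t").fetchall()
    elapsed = time.perf_counter() - start

    print("round trip of %d rows took %.3f s" % (len(fetched), elapsed))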

A./
