At 02:11 PM 7/7/02 +0700, Paul Hastings wrote:
>is there a standard test that can determine whether a given
>database can handle utf-8 (ie as "native" utf-8 not converting
>to ucs-2 or whatever)?
Why is that of any interest? The primary concern is whether a database is able to represent the entire repertoire of Unicode. Just create a string that contains the largest character, U+10FFFD (the highest code point that is not a noncharacter), convert it to whatever encoding form the APIs require, and see whether you get it back unmolested. A more sophisticated test would take a longer string and attempt to sniff out incorrect truncation of characters; both checks are sketched below.

A secondary concern is performance. If the choice of encoding form is a poor match for the actual data encountered, and if entering and retrieving the data requires too many transcoding steps, it's conceivable that this could show up in the overall performance of the database. However, there's no reason to assume that a theoretical match in encoding efficiency translates automatically into a more efficient database implementation. Regular benchmarking tools should therefore be fine for determining database performance, as long as the test data is representative of the installation.

A./
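
For illustration, a minimal sketch of the round-trip and truncation checks described above, assuming a Python client and using the standard sqlite3 module as a stand-in for the database under test (the table and column names are made up):

    import sqlite3

    # Probe string ending with U+10FFFD, the largest Unicode character
    # (the highest code point that is not a noncharacter), plus a few
    # characters of different UTF-8 byte lengths.
    probe = "abc\u00e9\u4e2d\U00010000\U0010FFFD"

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, s TEXT)")
    con.execute("INSERT INTO t (id, s) VALUES (1, ?)", (probe,))
    (back,) = con.execute("SELECT s FROM t WHERE id = 1").fetchone()
    assert back == probe, "round trip mangled the data"

    # Longer string to sniff out truncation that splits a character:
    # a value cut mid-character typically comes back with a decoding
    # error or a trailing U+FFFD replacement character.
    long_probe = "\U0010FFFD" * 10_000
    con.execute("INSERT INTO t (id, s) VALUES (2, ?)", (long_probe,))
    (back,) = con.execute("SELECT s FROM t WHERE id = 2").fetchone()
    assert long_probe.startswith(back) and "\uFFFD" not in back, \
        "value was truncated mid-character"
    assert back == long_probe, "value was silently truncated"

Against a real server the same two inserts and selects would run over its own client API; the point is only that the comparison happens on the retrieved value, not on what was sent.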