On Windows, UTF-16 is represented by WCHAR, which is always 2 bytes per code unit. A single character can still take more than 2 bytes, though: characters outside the Basic Multilingual Plane, which are not used for ordinary living languages, are encoded as a pair of code units (4 bytes in total). Musical notation symbols are one example. UCS-2, by contrast, is a fixed 2-bytes-per-character encoding and simply cannot represent those characters; I don't think any OSes use UCS-2 directly anymore.

I know Oracle supports UTF-8, UTF-16, and UCS-2. In fact, Oracle's online documentation has a really good discussion of Unicode; look for their internationalization book:

http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96529/toc.htm

I wrote some code that shared data between Oracle and Microsoft SQL Server and found that book very helpful. Oracle generally favors UTF-8, while SQL Server favors UTF-16.
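To illustrate the surrogate-pair mechanism, here is a small sketch (my own illustration; the helper name is made up) showing how a code point above U+FFFF becomes two 16-bit units in UTF-16, which is exactly what UCS-2 cannot express:

    #include <stdio.h>

    /* Encode one Unicode code point as UTF-16 code units.
       Returns the number of 16-bit units written (1 or 2). */
    static int to_utf16(unsigned long cp, unsigned short out[2])
    {
        if (cp <= 0xFFFF) {            /* BMP: one unit, same as UCS-2 */
            out[0] = (unsigned short)cp;
            return 1;
        }
        cp -= 0x10000;                 /* supplementary: surrogate pair */
        out[0] = (unsigned short)(0xD800 | (cp >> 10));    /* high */
        out[1] = (unsigned short)(0xDC00 | (cp & 0x3FF));  /* low  */
        return 2;
    }

    int main(void)
    {
        unsigned short u[2];
        /* U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP */
        int n = to_utf16(0x1D11E, u);
        printf("%d units: 0x%04X 0x%04X\n", n, u[0], u[1]);
        /* prints "2 units: 0xD834 0xDD1E" */
        return 0;
    }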
If you are going to cast to unsigned char*, you must manage the fact that your characters are two (or more) bytes each; you are effectively just using a byte pointer into the string data. I think wchar_t* is what is typically used, but its encoding is usually platform dependent. The big problem you have is that your database files are portable across platforms, so I think you will need to pick one internal format to store the strings in the db and then translate to and from it as appropriate for each platform. You may be able to do some clever things, like using sizeof(wchar_t) to find out how many bytes a character occupies and driving your translation from that (see the sketch at the end of this message).

There is a Unicode book available that covers the specs. Unfortunately, my experience has been that every implementation has its own nuances. Generally, though, it is pretty consistent.

-- Andrew

On Wed, 7 Apr 2004, D. Richard Hipp wrote:

> Simon Berthiaume wrote:
>
> > > Notice that text strings are always transferred as type "char*" even
> > > if the text representation is UTF-16.
> >
> > This might force users to explicitly type cast some function calls to
> > avoid warnings. I would prefer Unicode-neutral functions that can take
> > either one, depending on the setting of a compilation #define
> > (UNICODE). Create a function that takes char* and another that takes
> > wchar_t*, then encourage the use of a #defined symbol that switches
> > depending on context (see example below). It would allow people to
> > call the functions either way they want.
> >
> > Example:
> >
> >     int sqlite3_open8(const char*, sqlite3**, const char**);
> >     int sqlite3_open16(const wchar_t*, sqlite3**, const wchar_t**);
> >     #ifdef UNICODE
> >     #define sqlite3_open sqlite3_open16
> >     #else
> >     #define sqlite3_open sqlite3_open8
> >     #endif
>
> I'm told that wchar_t is 2 bytes on some systems and 4 bytes on others.
> Is it really acceptable to use wchar_t* as a UTF-16 string pointer?
>
> Note that internally, sqlite3 will cast all UTF-16 strings to be of
> type "unsigned char*". So the type in the declaration doesn't really
> matter. But it would be nice to avoid compiler warnings. So what
> datatype are most systems expecting to use for UTF-16 strings? Who can
> provide me with a list? Or even a few examples?
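To make the sizeof(wchar_t) suggestion above concrete, here is a rough, untested sketch (the helper name is my own invention) that normalizes a native wchar_t string to little-endian UTF-16 bytes for storage. It assumes that a 4-byte wchar_t holds UTF-32 code points, which is typical on Unix systems:

    #include <stdlib.h>
    #include <wchar.h>

    /* Convert a wchar_t string to UTF-16LE bytes; caller frees the
       result.  With a 2-byte wchar_t the units are copied through;
       with a 4-byte wchar_t, supplementary code points are split
       into surrogate pairs. */
    static unsigned char *wcs_to_utf16le(const wchar_t *src, size_t *nbytes)
    {
        size_t i, len = wcslen(src);
        /* worst case: every code point needs a surrogate pair */
        unsigned char *buf = malloc(len*4 + 2), *p = buf;
        if (buf == 0) return 0;
        for (i = 0; i < len; i++) {
            unsigned long cp = (unsigned long)src[i];
            if (sizeof(wchar_t) > 2 && cp > 0xFFFF) {
                unsigned hi = 0xD800 | (unsigned)((cp - 0x10000) >> 10);
                unsigned lo = 0xDC00 | (unsigned)((cp - 0x10000) & 0x3FF);
                *p++ = hi & 0xFF; *p++ = hi >> 8;   /* little-endian */
                *p++ = lo & 0xFF; *p++ = lo >> 8;
            } else {
                *p++ = cp & 0xFF; *p++ = (cp >> 8) & 0xFF;
            }
        }
        *p++ = 0; *p++ = 0;                          /* terminating NUL */
        *nbytes = (size_t)(p - buf);
        return buf;
    }

Reading the bytes back on a big-endian platform would just be the reverse byte swap, which is why pinning down one stored byte order matters.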