John Crenshaw <johncrens...@priacta.com>
wrote: 
> No, I mean which encoding. You can't give a UTF-16 string to an API
> that only knows how to handle UCS-2 encoded data

Well, most of the time, you can. Only in rare cases do you need to treat 
surrogate pairs in a special way. One such case, relevant to this discussion, 
is converting UTF-16 to UTF-8, which WideCharToMultiByte handles just fine, as 
you can easily verify experimentally:

/* U+10000 as a surrogate pair, plus the terminating zero */
wchar_t input[] = {0xD800, 0xDC00, 0};
char output[10];
WideCharToMultiByte(CP_UTF8, 0, input, 3, output, 10, NULL, NULL);

The string in output is a single 4-byte UTF-8 sequence, as it should be. A 
naive conversion of each individual surrogate would have produced two 3-byte 
sequences.
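
For anyone without a Windows box handy, the same arithmetic can be sketched in 
portable C. This is only an illustration of what a correct converter has to do 
with a surrogate pair; surrogate_pair_to_utf8 is a hypothetical helper name, 
not a Win32 function, and I am not claiming this is how WideCharToMultiByte is 
implemented internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Win32): combine a UTF-16 surrogate pair
   into one code point and encode it as a 4-byte UTF-8 sequence.
   Returns the number of bytes written, which is always 4. */
size_t surrogate_pair_to_utf8(uint16_t hi, uint16_t lo, unsigned char out[4])
{
    /* Preconditions: hi is a high surrogate, lo is a low surrogate. */
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);

    /* Each surrogate carries 10 bits; the pair encodes U+10000..U+10FFFF. */
    uint32_t cp = 0x10000u
                + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));

    out[0] = (unsigned char)(0xF0 | (cp >> 18));          /* 11110xxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F)); /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
    return 4;
}
```

For the pair 0xD800, 0xDC00 this yields the bytes F0 90 80 80, which is U+10000 
in UTF-8. Encoding each surrogate separately would instead give ED A0 80 
followed by ED B0 80, which is not valid UTF-8.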

So, from where I sit, the Win32 API cheerfully accepts UTF-16. Can you show an 
example to the contrary?

> just like you can't
> use a UTF-8 string when ASCII data is expected.

Here too, in many cases you can. Good old strcpy and strlen and strchr and 
strstr work just fine on UTF-8 strings, even though not a line of code in their 
implementation has changed since before Unicode even existed. In fact, UTF-8 
was designed, among other things, to maximize compatibility with older software 
that predated it.
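
A small sketch of what I mean, using nothing but the standard C library. The 
helper name utf8_find is made up for this example. The point is that strstr 
compares bytes, and in UTF-8 a lead byte can never be mistaken for a 
continuation byte or for an ASCII byte, so a match on a valid needle cannot 
begin in the middle of a character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset of needle within haystack, or -1 if it
   is not found. Both arguments are assumed to be valid, zero-terminated
   UTF-8 strings. Plain strstr does all the work. */
long utf8_find(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *p = strstr(haystack, needle);
    return p ? (long)(p - haystack) : -1L;
}
```

Note that the offsets, like the result of strlen, are counted in bytes rather 
than characters. That is usually exactly what you want for copying, 
concatenating and slicing at a found position.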

> When I tackle this
> nightmare the last time I was left with the understanding that the
> wide Win32 APIs expected data to be UCS-2 encoded. Now I'm no longer
> sure, and I can't find any reliable documentation on this either way.
> It would be good if the APIs accept UTF-16

Which API calls specifically are you concerned about? There are very few cases 
where the presence of surrogate pairs makes a difference. I believe you are 
blowing the issue way out of proportion.

Igor Tandetnik

