John Crenshaw <johncrens...@priacta.com> wrote:
> No, I mean which encoding. You can't give a UTF-16 string to an API
> that only knows how to handle UCS-2 encoded data
Well, most of the time, you can. Only in rare cases do you need to treat
surrogate pairs in a special way. One such case, relevant to this
discussion, is converting UTF-16 to UTF-8, which WideCharToMultiByte
handles just fine, and which you can easily verify experimentally:

    wchar_t input[] = {0xD800, 0xDC00, 0};
    char output[10];
    WideCharToMultiByte(CP_UTF8, 0, input, -1, output, 10, NULL, NULL);

The string in output is a single 4-byte UTF-8 sequence, as it should be.
A naive conversion of each individual surrogate would have produced two
3-byte sequences. So, from where I sit, the Win32 API cheerfully accepts
UTF-16. Can you show an example to the contrary?

> just like you can't
> use a UTF-8 string when ASCII data is expected.

Here too, in many cases you can. Good old strcpy and strlen and strchr
and strstr work just fine on UTF-8 strings, even though not a line of
code in their implementation has changed since before Unicode even
existed. In fact, UTF-8 was designed, among other things, to maximize
compatibility with older software that predated it.

> When I tackled this
> nightmare the last time I was left with the understanding that the
> wide Win32 APIs expected data to be UCS-2 encoded. Now I'm no longer
> sure, and I can't find any reliable documentation on this either way.
> It would be good if the APIs accept UTF-16

Which API calls specifically are you concerned about? There are very few
cases where the presence of surrogate pairs makes a difference. I
believe you are blowing the issue way out of proportion.

Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users