Re: [sqlite] International Language Support

Austin Ziegler Thu, 04 Aug 2005 14:12:01 -0700

On 8/4/05, Dan Wellisch <[EMAIL PROTECTED]> wrote:
> Are you saying that 8859-1 encoding does not work with these
> international versions of MS Windows, so we would need to ensure
> that we are putting UTF-8 chars in the data? This would make sense
> if the OS uses UTF-8 chars. in the WHERE clause so that it is
> searching against 8859-1 chars.


What I'm saying is that unless you're explicitly doing something to
ensure that your input is 8859-1, you're getting something else. In
fact, on English Windows, you're probably not getting 8859-1
either; you're getting Windows 1252 (I *think* that's the right
number), which is similar to, but not *quite* the same as, 8859-1.
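
A small Python sketch of that difference, using only the standard
codecs. The sample bytes are invented for illustration; the point is
that the two encodings disagree only in the 0x80-0x9F range.

```python
# Bytes 0x80-0x9F are where ISO 8859-1 and Windows-1252 disagree.
raw = b"\x80\x93caf\xe9\x94"

# ISO 8859-1 maps 0x80, 0x93 and 0x94 to C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 maps 0x80 to the euro sign and 0x93/0x94 to curly quotes.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))

# Outside that range (for example 0xE9, e-acute) the two agree.
same_outside_range = b"\xe9".decode("iso-8859-1") == b"\xe9".decode("cp1252")
print(same_outside_range)
```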

If you're just using char* (or std::string) and getting input from
Windows, then you're getting it in ANSI/OEM, most of the time, which
is most decidedly *not* 8859-1. It might be Windows 1252, but it's
not necessarily Windows 1252 on non-English versions of Windows.
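
To see why the code page matters, here is a hedged Python sketch
that decodes one and the same high-bit byte under several code
pages. The choice of cp1251 (Cyrillic) and cp850 (one of the OEM
code pages) is just an assumption for the example; which code page a
given Windows install actually uses depends on its locale.

```python
# One byte, as it might arrive in a plain char* buffer.
byte = b"\xe7"

# Decode it under a few code pages a Windows machine might be using.
decoded = {cp: byte.decode(cp) for cp in ("cp1252", "cp1251", "cp850")}

for codepage, char in decoded.items():
    print(codepage, repr(char))

# cp1252 yields c-cedilla; cp1251 yields a Cyrillic letter instead.
```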

If you're compiling with UNICODE and are using TCHAR*, you'll be
getting wchar_t* (or std::wstring), which is actually UCS-2 (related
to UTF-16, but again not *quite* the same since UCS-2 doesn't
support surrogates). This is better than ANSI/OEM, but not by much
because it causes other problems.
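
The surrogate point can be illustrated with a short Python sketch.
The helper name utf16_code_units is made up for this example; it
just splits UTF-16 output into 16-bit units. A character outside the
Basic Multilingual Plane needs two such units, which a strict UCS-2
consumer cannot represent as one character.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units of text encoded as UTF-16 (little-endian)."""
    data = text.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(data) // 2), data)

c_cedilla = "\u00e7"      # inside the Basic Multilingual Plane
g_clef = "\U0001D11E"     # MUSICAL SYMBOL G CLEF, outside it

bmp_units = utf16_code_units(c_cedilla)    # a single 16-bit unit
clef_units = utf16_code_units(g_clef)      # a high/low surrogate pair

print([hex(u) for u in bmp_units])
print([hex(u) for u in clef_units])
```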

The real trick is that UTF-8 and US ASCII (that is to say, the first
128 characters of ANSI) map perfectly, so plain English text looks
the same either way. SQLite uses UTF-8 internally; I haven't quite
understood this part in the documentation, and haven't yet needed
to, but I don't think it does any auto-conversion for you.
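
A minimal Python sketch of that mapping, assuming Windows-1252 as
the ANSI code page (which, as noted, is not guaranteed). ASCII text
produces identical bytes in both encodings, while a high-bit
character does not, and the ANSI byte is not even valid UTF-8.

```python
ascii_text = "SELECT name FROM people"

# Pure ASCII encodes to the same bytes in ASCII, UTF-8 and Windows-1252.
ascii_same = (
    ascii_text.encode("ascii")
    == ascii_text.encode("utf-8")
    == ascii_text.encode("cp1252")
)

c_cedilla = "\u00e7"
utf8_bytes = c_cedilla.encode("utf-8")      # two bytes
cp1252_bytes = c_cedilla.encode("cp1252")   # one byte

# The single Windows-1252 byte is not a valid UTF-8 sequence on its own.
try:
    cp1252_bytes.decode("utf-8")
    cp1252_is_valid_utf8 = True
except UnicodeDecodeError:
    cp1252_is_valid_utf8 = False

print(ascii_same, utf8_bytes, cp1252_bytes, cp1252_is_valid_utf8)
```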

If you're in a position to test this, you might be able to reproduce
it by using a high-bit character (e.g., c-cedilla) in the English
version and seeing whether the search still works.
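
One rough way to try such a test, sketched with Python's sqlite3
module. The table and column names are invented, and the values are
bound as raw bytes (BLOBs) purely to stand in for unconverted char*
data; how this plays out through sqlite3_bind_text in a C or C++
program may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name)")

# Store the name as UTF-8 bytes, as a UTF-8-aware writer would.
conn.execute("INSERT INTO people VALUES (?)", ("Fran\u00e7ois".encode("utf-8"),))

def count_matches(needle):
    """Count rows whose stored bytes equal the given search bytes."""
    row = conn.execute(
        "SELECT count(*) FROM people WHERE name = ?", (needle,)
    ).fetchone()
    return row[0]

# Searching with the same encoding finds the row.
hits_utf8 = count_matches("Fran\u00e7ois".encode("utf-8"))

# Searching with Windows-1252 bytes for the same text does not.
hits_cp1252 = count_matches("Fran\u00e7ois".encode("cp1252"))

print(hits_utf8, hits_cp1252)
conn.close()
```

A name without high-bit characters would match under either
encoding, which is why the problem can stay hidden on English data.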

-austin
-- 
Austin Ziegler * [EMAIL PROTECTED]
               * Alternate: [EMAIL PROTECTED]
