> On Friday, 24 June, 2016 12:17 -0600, Scott Robison said:

> Okay, rather than guessing, I just did a test from a Windows 10 command
> prompt. I am getting appropriate UTF-8 sequences. Here is my experiment:
> 
> I opened a memory database and issued the following commands:
> 
> create table test(a text);
> insert into test values('ß'),('▀'),('á'),('ß'); -- for the first value I
> typed ALT+225, then ALT+223, then ALT+0225, then ALT+0223
> select a, hex(a) from test;
> 
> Which resulted in four rows of output:
> 
> ß|C3A1
> ▀|C39F
> á|C2A0
> ß|C3A1

And I get this on Windows 10 1511, Consolas font, codepage 437 (IBM ANSI 
codepage with drawing characters in the upper 128 characters):

sqlite> create table test(x text);
sqlite> insert into test values ('ß'); -- ALT+223
sqlite> insert into test values ('á'); -- ALT+225
sqlite> insert into test values ('ß'); -- ALT+0223
sqlite> insert into test values ('á'); -- ALT+0225
sqlite> select x, hex(x) from test;
ß|C3A1
á|C2A0
ß|C3A1
á|C2A0

Changing the codepage to 1252 (Windows ACP for Western European languages) I 
get this:

sqlite> create table test(x text);
sqlite> insert into test values ('ß');
sqlite> insert into test values ('á');
sqlite> insert into test values ('ß');
sqlite> insert into test values ('á');
sqlite> select x, hex(x) from test;
ß|C39F
á|C3A1
ß|C39F
á|C3A1

With codepage 65001 sqlite terminates -- many things do not know how to handle 
this codepage internally since it is "new" (Windows 10), just as lots of 
stuff crashes on datetime conversions if you set the locale to "Canada" because 
it is "new" (with Windows 95) and uses a farked up date format.

With codepage 850 (IBM ANSI Multilingual with accented characters in the upper 
128 characters) I get:

sqlite> create table test(x text);
sqlite> insert into test values ('ß');
sqlite> insert into test values ('á');
sqlite> insert into test values ('ß');
sqlite> insert into test values ('á');
sqlite> select x, hex(x) from test;
ß|C3A1
á|C2A0
ß|C3A1
á|C2A0

So as you can see, the automagic translation works correctly if the OEMCP is 
not an OEM codepage but rather a Windows ACP.  This is changed in the registry 
here:

HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage

Value ACP is the Windows ANSI Code Page, stored as a string (1252 for Western 
European)
Value OEMCP is the Windows Console OEM Code Page, also stored as a string (437 
for the standard OEM Code Page with graphics in the top, and 850 for the 
Western European code page with accented characters in the upper 128 
characters)
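
The hex values in the three transcripts above are consistent with one simple 
model: the console hands over the byte for the typed character in the console 
codepage, and that byte is then treated as if it were a codepage 1252 
character before being encoded as UTF-8. That model is my assumption, not 
something confirmed from the sqlite shell source. A minimal Python sketch 
(stored_hex is a hypothetical helper name) reproduces the output:

```python
def stored_hex(ch: str, console_cp: str, assumed_cp: str = "cp1252") -> str:
    """Model (an assumption): the character arrives as a console-codepage
    byte, is misread as assumed_cp, and is then stored as UTF-8."""
    raw = ch.encode(console_cp)             # byte the console produces
    reinterpreted = raw.decode(assumed_cp)  # byte misread in the ANSI codepage
    return reinterpreted.encode("utf-8").hex().upper()


if __name__ == "__main__":
    for cp in ("cp437", "cp850", "cp1252"):
        for ch in ("ß", "á"):
            print(cp, ch, stored_hex(ch, cp))
```

Under this model only the 1252 console stores what was typed, because the 
console bytes and the assumed codepage agree.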

> I'm hoping all these extended characters are handled properly by gmail and
> whatever email program you use.
 
> Windows supports legacy ALT+### codes that map to the legacy code page. It
> also supports ALT+0### which map to Unicode code points. This allows
> people who're accustomed to the ALT+### format to still see the character they
> expect, but translated to the equivalent Unicode code point.

This is claimed, but not quite true.  I don't know what font and codepage you 
have set for the console, but they affect both the input and output conversions. 
Generally, the ACP codepages work correctly; the OEM ones do not.  You also have 
to match the font to the codepage, or the input translation is based on the 
output translation and the result is incorrect.
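
For reference, the characters the two ALT forms are conventionally expected to 
produce can be sketched as a lookup in two codepages. This assumes ALT+### 
indexes the OEM codepage and ALT+0### indexes the ANSI codepage 1252; as noted 
above, a real console may not behave this way. alt_code is a hypothetical 
helper name:

```python
def alt_code(digits: str, oem_cp: str = "cp437", ansi_cp: str = "cp1252") -> str:
    """Character conventionally expected from an ALT+digits keypad entry.

    Assumption: a leading zero selects the ANSI codepage, otherwise the
    OEM codepage is used. Only values 0..255 are handled.
    """
    codepage = ansi_cp if digits.startswith("0") else oem_cp
    return bytes([int(digits)]).decode(codepage)


if __name__ == "__main__":
    for digits in ("225", "223", "0225", "0223"):
        print("ALT+" + digits, alt_code(digits))
```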

I suspect you are using an OEM font, and not a Unicode font ...




_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
