> On 11 Feb 2016 at 23:13, Clemens Ladisch <clemens at ladisch.de> wrote:
> 
> As far as I can see, there are five problems:
> - stdin from the console

Convert from the codepage returned by GetConsoleCP() to UTF8.

> - stdin redirected from a file

Personal opinion: I'd like it to keep treating input as implicitly UTF8, as it
does today.

> - stdout to the console

Convert from UTF8 to the codepage returned by GetConsoleOutputCP().

> - stdout redirected to a file

Personal opinion: I'd like it to output UTF8.

> - command-line arguments

They're presented to the application code, through the argv[] pointers, in the
system default ANSI code page. Converting from CP_ACP to UTF8 is appropriate.
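A minimal sketch of the five-point policy above. It only decides which code page sits on the outside of each channel; the shell would then convert between that code page and UTF8. Assumptions: io_channel, external_codepage() and the SKETCH_CP_* constants are hypothetical names (the constants carry the numeric values of the Windows CP_ACP and CP_UTF8 macros so the sketch builds on any platform). The console code pages are passed in as parameters. The is_console flag would come from a console check, for instance _isatty() or GetConsoleMode() succeeding on the handle.

```c
/* Hypothetical stand-ins; numeric values match Windows CP_ACP and CP_UTF8. */
#define SKETCH_CP_ACP  0u
#define SKETCH_CP_UTF8 65001u

typedef enum { CH_STDIN, CH_STDOUT, CH_ARGV } io_channel;

/* Return the code page used on the outside of channel `ch`.
   On Windows, console_in_cp / console_out_cp would be the results of
   GetConsoleCP() / GetConsoleOutputCP(); here they are plain inputs. */
static unsigned external_codepage(io_channel ch, int is_console,
                                  unsigned console_in_cp,
                                  unsigned console_out_cp)
{
    switch (ch) {
    case CH_STDIN:
        /* console: whatever the console uses; redirected file: UTF8 */
        return is_console ? console_in_cp : SKETCH_CP_UTF8;
    case CH_STDOUT:
        /* console: console output code page; redirected file: UTF8 */
        return is_console ? console_out_cp : SKETCH_CP_UTF8;
    case CH_ARGV:
    default:
        /* argv[] arrives in the system default ANSI code page */
        return SKETCH_CP_ACP;
    }
}
```

Note that the redirected cases never look at the console code pages, so piping a UTF8 script in or redirecting output to a file behaves the same whatever chcp says.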

I'm adding a 6th point:
- make sure that whenever sqlite3 needs to pass a filename to any ...A Windows
API, the conversion keeps choosing between CP_ACP and CP_OEMCP depending on
what AreFileApisANSI() returns. This is the case right now, and nothing
related to shell.c should change that.

And a 7th point:
- check that when sqlite gets a text string from Windows through a ...A API (an
error message string, for instance), it is treated as CP_ACP and converted
from CP_ACP to whatever is needed (AreFileApisANSI() should not be consulted
here).
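Points 6 and 7 boil down to two different rules for picking the code page. A minimal sketch, with hypothetical function names: the flag passed to filename_codepage() stands for the result of AreFileApisANSI(), and the SKETCH_CP_* constants mirror the numeric values of CP_ACP and CP_OEMCP so the sketch compiles anywhere.

```c
/* Hypothetical stand-ins; numeric values match Windows CP_ACP and CP_OEMCP. */
#define SKETCH_CP_ACP   0u
#define SKETCH_CP_OEMCP 1u

/* Point 6: a filename passed TO a ...A API must follow the current
   file-API mode, i.e. what AreFileApisANSI() reports. */
static unsigned filename_codepage(int file_apis_ansi)
{
    return file_apis_ansi ? SKETCH_CP_ACP : SKETCH_CP_OEMCP;
}

/* Point 7: other text received FROM a ...A API (an error message, say)
   is always taken as CP_ACP; the file-API mode is not consulted. */
static unsigned api_text_codepage(void)
{
    return SKETCH_CP_ACP;
}
```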

> This would be too much for 3.11.0.

Of course.

About the UINT GetConsoleCP() and UINT GetConsoleOutputCP() functions... They
have been available since Windows 2000. I don't know about the various WinCE
editions. Since I wasn't sure when they became available, my quick and dirty
change for testing purposes used a hardcoded CP_OEMCP, but it is better to use
the GetConsole(Output)CP() APIs. Indeed, among the codepages the console can be
switched to (or defaults to, on various localized editions of Windows), some
are considered 'OEM' and others 'ANSI'. Using CP_OEMCP when the console has
been set to an ANSI codepage gives wrong results, and vice versa.
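A rough sketch of the console conversion that queries GetConsoleCP() and GetConsoleOutputCP() at conversion time instead of hardcoding CP_OEMCP. The helper names recode(), console_input_to_utf8() and utf8_to_console_output() are hypothetical, not shell.c functions. On Windows the usual route between two code pages is via UTF-16 with MultiByteToWideChar() and WideCharToMultiByte(). On other platforms the sketch assumes the terminal already speaks UTF8 and simply copies the string.

```c
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a NUL-terminated string from code page from_cp to to_cp.
   Returns a malloc'd string (caller frees) or NULL on failure.
   On non-Windows builds it just returns a copy of the input. */
static char *recode(const char *in, unsigned from_cp, unsigned to_cp)
{
#ifdef _WIN32
    /* First pass sizes the UTF-16 buffer (count includes the NUL). */
    int wlen = MultiByteToWideChar(from_cp, 0, in, -1, NULL, 0);
    if (wlen <= 0) return NULL;
    wchar_t *w = malloc((size_t)wlen * sizeof *w);
    if (w == NULL) return NULL;
    MultiByteToWideChar(from_cp, 0, in, -1, w, wlen);
    /* Default-char arguments must be NULL when to_cp is CP_UTF8. */
    int olen = WideCharToMultiByte(to_cp, 0, w, -1, NULL, 0, NULL, NULL);
    char *out = olen > 0 ? malloc((size_t)olen) : NULL;
    if (out != NULL)
        WideCharToMultiByte(to_cp, 0, w, -1, out, olen, NULL, NULL);
    free(w);
    return out;
#else
    (void)from_cp;
    (void)to_cp;
    size_t n = strlen(in) + 1;
    char *out = malloc(n);
    if (out != NULL) memcpy(out, in, n);
    return out;
#endif
}

/* Console input -> UTF8, whatever code page the console is set to. */
static char *console_input_to_utf8(const char *in)
{
#ifdef _WIN32
    return recode(in, GetConsoleCP(), CP_UTF8);
#else
    return recode(in, 0, 0);
#endif
}

/* UTF8 -> the console's current output code page. */
static char *utf8_to_console_output(const char *in)
{
#ifdef _WIN32
    return recode(in, CP_UTF8, GetConsoleOutputCP());
#else
    return recode(in, 0, 0);
#endif
}
```

Because the code page is read on every conversion, it does not matter whether the console is on an 'OEM' or an 'ANSI' codepage; both cases go through the same path.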

I'm advocating for using GetConsoleCP() and GetConsoleOutputCP() to convert
the input or the output as needed, instead of being tempted to use their Set
counterparts (SetConsoleCP(65001) and SetConsoleOutputCP(65001)). Using those
to switch the console I/O to UTF8 would look simpler, but it's a bumpy road:
unless the console font actually supports Unicode, display issues can appear.
And code page 65001 support does not go back as far on the Windows timeline.
With the Get... path, users can knowingly change the codepage themselves with
the chcp ... command.

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om
