> On 20 June 2017 at 15:24, R Smith <rsm...@rsweb.co.za> wrote:
>
> As an aside - I never understood the reasons for that. I get that Windows has
> a less "techy" clientèle than Linux for instance, and that the backwards
> compatibility is paramount, and that no console command ever need fall
> outside the 7-bit ANSI range of characters... but geez, how much effort can
> it be to make it Unicode-friendly? It's not like the Windows API lacks any
> Unicode functionality - even Notepad can handle it masterfully.
I wouldn't want to look like I'm trolling on this subject, but this is only a matter of which I/O functions are used by programs built to interact with the display and keyboard when run in a console. Windows needs those programs to use ReadConsoleW()/WriteConsoleW() to do the proper thing. Programs that use the C library to read or write byte streams can't achieve anything equivalent, no matter which 'codepage' is set or which DBCS the program converts to or from. I learned this principle here last year and have had excellent success with console I/O in my programs ever since.

To be complete, there is a secondary consideration regarding proper display of the output. The fonts available in Windows are far from covering a large set of glyphs. For eastern languages on a western Windows edition, you generally need to change your console settings to use a font other than the default one, just so that it can draw the needed glyphs.

But the basic thing to do is to get the program running in the console (here we are talking about shell.c, i.e. sqlite3.exe) to output Windows wide chars using WriteConsoleW(), and to use ReadConsoleW() to read chunks of wide chars from the console input before converting them internally to UTF-8 or whatever is wanted. The sqlite3 shell.c, when patched that way, is as pleasant to use in the Windows console as it is on a modern Linux or macOS.

Input files fed to sqlite3.exe need to be in UTF-8, and the output sent by sqlite3.exe will be in UTF-8 as well: that part is perfectly fine in sqlite3.exe today. Only the keyboard reading and the console output writing are lacking a little.

> but geez, how much effort can it be to make it Unicode-friendly?

To comment on a more general plane than sqlite3.exe, the issue lies deeper in Windows than in its console. Once upon a time (!), they chose 16 bits per character as the *right* way (their right way!) to do Unicode.
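Going back to the console side for a moment: the sketch below shows roughly what the output half of such a patch could look like. It is a hypothetical illustration, not the actual shell.c change, and the names `utf8_to_utf16` and `console_write_utf8` are made up here. On Windows the conversion would normally be done with `MultiByteToWideChar(CP_UTF8, ...)`; a small portable decoder is used instead so that the conversion (including surrogate pairs for characters beyond U+FFFF) can be checked on any platform. The sketch assumes that a failing `GetConsoleMode()` means stdout is redirected, in which case the UTF-8 bytes are written unchanged, so redirected output stays UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert len bytes of UTF-8 into UTF-16 code units in dst (room for cap
   units). Returns the number of units written, or -1 on malformed UTF-8
   or if dst is too small. Portable stand-in for
   MultiByteToWideChar(CP_UTF8, ...). */
static long utf8_to_utf16(const unsigned char *src, size_t len,
                          uint16_t *dst, size_t cap)
{
    /* Smallest code point allowed for a sequence with this many
       continuation bytes; anything lower is an overlong encoding. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char c = src[i];
        uint32_t cp;
        size_t extra, k;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return -1;                          /* invalid lead byte */

        if (extra > len - i - 1) return -1;      /* truncated sequence */
        for (k = 1; k <= extra; k++) {
            unsigned char cc = src[i + k];
            if ((cc & 0xC0) != 0x80) return -1;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        i += extra + 1;

        /* Reject overlong forms, values above U+10FFFF and lone surrogates. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000) {                      /* BMP: one 16-bit unit */
            if (n + 1 > cap) return -1;
            dst[n++] = (uint16_t)cp;
        } else {                                 /* beyond BMP: surrogate pair */
            if (n + 2 > cap) return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 + (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    return (long)n;
}

/* Write a NUL-terminated UTF-8 string. On Windows, if stdout is a real
   console, convert it to UTF-16 and call WriteConsoleW(); otherwise
   (redirected output, or another OS) write the UTF-8 bytes unchanged.
   Returns 0 on success, -1 on failure. In this sketch, text longer than
   1024 UTF-16 units is rejected rather than written in chunks. */
static int console_write_utf8(const char *s)
{
    assert(s != NULL);
#ifdef _WIN32
    {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD mode, written;
        if (h != NULL && h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode)) {
            uint16_t buf[1024];
            long n = utf8_to_utf16((const unsigned char *)s, strlen(s),
                                   buf, sizeof buf / sizeof buf[0]);
            if (n < 0) return -1;
            return WriteConsoleW(h, buf, (DWORD)n, &written, NULL) ? 0 : -1;
        }
    }
#endif
    return fputs(s, stdout) == EOF ? -1 : 0;
}
```

Reading is the mirror image: ReadConsoleW() fills a buffer of UTF-16 units, which the program then converts to UTF-8 (for instance with `WideCharToMultiByte(CP_UTF8, ...)`) before handing the text to SQLite.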
It took time for that 16-bit design to evolve and for the need to encode some characters as more than one 16-bit word (UTF-16) to be recognized. They could have chosen UTF-8 from day one, but that is not what history recorded. Later, UTF-8 got *some* support in the OS (through conversion functions), but it was never raised to full citizenship. There is even CHCP 65001 to set the console 'codepage' to UTF-8. It partly works in some circumstances, but is far from being 'right'.

No matter what you do, no file I/O primitive of the OS will take a UTF-8 string as a filename, and this extends to the C library on the Windows platform. The only Unicode support is to pass a UTF-16 filename to the functions whose names end with a W. The 'ANSI' functions, whose names end with an A, are merely wrappers that convert their string arguments and call the wide-char versions.

There have been numerous requests to Microsoft to let users and developers set the ANSI codepage to UTF-8, so that file I/O functions taking a narrow-char filename string would interpret it as UTF-8. Some are still waiting for that day to come; others use the W side of things, which complicates the portability of their codebase. :)

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users