On 2017/07/11 17:13, Ingo Schwarze wrote:
> Hi Stuart,
>
> Stuart Henderson wrote on Tue, Jul 11, 2017 at 03:52:26PM +0100:
> > On 2017/07/11 16:19, Ingo Schwarze wrote:
>
> >> This decade feels like a strange point in time for degrading fortune
> >> and calendar files by replacing UTF-8 characters with ASCII
> >> transcriptions. Maybe such games should call
> >>
> >> setlocale(LC_CTYPE, "");
> >> char *loc = nl_langinfo(CODESET);
> >>
> >> and replace bytes that are not printable ASCII with question marks
> >> when loc doesn't contain UTF-8? Not sure.
>
> > Given that we don't have
> > http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html,
> > that seems better to me than either indiscriminately printing UTF-8 to a
> > terminal expecting ASCII, or quietly mangling output.
> >
> > But then, how far should one go? ls(1) can have the same problem with
> > an incompatible terminal.
>
> ls(1) already does that:
>
> $ cd /usr/src/bin/ls/Test/ # containing test files on my notebook
> $ ls
> a??c                          surr?????????
> bel?.                         test.txt
> b??r                          test_wctype
> cr???.                        test_wctype.c
> esc?[4munder                  testfile
> iv??????????????              tmp.txt
> long????????????????????????  wt?]0;rogue_title?.
> np??.                         xff?.
> sh?????????????????????
> $ export LC_CTYPE=en_US.UTF-8
> $ ls
> [ snip UTF-8-output because that doesn't belong in mail ]
>
> Admittedly, what ls(1) does in /usr/src/bin/ls/utf8.c is minimally
> more complicated: It also validates, sanitizes, and columnates
> UTF-8 characters in LC_CTYPE=en_US.UTF-8 mode. For simpler cases
> like fortune(6) and calendar(6), no validation, sanitation, and
> columnation is needed, so we get away without mbtowc(3) and even
> without isu8cont(). Just isprint(3) is probably enough for those,
> and even that is only needed unless the locale is UTF-8.
Ah, I'm sorry; I see now that my test was bogus. I started "xterm +u8" from an existing terminal which already had LC_CTYPE set. Thanks for the information.