On 2017/07/11 17:13, Ingo Schwarze wrote:
> Hi Stuart,
> 
> Stuart Henderson wrote on Tue, Jul 11, 2017 at 03:52:26PM +0100:
> > On 2017/07/11 16:19, Ingo Schwarze wrote:
> 
> >> This decade feels like a strange point in time for degrading fortune
> >> and calendar files by replacing UTF-8 characters with ASCII
> >> transcriptions.  Maybe such games should call
> >> 
> >>   setlocale(LC_CTYPE, "");
> >>   char *loc = nl_langinfo(CODESET);
> >> 
> >> and replace bytes that are not printable ASCII with question marks
> >> when loc doesn't contain UTF-8?  Not sure.
> 
> > Given that we don't have
> > http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html,
> > that seems better to me than either indiscriminately printing UTF-8 to a
> > terminal expecting ASCII, or quietly mangling output.
> > 
> > But then, how far should one go? ls(1) can have the same problem with
> > an incompatible terminal.
> 
> ls(1) already does that:
> 
>    $ cd /usr/src/bin/ls/Test/   # containing test files on my notebook
>    $ ls
>   a??c                                    surr?????????
>   bel?.                                   test.txt
>   b??r                                    test_wctype
>   cr???.                                  test_wctype.c
>   esc?[4munder                            testfile
>   iv??????????????                        tmp.txt
>   long????????????????????????            wt?]0;rogue_title?.
>   np??.                                   xff?.
>   sh?????????????????????
>    $ export LC_CTYPE=en_US.UTF-8
>    $ ls                          
>   [ snip UTF-8-output because that doesn't belong in mail ]
> 
> Admittedly, what ls(1) does in /usr/src/bin/ls/utf8.c is minimally
> more complicated:  It also validates, sanitizes, and columnates
> UTF-8 characters in LC_CTYPE=en_US.UTF-8 mode.  For simpler cases
> like fortune(6) and calendar(6), no validation, sanitization, or
> columnation is needed, so we get away without mbtowc(3) and even
> without isu8cont().  Just isprint(3) is probably enough for those,
> and even that is only needed when the locale is not UTF-8.

Ah, I'm sorry - I see now that my test was bogus.

I started "xterm +u8" from an existing terminal which already had
LC_CTYPE set.

Thanks for the information.
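
For the archive, here is a rough sketch of the approach discussed above
for simple cases like fortune(6) and calendar(6): ask nl_langinfo(3) for
the codeset and, unless it contains UTF-8, replace every byte that is
not printable ASCII with a question mark.  The helper names
locale_is_utf8() and sanitize_ascii() are made up for illustration and
are not taken from any existing source file.  The sketch also uses an
explicit byte range where isprint(3) was suggested, and lets newline and
tab through; both are assumptions on my part.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return nonzero if the current LC_CTYPE codeset
 * name contains "UTF-8".  The caller is assumed to have called
 * setlocale(LC_CTYPE, "") once at program startup.
 */
int
locale_is_utf8(void)
{
	const char *loc = nl_langinfo(CODESET);

	return loc != NULL && strstr(loc, "UTF-8") != NULL;
}

/*
 * Hypothetical helper: replace, in place, every byte that is not
 * printable ASCII (0x20 to 0x7e) with '?'.  Newline and tab are kept
 * so that line structure survives.  Each byte of a multibyte UTF-8
 * character becomes its own '?', which matches the ls(1) output
 * quoted above.
 */
void
sanitize_ascii(char *s)
{
	unsigned char *p;

	for (p = (unsigned char *)s; *p != '\0'; p++) {
		if (*p == '\n' || *p == '\t')
			continue;
		if (*p < 0x20 || *p > 0x7e)
			*p = '?';
	}
}
```

A program would call setlocale(LC_CTYPE, "") at startup and then run
sanitize_ascii() on each output line only when locale_is_utf8() returns
zero, printing the text unchanged otherwise.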
