Hi,

Stuart Henderson wrote on Sun, Jan 17, 2016 at 07:46:23PM +0000:
> On 2016/01/17 14:29, Ted Unangst wrote:
>> Ingo Schwarze wrote:

>>> The old ls(1) also weeded out non-printable bytes, in particular
>>> control codes.

>> The old ls only had this behavior for terminals however.
>> Redirecting to a file or pipe would always output the original bytes.

> I've used this a few times in the past, for example "ls | hexdump -C"
> or .."| vis", to find out what the characters used in some filename are.
> I'd find it surprising for this to not work.

Oops.  What we currently have in the tree is broken in that respect,
including the -q option; i broke it.

Current behaviour is:

 * SMALL: fully works, but no UTF-8 support
 * not SMALL:
    - LC_CTYPE=C on a tty or with -q: does '?', ok
    - LC_CTYPE=en_US.UTF-8 on a tty or with -q: does '?', ok
    - LC_CTYPE=C neither tty nor -q: does '?', wrong
    - LC_CTYPE=en_US.UTF-8 neither tty nor -q: does '?', wrong

The following patch fixes the last two cases.
It is similar in spirit to what Martijn originally sent,
but fixes two issues with his patch:

 1) Do not invent a new global variable, use the existing f_nonprint.
 2) For valid, but non-printable codepoints, print all bytes of the
    codepoint's encoding rather than just the first byte.
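
To illustrate the intended behaviour, here is a minimal standalone
sketch.  It is not the patch itself: the function name sanitize_name()
and its parameters are made up for this mail, and the nonprint argument
merely stands in for f_nonprint, which is assumed to be set only on a
tty or with -q.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not taken from the ls(1) source.
 * Copy name into out.  If nonprint is set, replace each invalid byte
 * and each non-printable character with a single '?'.  If it is not
 * set, copy every byte verbatim, including all bytes of a valid but
 * non-printable character.  Output is truncated if out is too small.
 * Returns the number of bytes written, not counting the NUL.
 */
size_t
sanitize_name(const char *name, int nonprint, char *out, size_t outsz)
{
	wchar_t	 wc;
	size_t	 n = 0;
	int	 len, printable;

	assert(out != NULL && outsz > 0);

	(void)mbtowc(NULL, NULL, 0);	/* reset the conversion state */
	while (*name != '\0') {
		len = mbtowc(&wc, name, MB_CUR_MAX);
		if (len == -1) {
			/* Invalid byte: reset and consume exactly one. */
			(void)mbtowc(NULL, NULL, 0);
			len = 1;
			printable = 0;
		} else
			printable = iswprint(wc) != 0;

		if (nonprint && !printable) {
			if (n + 1 >= outsz)
				break;
			out[n++] = '?';
		} else {
			/* Keep the complete encoding, not just one byte. */
			if (n + (size_t)len >= outsz)
				break;
			memcpy(out + n, name, (size_t)len);
			n += (size_t)len;
		}
		name += len;
	}
	out[n] = '\0';
	return n;
}
```

With nonprint set, a tab inside a name comes out as '?'.  With it
clear, the original bytes pass through untouched, so that
"ls | hexdump -C" can still show what is really in the filename.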

Should i commit this?

Yours,
  Ingo
