Hello tech@,

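My last mail about ksh in vi mode got me thinking about how UTF-8 handling is implemented in ls. The question marks it prints in place of invalid or unprintable characters, while useful for human readers, have a big downside: the original bytes are lost, so the output can't be fed back to other commands.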

I've come across a fair number of malformed file names from all sorts of causes, be it malware or just human error. When such a malformed character sits in an inconvenient place and can't be auto-completed, I usually work around it with something like the following:
$ cd "`ls | tail -1`"
ksh: cd: /home/martijn/Muziek/Motörhead/N?? Sleep at All - No such file or directory
$ cd "`/usr/src/bin/ls/ls | tail -1`"
$

My patch keeps the question marks when stdout is a tty, but prints the original byte otherwise. AFAIK the only sensible use for the computed display width is formatted (columnar) output, which only happens when printing to a tty.
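
To make the behaviour concrete, here is a standalone sketch of the same decision outside ls. The function name mbs_sanitize and its buffer-based interface are hypothetical (ls itself writes straight to stdout with putchar/fwrite); it follows the shape of mbsprint() in utf8.c but takes the tty flag as a parameter instead of the patch's global istty. One deliberate difference: for an unprintable character that mbtowc() decodes to more than one byte, the sketch copies all len bytes in the non-tty case, whereas putchar(*mbs) in the patch would emit only the first byte before skipping the rest.

```c
#define _XOPEN_SOURCE 700	/* expose wcwidth() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper modelled on mbsprint(): copy mbs into out,
 * replacing undecodable or unprintable characters with '?' only when
 * istty is set, and return the display width ls would compute.
 * Output that does not fit in outsz is silently dropped.
 */
int
mbs_sanitize(const char *mbs, int istty, char *out, size_t outsz)
{
	wchar_t wc;
	int len;		/* bytes consumed for this character */
	int width;		/* display width of this character */
	int total_width = 0;	/* display width of the whole string */
	size_t n = 0;		/* bytes written to out so far */

	assert(out != NULL && outsz > 0);
	for (; *mbs != '\0'; mbs += len) {
		if ((len = mbtowc(&wc, mbs, MB_CUR_MAX)) == -1) {
			/* Undecodable byte: reset conversion state, eat one byte. */
			(void)mbtowc(NULL, NULL, MB_CUR_MAX);
			len = 1;
			width = -1;
		} else
			width = wcwidth(wc);

		if (width == -1) {
			/* Bad or unprintable: '?' on a tty, raw bytes otherwise. */
			if (istty) {
				if (n + 1 < outsz)
					out[n++] = '?';
			} else if (n + (size_t)len < outsz) {
				memcpy(out + n, mbs, (size_t)len);
				n += (size_t)len;
			}
			total_width++;	/* same width either way */
		} else {
			/* Printable character: copy it unchanged. */
			if (n + (size_t)len < outsz) {
				memcpy(out + n, mbs, (size_t)len);
				n += (size_t)len;
			}
			total_width += width;
		}
	}
	out[n] = '\0';
	return total_width;
}
```

Calling it on "ab\001c" with istty set to 1 yields "ab?c", and with istty set to 0 yields the original bytes. Either way the returned width is 4, since a replaced character counts as one column whether or not the '?' is actually emitted.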

This doesn't solve the case where ls is run over ssh -t and the output is redirected client-side, but you can't win them all.

Sincerely,

Martijn van Duren
Index: ls.c
===================================================================
RCS file: /cvs/src/bin/ls/ls.c,v
retrieving revision 1.44
diff -u -p -r1.44 ls.c
--- ls.c        1 Dec 2015 18:36:13 -0000       1.44
+++ ls.c        17 Jan 2016 10:57:03 -0000
@@ -94,6 +94,7 @@ int f_type;                   /* add type character for 
 int f_typedir;                 /* add type character for directories */
 
 int rval;
+int istty = 0;
 
 int
 ls_main(int argc, char *argv[])
@@ -110,6 +111,7 @@ ls_main(int argc, char *argv[])
 
        /* Terminal defaults to -Cq, non-terminal defaults to -1. */
        if (isatty(STDOUT_FILENO)) {
+               istty = 1;
                if ((p = getenv("COLUMNS")) != NULL)
                        width = strtonum(p, 1, INT_MAX, NULL);
                if (width == 0 &&
Index: utf8.c
===================================================================
RCS file: /cvs/src/bin/ls/utf8.c,v
retrieving revision 1.1
diff -u -p -r1.1 utf8.c
--- utf8.c      1 Dec 2015 18:36:13 -0000       1.1
+++ utf8.c      17 Jan 2016 10:57:03 -0000
@@ -21,6 +21,8 @@
 #include <stdlib.h>
 #include <wchar.h>
 
+extern int istty;
+
 int
 mbsprint(const char *mbs, int print)
 {
@@ -33,12 +35,12 @@ mbsprint(const char *mbs, int print)
                if ((len = mbtowc(&wc, mbs, MB_CUR_MAX)) == -1) {
                        (void)mbtowc(NULL, NULL, MB_CUR_MAX);
                        if (print)
-                               putchar('?');
+                               putchar(istty ? '?' : *mbs);
                        total_width++;
                        len = 1;
                } else if ((width = wcwidth(wc)) == -1) {
                        if (print)
-                               putchar('?');
+                               putchar(istty ? '?' : *mbs);
                        total_width++;
                } else {
                        if (print)
