I installed these POSIX-conformance fixes for unexpand. They're fairly subtle, but basically "unexpand" was converting some blanks that it should have left alone. I added test cases for the problem areas.
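The -a rule is easiest to see on a concrete line. Below is a minimal C sketch of a simplified model of it. It is not the coreutils code. It assumes uniform tab stops every tab_size columns, treats the space as the only blank (no input tabs or backspaces), and handles one line without its newline. The function name unexpand_all_line is hypothetical. Its prev_blank and one_blank_before_tab_stop flags mirror the variables of the same names in the new unexpand().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the new "unexpand -a" rule, not coreutils code.
   Assumptions: tab stops every TAB_SIZE columns; the only blank in IN is
   ' ' (no input tabs or backspaces); IN is one line without its '\n'.
   OUT must have room for strlen (IN) + 1 bytes; returns OUT.  */
static char *
unexpand_all_line (char const *in, size_t tab_size, char *out)
{
  size_t column = 0;       /* Column just after the current input char.  */
  size_t pending = 0;      /* Spaces read but not yet written.  */
  bool one_blank_before_tab_stop = false; /* First pending space hit a stop.  */
  bool prev_blank = true;  /* Initial blanks act as if preceded by a blank.  */
  char *o = out;

  assert (0 < tab_size);

  for (; *in; in++)
    {
      column++;
      if (*in == ' ')
        {
          bool at_tab_stop = column % tab_size == 0;
          if (! (prev_blank && at_tab_stop))
            {
              /* Not yet known whether these spaces become a tab.  */
              if (at_tab_stop)
                one_blank_before_tab_stop = true;
              pending++;
              prev_blank = true;
              continue;
            }
          /* Blanks reach a tab stop after another blank (or at the start
             of the line): emit one tab, or two if the first pending
             space had already reached an earlier stop.  */
          if (one_blank_before_tab_stop)
            *o++ = '\t';
          *o++ = '\t';
        }
      else
        {
          /* A non-blank: the pending spaces are kept as spaces.  */
          memset (o, ' ', pending);
          o += pending;
          *o++ = *in;
          prev_blank = false;
        }
      pending = 0;
      one_blank_before_tab_stop = false;
    }

  /* Trailing spaces that never reached a convertible tab stop.  */
  memset (o, ' ', pending);
  o += pending;
  *o = '\0';
  return out;
}
```

With tab_size 4, for instance, "abc d" keeps its single space even though that space ends at a tab stop, because it directly follows a non-blank; "ab  c" becomes "ab\tc" because two blanks reach the stop. An initial single space that reaches a stop (" x" with tab_size 1) becomes a tab.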
2004-08-24  Paul Eggert  <[EMAIL PROTECTED]>

	POSIX-conformance fixes for "expand" and "unexpand".
	Also, consistently use "tab stop" rather than "tabstop".
	* NEWS: Document fixes.
	* src/expand.c: Revamp to resemble the new unexpand.c better.
	(usage): -i does not convert tabs after non-tabs.
	(add_tab_stop): Renamed from add_tabstop.  All uses changed.
	(parse_tab_stops): Renamed from parse_tabstops.  All uses changed.
	(validate_tab_stops): Renamed from validate_tabstops.  All uses changed.
	(next_file, main): Check fclose against 0, not EOF.
	(expand): Remove unnecessary casts.  Add another loop nesting
	level, for lines, so that per-line variables are initialized
	cleanly.  Revamp tab checking.  Check for write error
	immediately, rather than just once at the end of the program.
	* src/unexpand.c: Likewise (for the expand.c changes).
	(TAB_STOP_SENTINEL): Remove.
	(tab_size): Now size_t, not uintmax_t, since we need to store
	the sequences of blanks.
	(max_column_width): New var.
	(usage): Say "blank" where POSIX requires this.
	(add_tab_stop): Calculate maximum column width.
	(unexpand): Store the pending blanks, instead of merely counting them.
	Follow POSIX's rules about -a requiring two blanks before a tab stop.
	Get rid of internal label and goto.
	* tests/unexpand/basic-1: Fix infloop-3 to match POSIX.
	Add blanks-1 through blanks-13.
	* doc/coreutils.texi: Standardize on "tab stop" (the POSIX usage)
	rather than "tabstop".
	(unexpand invocation): Use "blank" rather than "space" when POSIX
	requires "blank".  Define "blank".  Initial blanks are converted even
	if there's just one.  For -a, convert two or more blanks only if
	they occur just before a tab stop.
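Much of the revamped expand() logic is the search for the next tab stop. As a hedged illustration, here is a standalone C sketch of that search. It assumes tab_size and tab_list mean what they mean in the patch. The function name next_tab_stop and its parameter list are hypothetical, and the "input line is too long" overflow check is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch modeled on the tab-stop search in the new expand().
   If TAB_SIZE is nonzero, tab stops fall every TAB_SIZE columns.
   Otherwise TAB_LIST holds N_TABS ascending stop columns, and *TAB_INDEX
   is the index of the next stop to examine; it only moves forward.
   Once the list is exhausted, return COLUMN + 1, so that expand turns
   the tab into a single space.  The caller must still detect overflow
   (a result less than COLUMN).  */
static uintmax_t
next_tab_stop (uintmax_t column, uintmax_t tab_size,
               uintmax_t const *tab_list, size_t n_tabs,
               size_t *tab_index)
{
  assert (tab_index != NULL);

  if (tab_size)
    return column + (tab_size - column % tab_size);

  while (*tab_index < n_tabs)
    {
      uintmax_t tab = tab_list[(*tab_index)++];
      if (column < tab)
        return tab;
    }
  return column + 1;
}
```

With -t 2,5, for example, a tab at column 0 advances to column 2, one at column 3 advances to column 5, and one at column 6 finds no stop left and becomes a single space. The new unexpand() runs a similar search but, when the list runs out, simply stops converting for the rest of the line.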
Index: NEWS
===================================================================
RCS file: /home/eggert/coreutils/cu/NEWS,v
retrieving revision 1.229
diff -p -u -r1.229 NEWS
--- NEWS 19 Aug 2004 20:02:07 -0000 1.229
+++ NEWS 24 Aug 2004 07:27:46 -0000
@@ -85,6 +85,11 @@ GNU coreutils NEWS
   POSIXLY_CORRECT is set and the first argument is not "-n", echo now
   outputs all option-like arguments instead of treating them as options.
 
+  expand and unexpand now conform to POSIX better. They check for
+  blanks (which can include characters other than space and tab in
+  non-POSIX locales) instead of spaces and tabs. Unexpand now
+  preserves some blanks instead of converting them to tabs or spaces.
+
   printf has several changes:
 
   It now uses 'intmax_t' (not 'long int') to format integers, so it
Index: src/expand.c
===================================================================
RCS file: /home/eggert/coreutils/cu/src/expand.c,v
retrieving revision 1.76
diff -p -u -r1.76 expand.c
--- src/expand.c 2 Aug 2004 23:49:31 -0000 1.76
+++ src/expand.c 24 Aug 2004 07:12:14 -0000
@@ -24,9 +24,9 @@
    --tabs=tab1[,tab2[,...]]
    -t tab1[,tab2[,...]]
    -tab1[,tab2[,...]]   If only one tab stop is given, set the tabs tab1
-                        spaces apart instead of the default 8. Otherwise,
+                        columns apart instead of the default 8. Otherwise,
                         set the tabs at columns tab1, tab2, etc. (numbered from
-                        0); replace any tabs beyond the tabstops given with
+                        0); replace any tabs beyond the tab stops given with
                         single spaces.
    --initial
    -i                   Only convert initial tabs on each line to spaces.
@@ -120,7 +120,7 @@ With no FILE, or when FILE is -, read st Mandatory arguments to long options are mandatory for short options too.\n\ "), stdout); fputs (_("\ - -i, --initial do not convert TABs after non whitespace\n\ + -i, --initial do not convert tabs after non blanks\n\ -t, --tabs=NUMBER have tabs NUMBER characters apart, not 8\n\ "), stdout); fputs (_("\ @@ -136,18 +136,18 @@ Mandatory arguments to long options are /* Add tab stop TABVAL to the end of `tab_list'. */ static void -add_tabstop (uintmax_t tabval) +add_tab_stop (uintmax_t tabval) { if (first_free_tab == n_tabs_allocated) tab_list = x2nrealloc (tab_list, &n_tabs_allocated, sizeof *tab_list); tab_list[first_free_tab++] = tabval; } -/* Add the comma or blank separated list of tabstops STOPS - to the list of tabstops. */ +/* Add the comma or blank separated list of tab stops STOPS + to the list of tab stops. */ static void -parse_tabstops (char const *stops) +parse_tab_stops (char const *stops) { bool have_tabval = false; uintmax_t tabval IF_LINT (= 0); @@ -159,7 +159,7 @@ parse_tabstops (char const *stops) if (*stops == ',' || ISBLANK (to_uchar (*stops))) { if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); have_tabval = false; } else if (ISDIGIT (*stops)) @@ -198,14 +198,14 @@ parse_tabstops (char const *stops) exit (EXIT_FAILURE); if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); } -/* Check that the list of tabstops TABS, with ENTRIES entries, +/* Check that the list of tab stops TABS, with ENTRIES entries, contains only nonzero, ascending values. */ static void -validate_tabstops (uintmax_t const *tabs, size_t entries) +validate_tab_stops (uintmax_t const *tabs, size_t entries) { uintmax_t prev_tab = 0; size_t i; @@ -240,7 +240,7 @@ next_file (FILE *fp) } if (fp == stdin) clearerr (fp); /* Also clear EOF. 
*/ - else if (fclose (fp) == EOF) + else if (fclose (fp) != 0) { error (0, errno, "%s", prev_file); exit_status = EXIT_FAILURE; @@ -273,14 +273,10 @@ next_file (FILE *fp) static void expand (void) { - FILE *fp; /* Input stream. */ - size_t tab_index = 0; /* Index in `tab_list' of next tabstop. */ - uintmax_t column = 0; /* Column of next char. */ - uintmax_t next_tab_column; /* Column the next tab stop is on. */ - bool convert = true; /* If true, perform translations. */ + /* Input stream. */ + FILE *fp = next_file (NULL); - fp = next_file ((FILE *) NULL); - if (fp == NULL) + if (!fp) return; /* Binary I/O will preserve the original EOL style (DOS/Unix) of files. */ @@ -288,74 +284,89 @@ expand (void) for (;;) { - int c = getc (fp); - if (c == EOF) - { - fp = next_file (fp); - if (fp) - { - SET_BINARY2 (fileno (fp), STDOUT_FILENO); - continue; - } - break; - } + /* Input character, or EOF. */ + int c; - if (c == '\n') - { - putchar (c); - tab_index = 0; - column = 0; - convert = true; - } - else if (c == '\t' && convert) - { - if (tab_size == 0) - { - /* Do not let tab_index == first_free_tab; - stop when it is 1 less. */ - while (tab_index < first_free_tab - 1 - && column >= tab_list[tab_index]) - tab_index++; - next_tab_column = tab_list[tab_index]; - if (tab_index < first_free_tab - 1) - tab_index++; - if (column >= next_tab_column) - next_tab_column = column + 1; /* Ran out of tab stops. */ - } - else - { - next_tab_column = column + tab_size - column % tab_size; - } - if (next_tab_column < column) - error (EXIT_FAILURE, 0, _("input line is too long")); - while (column < next_tab_column) - { - putchar (' '); - ++column; - } - } - else + /* If true, perform translations. */ + bool convert = true; + + + /* The following variables have valid values only when CONVERT + is true: */ + + /* Column of next input character. */ + uintmax_t column = 0; + + /* Index in TAB_LIST of next tab stop to examine. */ + size_t tab_index = 0; + + + /* Convert a line of text. 
*/ + + do { + while ((c = getc (fp)) < 0 && (fp = next_file (fp))) + SET_BINARY2 (fileno (fp), STDOUT_FILENO); + if (convert) { - if (c == '\b') + if (c == '\t') + { + /* Column the next input tab stop is on. */ + uintmax_t next_tab_column; + + if (tab_size) + next_tab_column = column + (tab_size - column % tab_size); + else + for (;;) + if (tab_index == first_free_tab) + { + next_tab_column = column + 1; + break; + } + else + { + uintmax_t tab = tab_list[tab_index++]; + if (column < tab) + { + next_tab_column = tab; + break; + } + } + + if (next_tab_column < column) + error (EXIT_FAILURE, 0, _("input line is too long")); + + while (++column < next_tab_column) + if (putchar (' ') < 0) + error (EXIT_FAILURE, errno, _("write error")); + + c = ' '; + } + else if (c == '\b') { - if (column > 0) - { - column--; - tab_index -= (tab_index != 0); - } + /* Go back one column, and force recalculation of the + next tab stop. */ + column -= !!column; + tab_index -= !!tab_index; } else { - ++column; - if (column == 0) + column++; + if (!column) error (EXIT_FAILURE, 0, _("input line is too long")); - convert &= convert_entire_line; } + + convert &= convert_entire_line | ISBLANK (c); } - putchar (c); + + if (c < 0) + return; + + if (putchar (c) < 0) + error (EXIT_FAILURE, errno, _("write error")); } + while (c != '\n'); } } @@ -396,11 +407,11 @@ main (int argc, char **argv) convert_entire_line = false; break; case 't': - parse_tabstops (optarg); + parse_tab_stops (optarg); break; case ',': if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); have_tabval = false; obsolete_tablist = true; break; @@ -425,9 +436,9 @@ main (int argc, char **argv) } if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); - validate_tabstops (tab_list, first_free_tab); + validate_tab_stops (tab_list, first_free_tab); if (first_free_tab == 0) tab_size = 8; @@ -440,7 +451,7 @@ main (int argc, char **argv) expand (); - if (have_read_stdin && fclose (stdin) == EOF) + if 
(have_read_stdin && fclose (stdin) != 0) error (EXIT_FAILURE, errno, "-"); exit (exit_status); Index: src/unexpand.c =================================================================== RCS file: /home/eggert/coreutils/cu/src/unexpand.c,v retrieving revision 1.81 diff -p -u -r1.81 unexpand.c --- src/unexpand.c 3 Aug 2004 23:27:20 -0000 1.81 +++ src/unexpand.c 24 Aug 2004 06:58:38 -0000 @@ -1,4 +1,4 @@ -/* unexpand - convert spaces to tabs +/* unexpand - convert blanks to tabs Copyright (C) 89, 91, 1995-2004 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify @@ -25,12 +25,11 @@ --tabs=tab1[,tab2[,...]] -t tab1[,tab2[,...]] -tab1[,tab2[,...]] If only one tab stop is given, set the tabs tab1 - spaces apart instead of the default 8. Otherwise, + columns apart instead of the default 8. Otherwise, set the tabs at columns tab1, tab2, etc. (numbered from - 0); replace any tabs beyond the tabstops given with - single spaces. + 0); preserve any blanks beyond the tab stops given. --all - -a Use tabs wherever they would replace 2 or more spaces, + -a Use tabs wherever they would replace 2 or more blanks, not just at the beginnings of lines. David MacKenzie <[EMAIL PROTECTED]> */ @@ -55,13 +54,6 @@ allocated for the output line. */ #define OUTPUT_BLOCK 256 -/* A sentinel value that's placed at the end of the list of tab stops. - This value must be a large number, but not so large that adding the - length of a line to it would cause the column variable to overflow. - FIXME: The algorithm isn't correct once the numbers get large; - also, no error is reported if overflow occurs. */ -#define TAB_STOP_SENTINEL INTMAX_MAX - /* The name this program was run with. */ char *program_name; @@ -70,7 +62,10 @@ char *program_name; static bool convert_entire_line; /* If nonzero, the size of all tab stops. If zero, use `tab_list' instead. */ -static uintmax_t tab_size; +static size_t tab_size; + +/* The maximum distance between tab stops. 
*/ +static size_t max_column_width; /* Array of the explicit column numbers of the tab stops; after `tab_list' is exhausted, the rest of the line is printed @@ -129,7 +124,7 @@ Usage: %s [OPTION]... [FILE]...\n\ "), program_name); fputs (_("\ -Convert spaces in each FILE to tabs, writing to standard output.\n\ +Convert blanks in each FILE to tabs, writing to standard output.\n\ With no FILE, or when FILE is -, read standard input.\n\ \n\ "), stdout); @@ -137,8 +132,8 @@ With no FILE, or when FILE is -, read st Mandatory arguments to long options are mandatory for short options too.\n\ "), stdout); fputs (_("\ - -a, --all convert all whitespace, instead of just initial whitespace\n\ - --first-only convert only leading sequences of whitespace (overrides -a)\n\ + -a, --all convert all blanks, instead of just initial blanks\n\ + --first-only convert only leading sequences of blanks (overrides -a)\n\ -t, --tabs=N have tabs N characters apart instead of 8 (enables -a)\n\ -t, --tabs=LIST use comma separated LIST of tab positions (enables -a)\n\ "), stdout); @@ -152,18 +147,28 @@ Mandatory arguments to long options are /* Add tab stop TABVAL to the end of `tab_list'. */ static void -add_tabstop (uintmax_t tabval) +add_tab_stop (uintmax_t tabval) { + uintmax_t column_width = + tabval - (first_free_tab ? tab_list[first_free_tab - 1] : 0); + if (first_free_tab == n_tabs_allocated) tab_list = x2nrealloc (tab_list, &n_tabs_allocated, sizeof *tab_list); tab_list[first_free_tab++] = tabval; + + if (max_column_width < column_width) + { + if (SIZE_MAX < column_width) + error (EXIT_FAILURE, 0, _("tabs are too far apart")); + max_column_width = column_width; + } } -/* Add the comma or blank separated list of tabstops STOPS - to the list of tabstops. */ +/* Add the comma or blank separated list of tab stops STOPS + to the list of tab stops. 
*/ static void -parse_tabstops (char const *stops) +parse_tab_stops (char const *stops) { bool have_tabval = false; uintmax_t tabval IF_LINT (= 0); @@ -175,7 +180,7 @@ parse_tabstops (char const *stops) if (*stops == ',' || ISBLANK (to_uchar (*stops))) { if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); have_tabval = false; } else if (ISDIGIT (*stops)) @@ -214,14 +219,14 @@ parse_tabstops (char const *stops) exit (EXIT_FAILURE); if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); } -/* Check that the list of tabstops TABS, with ENTRIES entries, +/* Check that the list of tab stops TABS, with ENTRIES entries, contains only nonzero, ascending values. */ static void -validate_tabstops (uintmax_t const *tabs, size_t entries) +validate_tab_stops (uintmax_t const *tabs, size_t entries) { uintmax_t prev_tab = 0; size_t i; @@ -256,7 +261,7 @@ next_file (FILE *fp) } if (fp == stdin) clearerr (fp); /* Also clear EOF. */ - else if (fclose (fp) == EOF) + else if (fclose (fp) != 0) { error (0, errno, "%s", prev_file); exit_status = EXIT_FAILURE; @@ -283,147 +288,175 @@ next_file (FILE *fp) return NULL; } -/* Change spaces to tabs, writing to stdout. +/* Change blanks to tabs, writing to stdout. Read each file in `file_list', in order. */ static void unexpand (void) { - FILE *fp; /* Input stream. */ - size_t tab_index = 0; /* Index in `tab_list' of next tabstop. */ - size_t print_tab_index = 0; /* For printing as many tabs as possible. */ - uintmax_t column = 0; /* Column of next char. */ - uintmax_t next_tab_column; /* Column the next tab stop is on. */ - bool convert = true; /* If true, perform translations. */ - uintmax_t pending = 0; /* Pending columns of blanks. */ - int saved_errno IF_LINT (= 0); + /* Input stream. */ + FILE *fp = next_file (NULL); - fp = next_file ((FILE *) NULL); - if (fp == NULL) + /* The array of pending blanks. 
In non-POSIX locales, blanks can + include characters other than spaces, so the blanks must be + stored, not merely counted. */ + char *pending_blank; + + if (!fp) return; /* Binary I/O will preserve the original EOL style (DOS/Unix) of files. */ SET_BINARY2 (fileno (fp), STDOUT_FILENO); + /* The worst case is a non-blank character, then one blank, then a + tab stop, then MAX_COLUMN_WIDTH - 1 blanks, then a non-blank; so + allocate MAX_COLUMN_WIDTH bytes to store the blanks. */ + pending_blank = xmalloc (max_column_width); + for (;;) { - int c = getc (fp); - if (c == EOF) - { - fp = next_file (fp); - if (fp) - { - SET_BINARY2 (fileno (fp), STDOUT_FILENO); - continue; - } - saved_errno = errno; - } + /* Input character, or EOF. */ + int c; - if (c == ' ' && convert && column < TAB_STOP_SENTINEL) - { - ++pending; - ++column; - } - else if (c == '\t' && convert) - { - if (tab_size == 0) - { - /* Do not let tab_index == first_free_tab; - stop when it is 1 less. */ - while (tab_index < first_free_tab - 1 - && column >= tab_list[tab_index]) - tab_index++; - next_tab_column = tab_list[tab_index]; - if (tab_index < first_free_tab - 1) - tab_index++; - if (column >= next_tab_column) - { - convert = false; /* Ran out of tab stops. */ - goto flush_pend; - } - } - else - { - next_tab_column = column + tab_size - column % tab_size; - } - pending += next_tab_column - column; - column = next_tab_column; - } - else + /* If true, perform translations. */ + bool convert = true; + + + /* The following variables have valid values only when CONVERT + is true: */ + + /* Column of next input character. */ + uintmax_t column = 0; + + /* Column the next input tab stop is on. */ + uintmax_t next_tab_column = 0; + + /* Index in TAB_LIST of next tab stop to examine. */ + size_t tab_index = 0; + + /* If true, the first pending blank came just before a tab stop. */ + bool one_blank_before_tab_stop = false; + + /* If true, the previous input character was a blank. 
This is + initially true, since initial strings of blanks are treated + as if the line was preceded by a blank. */ + bool prev_blank = true; + + /* Number of pending columns of blanks. */ + size_t pending = 0; + + + /* Convert a line of text. */ + + do { - flush_pend: - /* Flush pending spaces. Print as many tabs as possible, - then print the rest as spaces. */ - if (pending == 1) - { - putchar (' '); - pending = 0; - } - column -= pending; - while (pending > 0) + while ((c = getc (fp)) < 0 && (fp = next_file (fp))) + SET_BINARY2 (fileno (fp), STDOUT_FILENO); + + if (convert) { - if (tab_size == 0) - { - /* Do not let print_tab_index == first_free_tab; - stop when it is 1 less. */ - while (print_tab_index < first_free_tab - 1 - && column >= tab_list[print_tab_index]) - print_tab_index++; - next_tab_column = tab_list[print_tab_index]; - if (print_tab_index < first_free_tab - 1) - print_tab_index++; - } - else - { - next_tab_column = column + tab_size - column % tab_size; - } - if (next_tab_column - column <= pending) - { - putchar ('\t'); - pending -= next_tab_column - column; - column = next_tab_column; - } - else + bool blank = ISBLANK (c); + + if (blank) { - --print_tab_index; - column += pending; - while (pending != 0) + if (next_tab_column <= column) { - putchar (' '); - pending--; + if (tab_size) + next_tab_column = + column + (tab_size - column % tab_size); + else + for (;;) + if (tab_index == first_free_tab) + { + convert = false; + break; + } + else + { + uintmax_t tab = tab_list[tab_index++]; + if (column < tab) + { + next_tab_column = tab; + break; + } + } } - } - } - if (c == EOF) - { - errno = saved_errno; - break; - } + if (convert) + { + if (next_tab_column < column) + error (EXIT_FAILURE, 0, _("input line is too long")); - if (convert) - { - if (c == '\b') + if (c == '\t') + { + column = next_tab_column; + + /* Discard pending blanks, unless it was a single + blank just before the previous tab stop. */ + if (! 
(pending == 1 && one_blank_before_tab_stop)) + { + pending = 0; + one_blank_before_tab_stop = false; + } + } + else + { + column++; + + if (! (prev_blank && column == next_tab_column)) + { + /* It is not yet known whether the pending blanks + will be replaced by tabs. */ + if (column == next_tab_column) + one_blank_before_tab_stop = true; + pending_blank[pending++] = c; + prev_blank = true; + continue; + } + + /* Replace the pending blanks by a tab or two. */ + pending_blank[0] = c = '\t'; + pending = one_blank_before_tab_stop; + } + } + } + else if (c == '\b') { - if (column > 0) - --column; + /* Go back one column, and force recalculation of the + next tab stop. */ + column -= !!column; + next_tab_column = column; + tab_index -= !!tab_index; } else { - ++column; - convert &= convert_entire_line; + column++; + if (!column) + error (EXIT_FAILURE, 0, _("input line is too long")); } - } - putchar (c); + if (pending) + { + if (fwrite (pending_blank, 1, pending, stdout) != pending) + error (EXIT_FAILURE, errno, _("write error")); + pending = 0; + one_blank_before_tab_stop = false; + } + + prev_blank = blank; + convert &= convert_entire_line | blank; + } - if (c == '\n') + if (c < 0) { - tab_index = print_tab_index = 0; - column = pending = 0; - convert = true; + free (pending_blank); + return; } + + if (putchar (c) < 0) + error (EXIT_FAILURE, errno, _("write error")); } + while (c != '\n'); } } @@ -435,7 +468,7 @@ main (int argc, char **argv) int c; /* If true, cancel the effect of any -a (explicit or implicit in -t), - so that only leading white space will be considered. */ + so that only leading blanks will be considered. 
*/ bool convert_first_only = false; bool obsolete_tablist = false; @@ -469,14 +502,14 @@ main (int argc, char **argv) break; case 't': convert_entire_line = true; - parse_tabstops (optarg); + parse_tab_stops (optarg); break; case CONVERT_FIRST_ONLY_OPTION: convert_first_only = true; break; case ',': if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); have_tabval = false; obsolete_tablist = true; break; @@ -505,26 +538,22 @@ main (int argc, char **argv) convert_entire_line = false; if (have_tabval) - add_tabstop (tabval); + add_tab_stop (tabval); - validate_tabstops (tab_list, first_free_tab); + validate_tab_stops (tab_list, first_free_tab); if (first_free_tab == 0) - tab_size = 8; + tab_size = max_column_width = 8; else if (first_free_tab == 1) tab_size = tab_list[0]; else - { - /* Append a sentinel to the list of tab stop indices. */ - add_tabstop (TAB_STOP_SENTINEL); - tab_size = 0; - } + tab_size = 0; file_list = (optind < argc ? &argv[optind] : stdin_argv); unexpand (); - if (have_read_stdin && fclose (stdin) == EOF) + if (have_read_stdin && fclose (stdin) != 0) error (EXIT_FAILURE, errno, "-"); exit (exit_status); Index: tests/unexpand/basic-1 =================================================================== RCS file: /home/eggert/coreutils/cu/tests/unexpand/basic-1,v retrieving revision 1.14 diff -p -u -r1.14 basic-1 --- tests/unexpand/basic-1 8 Apr 2003 10:55:02 -0000 1.14 +++ tests/unexpand/basic-1 24 Aug 2004 06:42:47 -0000 @@ -41,13 +41,9 @@ my @Tests = ['b-1', '-t', '2,4', {IN=> " ."}, {OUT=>"\t\t ."}], # These would infloop prior to textutils-2.0d. - # Solaris' /bin/unexpand does this: - # ['infloop-1', '-t', '1,2', {IN=> " \t\t .\n"}, {OUT=>" \t\t .\n"}], - # FIXME: find out which is required - ['infloop-1', '-t', '1,2', {IN=> " \t\t .\n"}, {OUT=>"\t\t\t .\n"}], ['infloop-2', '-t', '4,5', {IN=> ' 'x4 . 
"\t\t \n"}, {OUT=>"\t\t\t \n"}], - ['infloop-3', '-t', '2,3', {IN=> "x \t\t \n"}, {OUT=>"x\t\t\t \n"}], + ['infloop-3', '-t', '2,3', {IN=> "x \t\t \n"}, {OUT=>"x \t\t \n"}], ['infloop-4', '-t', '1,2', {IN=> " \t\t \n"}, {OUT=>"\t\t\t \n"}], ['c-1', '-t', '1,2', {IN=> "x\t\t .\n"}, {OUT=>"x\t\t .\n"}], @@ -55,6 +51,21 @@ my @Tests = # Feature addition (--first-only) prompted by a report from Jie Xu. ['tabs-1', qw(-t 3), {IN=> " a b\n"}, {OUT=>"\ta\tb\n"}], ['tabs-2', qw(-t 3 --first-only), {IN=> " a b\n"}, {OUT=>"\ta b\n"}], + + # blanks + ['blanks-1', qw(-t 1), {IN=> " b c d\n"}, {OUT=> "\tb\t\tc\t\t\td\n"}], + ['blanks-2', qw(-t 1), {IN=> "a \n"}, {OUT=> "a \n"}], + ['blanks-3', qw(-t 1), {IN=> "a \n"}, {OUT=> "a\t\t\n"}], + ['blanks-4', qw(-t 1), {IN=> "a \n"}, {OUT=> "a\t\t\t\n"}], + ['blanks-5', qw(-t 1), {IN=> "a "}, {OUT=> "a "}], + ['blanks-6', qw(-t 1), {IN=> "a "}, {OUT=> "a\t\t"}], + ['blanks-7', qw(-t 1), {IN=> "a "}, {OUT=> "a\t\t\t"}], + ['blanks-8', qw(-t 1), {IN=> " a a a\n"}, {OUT=> "\ta a\t\ta\n"}], + ['blanks-9', qw(-t 2), {IN=> " a a a\n"}, {OUT=> "\t a\ta a\n"}], + ['blanks-10', '-t', '3,4', {IN=> "0 2 4 6\t8\n"}, {OUT=> "0 2 4 6\t8\n"}], + ['blanks-11', '-t', '3,4', {IN=> " 4\n"}, {OUT=> "\t\t4\n"}], + ['blanks-12', '-t', '3,4', {IN=> "01 4\n"}, {OUT=> "01\t\t4\n"}], + ['blanks-13', '-t', '3,4', {IN=> "0 4\n"}, {OUT=> "0\t\t4\n"}], ); my $save_temps = $ENV{DEBUG}; Index: doc/coreutils.texi =================================================================== RCS file: /home/eggert/coreutils/cu/doc/coreutils.texi,v retrieving revision 1.201 diff -p -u -r1.201 coreutils.texi --- doc/coreutils.texi 19 Aug 2004 20:05:52 -0000 1.201 +++ doc/coreutils.texi 24 Aug 2004 06:56:26 -0000 @@ -5090,15 +5090,15 @@ The program accepts the following option @itemx [EMAIL PROTECTED],@[EMAIL PROTECTED] @opindex -t @opindex --tabs [EMAIL PROTECTED] tabstops, setting [EMAIL PROTECTED] tab stops, setting If only one tab stop is given, set the tabs @var{tab1} spaces 
apart (default is 8). Otherwise, set the tabs at columns @var{tab1}, @var{tab2}, @dots{} (numbered from 0), and replace any tabs beyond the -last tabstop given with single spaces. Tabstops can be separated by +last tab stop given with single spaces. Tab stops can be separated by blanks as well as by commas. On older systems, @command{expand} supports an obsolete option [EMAIL PROTECTED]@var{tab1}[,@[EMAIL PROTECTED], where tabstops must be [EMAIL PROTECTED]@var{tab1}[,@[EMAIL PROTECTED], where tab stops must be separated by commas. @acronym{POSIX} 1003.1-2001 (@pxref{Standards conformance}) does not allow this; use @option{-t @var{tab1}[,@[EMAIL PROTECTED] instead. @@ -5123,16 +5123,17 @@ characters) on each line to spaces. @command{unexpand} writes the contents of each given @var{file}, or standard input if none are given or for a @var{file} of @samp{-}, to -standard output, with strings of two or more space or tab characters -converted to as many tabs as possible followed by as many spaces as are -needed. Synopsis: +standard output, converting blanks at the beginning of each line into +as many tab characters as needed. In the default @acronym{POSIX} +locale, a @dfn{blank} is a space or a tab; other locales may specify +additional blank characters. Synopsis: @example unexpand [EMAIL PROTECTED]@dots{} [EMAIL PROTECTED]@dots{} @end example -By default, @command{unexpand} converts only initial spaces and tabs (those -that precede all non space or tab characters) on each line. It +By default, @command{unexpand} converts only initial blanks (those +that precede all non-blank characters) on each line. It preserves backspace characters in the output; they decrement the column count for tab calculations. By default, tabs are set at every 8th column. 
@@ -5145,14 +5146,14 @@ The program accepts the following option
 @itemx [EMAIL PROTECTED],@[EMAIL PROTECTED]
 @opindex -t
 @opindex --tabs
-If only one tab stop is given, set the tabs @var{tab1} spaces apart
+If only one tab stop is given, set the tabs @var{tab1} columns apart
 instead of the default 8. Otherwise, set the tabs at columns
[EMAIL PROTECTED], @var{tab2}, @dots{} (numbered from 0), and leave spaces and
-tabs beyond the tabstops given unchanged. Tabstops can be separated by
[EMAIL PROTECTED], @var{tab2}, @dots{} (numbered from 0), and leave blanks
+beyond the tab stops given unchanged. Tab stops can be separated by
 blanks as well as by commas. This option implies the @option{-a} option.
 
 On older systems, @command{unexpand} supports an obsolete option
[EMAIL PROTECTED]@var{tab1}[,@[EMAIL PROTECTED], where tabstops must be
[EMAIL PROTECTED]@var{tab1}[,@[EMAIL PROTECTED], where tab stops must be
 separated by commas. (Unlike @option{-t}, this obsolete option does
 not imply @option{-a}.) @acronym{POSIX} 1003.1-2001 (@pxref{Standards
 conformance}) does not allow this; use @option{--first-only -t
@@ -5162,8 +5163,8 @@ conformance}) does not allow this; use @
 @itemx --all
 @opindex -a
 @opindex --all
-Convert all strings of two or more spaces or tabs, not just initial
-ones, to tabs.
+Also convert all sequences of two or more blanks just before a tab stop.
+even if they occur after non-blank characters in a line.
 @end table
 
 
@@ -5832,7 +5833,7 @@ List the files in columns, sorted horizo
 @itemx [EMAIL PROTECTED]
 @opindex -T
 @opindex --tabsize
-Assume that each tabstop is @var{cols} columns wide. The default is 8.
+Assume that each tab stop is @var{cols} columns wide. The default is 8.
 @command{ls} uses tabs where possible in the output, for efficiency.
 If @var{cols} is zero, do not use tabs at all.
_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/bug-coreutils