> unexpand "converts spaces to tabs".
> This commands behavior is so simple (s/ /\t/g) that it can be
> knocked out in a couple hours,
Well...sort of.  unexpand without -a can be, sure.  With -a, it's more
complicated, unless you are willing to assume things like "no multibyte
characters" or "all non-ASCII text is Shift-JIS".

> Since the command only looks for 2 characters (' ' and '\t'), no UTF
> safety checking is required,

Safety?  If you want to support multibyte characters of any sort with
-a, you need to parse them enough to determine how many bytes make up
each character, because that affects how many spaces to eat to convert
to a tab.  (Without -a, this is not an issue.)

For example, if you get a line containing, in hex,

	d0 b0 d0 b0 d0 b0 20 20 20 20 20 20 20 20 40

then (assuming 8-character tabstops and -a in effect) under 8859-1 you
have (to use Unicode names) LATIN CAPITAL LETTER ETH and DEGREE SIGN,
with the pair repeated three times, so the text occupies six columns
and you convert the first _two_ of the spaces to a tab.  Under UTF-8
you have three instances of CYRILLIC SMALL LETTER A, so the text
occupies three columns and you convert the first _five_ of the spaces
to a tab.  (Handling tabs in the input makes it even more complicated.)

> The GNU man page doesn't say if spaces are supposed to be processed
> beyond the beginning of lines.

The GNU man page is relevant to only the GNU version.  I would not use
it as a reference for anything else, least of all what the command
should do in the abstract.  (That said, I would have hoped they would
document their software more precisely, such as saying what happens to
non-initial whitespace in the absence of -a.)

A non-GNU (NetBSD) manpage I have handy says

     -a      By default, only leading blanks and tabs are reconverted to
             maximal strings of tabs.  If the -a option is given, then tabs
             are inserted whenever they would compress the resultant file by
             replacing two or more characters.

which is, at least, clearer.  (That version has nothing like GNU's
--first-only, or at least the manpage doesn't mention one.)
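
To make the d0 b0 example from earlier in this message concrete, here
is a rough Python sketch (not how any real unexpand is implemented) that
decodes the same bytes under both encodings and reports how many spaces
of the first run a single tab would replace.  The function name and the
TABSTOP constant are made up for illustration, and it assumes every
decoded character is exactly one column wide and that the input
contains no tabs, neither of which holds in general.

```python
TABSTOP = 8  # assumed tab width, matching the example in the text


def spaces_eaten_by_first_tab(raw: bytes, encoding: str) -> int:
    """Return how many spaces of the first space run one tab would replace.

    Hypothetical helper.  Simplifying assumptions: each decoded character
    occupies one column, and the input contains no tab characters.
    """
    text = raw.decode(encoding)
    start = text.find(" ")
    if start < 0:
        return 0  # no spaces at all
    end = start
    while end < len(text) and text[end] == " ":
        end += 1
    # Column of the first tab stop strictly after the start of the run.
    next_stop = (start // TABSTOP + 1) * TABSTOP
    # Only convert if the run reaches that stop and the tab would
    # replace at least two characters.
    if end >= next_stop and next_stop - start >= 2:
        return next_stop - start
    return 0


line = bytes.fromhex("d0b0d0b0d0b0") + b" " * 8 + b"@"
print(spaces_eaten_by_first_tab(line, "iso-8859-1"))  # 2
print(spaces_eaten_by_first_tab(line, "utf-8"))       # 5
```

The byte string is identical in both calls; only the decoding step
changes the column count, which is the part a byte-oriented
implementation cannot skip when -a is in effect.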
> [...], and the "--first-only" option serves the same purpose as grep
> -G (None at all, [...])

Actually, it does serve a purpose: it can be specified to get the
default behaviour when the opposing option might have been specified
already.  For example, if I have a wrapper script (let's call it
"unex")

	#! /bin/sh
	# "--" keeps set from treating options in $UNEX_OPTIONS as its own
	set -- $UNEX_OPTIONS "$@"
	unexpand "$@"

then I can run "unex --first-only" to get the default behaviour
regardless of whether -a is present in $UNEX_OPTIONS.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net