> Your command above would now expand to something like this:
>
>     cat -R UTF-16 -F UTF-16LE file1 -F Big-5 file2 > file3
>
> Provided with information about the input encodings and the expected output
> encoding, "cat" could now correctly handle BOM's, endianness, new-line
> conventions, and even perform character set conversions. Without this extra
> info, "cat" would retain its good ol' byte-by-byte functionality.
>
> Similar options could be added to any Unix command potentially dealing with
> text files ("cp", "head", "tail", etc.), as well as to their equivalents in
> DOS or other operating systems.

To avoid "flag bloat", one can instead use the "iconv" command and apply it
to the source files. Since "head" and "tail" assume an ASCII-compatible
single-byte or multibyte encoding in which any state is reset at LF, the
target encoding of the iconv command must, for those commands, be such an
encoding (UTF-8, for example).

/kent k
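
A minimal sketch of the iconv approach, assuming an iconv build that knows the encoding names UTF-16LE, BIG5, UTF-8 and UTF-16 (the accepted spellings vary between implementations). The sample files are hypothetical and are created here only so that the commands can be run as they stand; file1, file2 and file3 play the same roles as in the quoted "cat" command.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Hypothetical sample inputs (assumptions for illustration only).
printf 'h\000i\000\n\000' > file1      # "hi" + LF in UTF-16LE, no BOM
printf '\244\244\244\345\n' > file2    # two Big5 characters + LF

# Convert each source to a common encoding, concatenate the results,
# then convert the whole stream to the desired output encoding.
{ iconv -f UTF-16LE -t UTF-8 file1
  iconv -f BIG5 -t UTF-8 file2
} | iconv -f UTF-8 -t UTF-16 > file3

# For "head" and "tail", convert to an ASCII-compatible encoding first,
# so that LF is the single byte 0x0A and no state carries across lines.
iconv -f UTF-16LE -t UTF-8 file1 | head -n 1
```

Whether the UTF-16 output starts with a BOM, and which byte order it uses, depends on the iconv implementation; naming UTF-16LE or UTF-16BE explicitly as the target fixes the byte order.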