On Wed, 24 Apr 2002, David Starner wrote:
> On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote:
> > The Unix and Linux world is very
> > opposed to the use of BOM in plain-text files, and if they feel that way
> > about UTF-8 they probably feel the same about UTF-16.

The reason we're not so fond of UTF-8 with BOM is that it 'breaks' a lot
of time-honored Unix command-line text-processing tools. The simplest
example is concatenating multiple files with 'cat'. With a BOM at the
beginning of each file, the following doesn't work as intended, because
the BOMs of f2, f3, and f4 end up in the middle of the concatenated
stream:

  $ cat f1 f2 f3 f4 | sort | uniq | sed '....' > f5

Sure, by typing a couple more commands (enclosing 'cat' in a 'for' loop,
for instance), we can work around that, but ....

> Why? The problems with a BOM in UTF-8 have to do with it being an
> ASCII-compatible encoding. (I'd guess that if there are any Unixes that
> use EBCDIC, the same problems would apply to UTF-EBCDIC.) Pretty much
> the only reason one would use UTF-16 is to be compatible with a foreign
> system, and then you use the conventions of that system.

I totally agree with you. We don't expect text tools to work on files in
UTF-16 the same way as we would expect them to work on files in UTF-8
or other ASCII-compatible encodings.

Jungshik Shin
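
For illustration only, the 'for' loop workaround mentioned above might look roughly like the sketch below. It assumes a POSIX shell and sed. The name strip_bom is a hypothetical helper invented for this example, not a standard tool, and the sketch only handles a UTF-8 BOM (bytes EF BB BF) at the very start of each file.

```shell
# strip_bom: hypothetical helper (not a standard tool) that concatenates
# files like 'cat' but drops a leading UTF-8 BOM (EF BB BF) from each one.
strip_bom() {
    # Build the three BOM bytes with printf's portable octal escapes.
    bom=$(printf '\357\273\277')
    for f in "$@"; do
        # Only line 1 of each file is touched; a U+FEFF elsewhere is kept.
        sed "1s/^$bom//" "$f"
    done
}
```

With that function defined, the pipeline from the message could be written as:

  $ strip_bom f1 f2 f3 f4 | sort | uniq | sed '....' > f5

This is exactly the kind of extra typing the message objects to: the plain 'cat' pipeline no longer works on its own.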