On Wed, 24 Apr 2002, David Starner wrote:

> On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote:
> > The Unix and Linux world is very
> > opposed to the use of BOM in plain-text files, and if they feel that way
> > about UTF-8 they probably feel the same about UTF-16.

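The reason we're not so fond of UTF-8 with a BOM is that it 'breaks' a
lot of time-honored Unix command-line text-processing tools. The simplest
example is concatenating multiple files with 'cat'. If each file begins
with a BOM, the following doesn't work as intended: cat copies every BOM
into the output, so each one ends up glued to the front of an ordinary
line, and sort and uniq treat that line as different from its BOM-less
twin.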

  $ cat f1 f2 f3 f4 | sort | uniq | sed '....' > f5
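To make the failure concrete, here is a small sketch meant for a scratch
directory. The file contents are made up for illustration, the elided sed
step is left out, and sort is run in the C locale so the byte order (and
therefore the result) is reproducible.

```shell
#!/bin/sh
# Hypothetical sample files, each starting with the UTF-8 BOM (EF BB BF).
# Octal escapes in printf's format string are portable across POSIX shells.
printf '\357\273\277banana\napple\n' > f1
printf '\357\273\277apple\ncherry\n' > f2

# The pipeline from the message, minus the elided sed step.
# LC_ALL=C makes sort compare raw bytes, so the output order is stable.
cat f1 f2 | LC_ALL=C sort | uniq > f5

# f5 now holds both "apple" and "<BOM>apple": uniq sees two different
# lines. od -c makes the stray BOM bytes (357 273 277) visible.
od -c f5
```

Instead of three unique lines, f5 ends up with four, and "apple" appears
twice.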

Sure, by typing a couple more commands (wrapping 'cat' in a 'for' loop,
for instance), we can work around that, but ....
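One way the 'for' loop workaround might be spelled out, sketched under the
assumption of a POSIX shell and sed. It swaps sed in for cat inside the
loop so that a BOM at the very start of each file is dropped; the file
names and contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical input: two UTF-8 files that each begin with a BOM.
printf '\357\273\277banana\napple\n' > f1
printf '\357\273\277apple\ncherry\n' > f2

# The BOM as raw bytes, built portably with printf.
bom=$(printf '\357\273\277')

# Strip a leading BOM from line 1 of each file only, so a U+FEFF that
# occurs later in the text is left alone. LC_ALL=C makes sed and sort
# work on raw bytes.
for f in f1 f2; do
  LC_ALL=C sed "1s/^$bom//" "$f"
done | LC_ALL=C sort | uniq > f5
```

With the BOMs gone, uniq collapses the two "apple" lines and f5 contains
just apple, banana and cherry. It works, but it is exactly the kind of
extra typing that a plain 'cat' was supposed to save.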

> Why? The problems with a BOM in UTF-8 have to do with it being an
> ASCII-compatible encoding. (I'd guess that if there are any Unixes that
> use EBCDIC, the same problems would apply to UTF-EBCDIC.) Pretty much
> the only reason one would use UTF-16 is to be compatible with a foreign
> system, and then you use the conventions of that system.

I totally agree with you. We don't expect text tools
to work on files in UTF-16 the same way as we would expect them to work
on files in UTF-8 or other ASCII-compatible encodings.
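For comparison, a sketch of why UTF-16 is a different case: it is not
ASCII-compatible, so byte-oriented tools stumble on it with or without a
BOM, and the usual move is to convert first. The file is a made-up
example, and iconv is assumed to accept the encoding names "UTF-16" and
"UTF-8" (glibc and GNU libiconv do; the names are implementation-defined).

```shell
#!/bin/sh
# Hypothetical UTF-16LE file: BOM FF FE, then "apple\n" with a NUL byte
# after every ASCII character.
printf '\377\376a\000p\000p\000l\000e\000\n\000' > u16.txt

# A byte-oriented search finds nothing: the bytes "apple" never occur
# contiguously in the file.
grep -c apple u16.txt

# Converting to UTF-8 first restores the usual behavior. The UTF-16
# decoder uses the BOM to pick the byte order and does not pass it on.
iconv -f UTF-16 -t UTF-8 u16.txt | grep -c apple
```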

  Jungshik Shin


