Shlomi Tal <[EMAIL PROTECTED]> wrote:

> If you're going to take the trouble of making text tools 16-bit
> aware, then you can afford to make them BOM-aware too.
>
> type a.txt b.txt c.txt > d.txt
>
> on Windows 2000, assuming that they are all UTF-16 (with an FFFE at
> the beginning of each, as is usual in MS-Windows Unicode files),
> strips every BOM except the first, so that d.txt has only the usual
> one initial FFFE. So it's not an immovable obstacle.

Someone will undoubtedly claim that this breaks data integrity in the
case of files that start with a genuine zero-width no-break space.  This
scenario makes no sense to me, since the whole purpose of ZWNBSP is to
affect the breaking and spacing behavior *between* two characters, but
it seems to be legal Unicode nonetheless.

When U+2060 WORD JOINER becomes widespread enough that Unicode version X
(for some X >= 4.0) can strongly deprecate the use of U+FEFF as a
zero-width no-break space, then it will make more sense for all
Unicode-aware text tools (regardless of UTF) to handle BOMs in the way
Shlomi describes.
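
For illustration only, here is a rough Python sketch of the kind of BOM-aware concatenation Shlomi describes. It is not how "type" is actually implemented; the function name concat_utf16 is made up for this example, and it assumes every input is UTF-16 in the same byte order. It also treats any leading U+FEFF in a later file as a BOM, which is exactly the case where a genuine initial ZWNBSP would be lost.

```python
import codecs

# The two possible UTF-16 byte order marks (U+FEFF in each byte order).
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def concat_utf16(chunks):
    """Concatenate UTF-16 byte strings, keeping only the first one's BOM.

    Assumption: all chunks share one byte order. A leading U+FEFF in any
    chunk after the first is treated as a BOM and dropped, so a genuine
    initial ZWNBSP in those chunks would not survive.
    """
    out = bytearray()
    for index, data in enumerate(chunks):
        if index > 0 and data[:2] in UTF16_BOMS:
            data = data[2:]  # strip the redundant BOM
        out += data
    return bytes(out)
```

Called on the contents of a.txt, b.txt and c.txt in that order, this would return bytes with a single initial BOM, matching the d.txt result described above.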

-Doug Ewell
 Fullerton, California


