Shlomi Tal <[EMAIL PROTECTED]> wrote:

> If you're going to take the trouble of making text tools 16-bit
> aware, then you can afford to make them BOM-aware too.
>
>     type a.txt b.txt c.txt > d.txt
>
> on Windows 2000, assuming that they are all UTF-16 (with an FFFE at
> the beginning of each, as is usual in MS-Windows Unicode files),
> strips every BOM except the last, so that d.txt has only the usual
> one initial FFFE. So it's not an immovable obstacle.
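For illustration, BOM-aware concatenation of the kind described above might look like the following Python sketch. It is not how the Windows `type` command is implemented. The function name `concat_utf16le` is hypothetical, and the sketch assumes every input is little-endian UTF-16 with the byte sequence FF FE at the start. Note that it cannot tell a BOM from a genuine leading U+FEFF character.

```python
import codecs

def concat_utf16le(chunks):
    """Concatenate UTF-16LE byte strings, keeping only the first BOM.

    Assumption: every chunk is little-endian UTF-16. A leading FF FE on
    the second and later chunks is treated as a BOM and dropped, even
    if the author meant it as a zero-width no-break space.
    """
    bom = codecs.BOM_UTF16_LE  # the two bytes FF FE
    out = bytearray()
    for index, data in enumerate(chunks):
        if index > 0 and data.startswith(bom):
            data = data[len(bom):]
        out += data
    return bytes(out)

# Three small in-memory "files", each starting with its own BOM.
parts = [codecs.BOM_UTF16_LE + text.encode("utf-16-le")
         for text in ("one\r\n", "two\r\n", "three\r\n")]
merged = concat_utf16le(parts)
```

Decoding `merged` with the generic "utf-16" codec consumes the single remaining BOM and yields the three lines in order.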
Someone will undoubtedly claim that this breaks data integrity in the
case of files that start with a genuine zero-width no-break space. This
scenario makes no sense to me, since the whole purpose of ZWNBSP is to
affect the breaking and spacing behavior *between* two characters, but
it seems to be legal Unicode nonetheless.

When U+2060 WORD JOINER becomes widespread enough that Unicode version X
(for some X >= 4.0) can strongly deprecate the use of U+FEFF as a
zero-width no-break space, then it will make more sense for all
Unicode-aware text tools (regardless of UTF) to handle BOMs in the way
Shlomi describes.

-Doug Ewell
 Fullerton, California