> in MS-DOS, file3 will have the following contents:
>
> BOM
> contents from file1
> BOM
> contents from file2
>
> Is this in accordance with the Unicode standard

Nope. When concatenating two files (or any streams) of which the
second one has a BOM, the second file's BOM should be removed.
However, there's a rule which states that if a U+FEFF character
appears in the middle of a file, it should be treated as a zero
width no-break space, which behaves the same as a word joiner
(U+2060). So it's not as big a problem as it may look.
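
For what it's worth, here is a rough sketch of the kind of
concatenation I mean. It assumes both inputs are UTF-8 byte streams
(the original question doesn't say which encoding is involved), and
the function name is just something I made up for illustration.

```python
import codecs

# Assumption: UTF-8 input, so the BOM is the bytes EF BB BF.
BOM = codecs.BOM_UTF8


def concat_strip_inner_bom(first: bytes, second: bytes) -> bytes:
    """Join two byte streams, dropping a leading BOM from the second.

    Hypothetical helper: the first stream is passed through untouched,
    so the result starts with a BOM only if the first stream did.
    """
    if second.startswith(BOM):
        second = second[len(BOM):]
    return first + second


if __name__ == "__main__":
    file1 = BOM + b"contents from file1\n"
    file2 = BOM + b"contents from file2\n"
    file3 = concat_strip_inner_bom(file1, file2)
    print(file3)
```

A plain byte-for-byte copy would instead leave the second BOM in the
middle of the result, where a decoder sees it as an ordinary U+FEFF
character rather than as a signature.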

But now you've got me wondering whether there are any rules or
guidelines for the situation where two files are joined, and the
second one has a BOM, but the first one hasn't. Should the resulting
file have a BOM? I.e., should a BOM be added to what was the contents
of the first file?

Pim Blokland

