Re: what to do with multiple BOMs

2021-08-19 Thread Richard Damon
By the rules of Unicode, that character, if not the very first character of the file, should be treated as a “zero-width non-breaking space”; it is NOT a BOM character there. Its presence in the files is almost certainly an error caused by broken software or software processing file
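
To illustrate the point above, here is a minimal Python sketch of what a decoder does with the bytes from the original question. The sample payload `<a/>` is an assumption made up for the demonstration; it is not from the poster's file. The sketch relies on Python's standard "utf-16" codec, which uses only the first BOM to pick the byte order.

```python
import unicodedata

# ff fe ff fe as reported in the question, followed by a made-up
# payload "<a/>" encoded as UTF-16 little-endian (assumption).
data = b"\xff\xfe\xff\xfe" + "<a/>".encode("utf-16-le")

# The "utf-16" codec consumes the first BOM to determine byte order.
text = data.decode("utf-16")

# The second ff fe is not treated as a BOM; it is decoded as an
# ordinary U+FEFF character at the start of the text.
first_char = text[0]
first_char_name = unicodedata.name(first_char)

print(repr(text))
print(first_char_name)
```

Whether that leftover U+FEFF should then be kept or dropped is the question the thread is asking; the sketch only shows that the decoder does not remove it.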

Re: what to do with multiple BOMs

2021-08-19 Thread MRAB
On 2021-08-19 14:07, Robin Becker wrote: Channeling unicode text experts and xml people: I have an xml entity with initial bytes ff fe ff fe which the file command says is UTF-16, little-endian text. I agree, but what should be done about the additional BOM? A test output made many years ago seem

what to do with multiple BOMs

2021-08-19 Thread Robin Becker
Channeling unicode text experts and xml people: I have an xml entity with initial bytes ff fe ff fe which the file command says is UTF-16, little-endian text. I agree, but what should be done about the additional BOM? A test output made many years ago seems to keep the extra BOM. The xml context
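
For anyone wanting to check their own files for the same symptom, a small Python sketch that counts how many times the UTF-16 little-endian BOM is repeated at the start of a byte string. The helper name `count_leading_boms` is hypothetical, and the sample bytes are made up to match the ff fe ff fe prefix described above. It only reports the count; it does not decide what should be done with the extra BOM.

```python
import codecs


def count_leading_boms(data: bytes, bom: bytes = codecs.BOM_UTF16_LE) -> int:
    """Return how many consecutive copies of `bom` start `data`.

    Hypothetical helper for illustration; `bom` must be non-empty.
    """
    count = 0
    # Check for the BOM at offset 0, then 2, then 4, and so on.
    while data.startswith(bom, count * len(bom)):
        count += 1
    return count


# Made-up sample: two BOMs followed by "<" in UTF-16 little-endian.
sample = b"\xff\xfe\xff\xfe" + "<".encode("utf-16-le")
bom_count = count_leading_boms(sample)
print(bom_count)
```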