2012-07-26 0:19, Steven Atreju wrote:

   |ï»¿

And that was an Unicode BOM that has been converted to UTF-8 and
then been converted to UTF-8 once again.

Apparently the problem is that the data has been doubly encoded: it was first encoded into UTF-8, then the bytes of the UTF-8 data were interpreted as if they were windows-1252, and finally the resulting characters were UTF-8 encoded again. This is of course quite wrong, though not uncommon.

   |vielen Dank fÃ¼r Ihre E-Mail.

So the letter “ü” was munged too, and presumably so was all other non-ASCII data. So this is not an argument against using a BOM in UTF-8. The BOM was a victim of incorrect processing, like every other character outside ASCII. One might even argue that the BOM is useful here, too, since it immediately signals that something is wrong: “ï»¿” is an encoding error signature, so to speak.
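For anyone who wants to reproduce the effect, here is a minimal Python sketch of the double encoding described above. The function names are made up for illustration, and I am assuming that Python's "cp1252" codec is a fair stand-in for whatever windows-1252 decoder the faulty software actually used.

```python
def double_encode(text: str) -> str:
    """Simulate the faulty step: UTF-8 bytes misread as windows-1252.

    Hypothetical helper. Python's cp1252 codec leaves a few bytes
    (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so some inputs raise
    UnicodeDecodeError instead of producing mojibake.
    """
    return text.encode("utf-8").decode("cp1252")


def undo_double_encode(text: str) -> str:
    """Reverse the damage, provided every character maps back to cp1252."""
    return text.encode("cp1252").decode("utf-8")


bom = "\ufeff"                        # U+FEFF, the byte order mark
munged_bom = double_encode(bom)       # three characters: U+00EF U+00BB U+00BF
munged_word = double_encode("f\u00fcr")  # "für" with the ü split into two characters

# The bytes that end up on the wire after the second UTF-8 encoding.
wire_bytes = munged_bom.encode("utf-8")

print(repr(munged_bom), repr(munged_word), wire_bytes.hex(" "))
print(undo_double_encode(munged_word))
```

Under these assumptions the three-character signature at the start of the data is exactly what a doubly encoded BOM looks like, and the same round trip in reverse recovers the original text.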

Yucca



