Re: (Informational only: UTF-8 BOM and the real life)

Jukka K. Korpela Wed, 25 Jul 2012 14:53:30 -0700

2012-07-26 0:19, Steven Atreju wrote:

   |ï»¿


And that was a Unicode BOM that has been converted to UTF-8 and
then been converted to UTF-8 once again.

Apparently the problem is that the data has been doubly encoded: first it was encoded into UTF-8, then the bytes of the UTF-8 data were interpreted as if they were in windows-1252, and then the resulting characters were UTF-8 encoded again. This is of course very incorrect, and not uncommon.
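The double encoding described above can be reproduced in a few lines. This is only a sketch: the function name double_encode is made up for illustration, and Python's "cp1252" codec is assumed to stand in for windows-1252 (it rejects a handful of bytes that windows-1252 leaves undefined, which does not matter for these inputs).

```python
def double_encode(text: str) -> bytes:
    """Reproduce the faulty pipeline: UTF-8 encode, misread as windows-1252, UTF-8 encode again."""
    utf8_bytes = text.encode("utf-8")      # step 1: correct UTF-8 encoding
    misread = utf8_bytes.decode("cp1252")  # step 2: bytes wrongly interpreted as windows-1252
    return misread.encode("utf-8")         # step 3: the wrong characters are UTF-8 encoded again


# What a reader sees when the doubly encoded bytes are finally displayed as UTF-8:
print(double_encode("\ufeff").decode("utf-8"))  # the BOM, U+FEFF
print(double_encode("für").decode("utf-8"))     # a word containing U+00FC
```

The BOM's three UTF-8 bytes EF BB BF map to the windows-1252 characters ï, » and ¿, and the two bytes C3 BC of "ü" map to Ã and ¼, which matches the quoted lines.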

   |vielen Dank fÃ¼r Ihre E-Mail.

So the letter “ü” was munged too, and presumably all non-ASCII data. So this is not an argument against using BOM in UTF-8. The BOM was a victim of incorrect processing, like everyone else (outside ASCII). One might even argue that the BOM is useful here, too, since it immediately signals that there is something wrong, and “ï»¿” is an encoding error signature, so to say.
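To illustrate the "signature" idea, here is a rough sketch of how a program might check for the doubly encoded BOM at the start of some data and try to undo the damage. The names looks_double_encoded and repair are hypothetical, and the repair assumes the data went through exactly one windows-1252 misreading; it will raise an error or produce wrong output for anything else.

```python
# The byte sequence that "ï»¿" becomes when it is itself stored as UTF-8.
DOUBLE_ENCODED_BOM = "\ufeff".encode("utf-8").decode("cp1252").encode("utf-8")


def looks_double_encoded(data: bytes) -> bool:
    """Return True if the data starts with the doubly encoded BOM signature."""
    return data.startswith(DOUBLE_ENCODED_BOM)


def repair(data: bytes) -> str:
    """Undo one round of the UTF-8 -> windows-1252 -> UTF-8 mistake (assumption: exactly one round)."""
    misread = data.decode("utf-8")            # the mojibake text, e.g. "ï»¿vielen Dank fÃ¼r ..."
    original_bytes = misread.encode("cp1252") # back to the original UTF-8 bytes
    return original_bytes.decode("utf-8-sig") # decode properly, dropping the leading BOM


# Build a damaged sample the same way the faulty pipeline would.
sample = "\ufeffvielen Dank für Ihre E-Mail.".encode("utf-8").decode("cp1252").encode("utf-8")
if looks_double_encoded(sample):
    print(repair(sample))
```

Without the BOM at the front, a program would have to guess from the body text alone whether sequences like "Ã¼" are damage or intended content.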


Yucca
