Re: How to display and remove BOM in utf-8 encoded file

Tony Mechelynck Wed, 10 Aug 2011 04:19:49 -0700

On 10/08/11 02:18, pansz wrote:

On Tue, Aug 9, 2011 at 11:13 PM, Tony Mechelynck
<[email protected]>  wrote:


That message is outdated. The BOM is supported in all Unicode encodings
including UTF-8 by all "reasonably recent" browsers. It is also part of the
HTML standard.


BOM is a standard for UCS2 or UTF-16, not for UTF-8.

According to the Unicode FAQ, http://www.unicode.org/faq//utf_bom.html#bom4
(two successive FAQ questions), a BOM can be used in UTF-8 as well as in
UTF-16 or UTF-32; but since UTF-8 doesn't have endianness variants, with
UTF-8 it specifies encoding only, not endianness. BTW, "good" editors
(including at least Vim and WordPad, possibly others) handle the BOM
correctly, even in UTF-8. In fact, in my experience WordPad won't read
UTF-8 text correctly _unless_ there is a BOM.
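To make the "encoding only, not endianness" point concrete, here is a minimal Python sketch that reports which BOM, if any, a byte string starts with. It relies only on the BOM constants in the standard `codecs` module; the names `BOMS` and `detect_bom` are hypothetical, chosen for this illustration.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the UTF-32 marks are tested first.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF, no endianness variants
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

For UTF-8 the result only says "this is UTF-8"; for UTF-16 and UTF-32 the same check also tells little-endian from big-endian.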

However (about your next paragraph), when UTF-8 is fed "transparently"
to a program which expects ASCII, and in particular to any program which
expects #! at the start of a file, the BOM should not be used (see the
2nd FAQ question linked above, and also
http://www.unicode.org/faq//utf_bom.html#bom10 "How should I deal with
BOMs?", point 3).


A BOM in UTF-8 will cause problems for most programs which expect text
streams. gcc is a good example; most GNU CLI utilities will reject
UTF-8 with a BOM.

I explicitly mentioned in the part you snipped that for some other kinds
of text than HTML or CSS (such as, I said, source files and shell
scripts) it is better to save the file without a BOM.
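For files of that kind, a rough Python sketch of removing a leading UTF-8 BOM could look like the following. The helper names `strip_utf8_bom` and `strip_utf8_bom_in_file` are hypothetical; the sketch assumes the file fits in memory and is rewritten in place only when a BOM is actually present.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_utf8_bom_in_file(path: str) -> bool:
    """Rewrite the file at path without its UTF-8 BOM.

    Returns True if a BOM was removed, False if the file was left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if len(stripped) == len(data):
        return False  # no BOM, nothing to rewrite
    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

After this, a shell script starts with the bytes `#!` again. Within Vim itself, the 'bomb' option controls the same thing: `:set nobomb` followed by `:w` writes the buffer without a BOM.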


And, W3C validator will of course complain about it...


...with a warning, not an error; and Tidy won't.

Best regards,
Tony.
--
"My weight is perfect for my height -- which varies"

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
