Re: utf-8 encoding without BOM

A.J.Mechelynck Thu, 31 Aug 2006 03:27:46 -0700

Péter Zsoldos wrote:

Greetings,
I'm using gVim 6.3 on Windows XP SP2 and I ran into a problem: I need to edit files with UTF-8 encoding, but I just can't get Vim to do so. If I create UTF-8 encoded files in Notepad, Vim accepts this, but places the BOM into the file. This BOM causes me a lot of problems, since I edit PHP files and quite often I need to send headers; anyone with PHP experience can tell you the BOM makes sending headers impossible... I tried :help encoding, but there is no 'enc' command available, and :set fileencodings=utf-8 doesn't seem to work either. I'll continue googling, but help would be appreciated, because I need to fix this in order to start working.
Thanks in advance,

Peter

The BOM is governed by the option 'bomb' (q.v.). Its presence makes recognition of Unicode encodings trivial; most of the programs which accept Unicode data will accept it, and sometimes it will make their life easier. For instance, WordPad, whose "Unicode" files are usually in UTF-16le, will accept UTF-8 files for reading if they have a BOM. (It cannot write UTF-8, though.) All the Web browsers I've seen (Internet Explorer, Netscape 6 or later, Firefox, Konqueror, even the lowly Lynx) will accept HTML pages whose <!DOCTYPE (if present) or <HTML> tag is preceded by a BOM. And so on and so forth. You _can_ forbid the output of a BOM, and even remove an existing one, by means of


        :setlocal nobomb

but I don't recommend it. Rather, if you are sure that some program accepts Unicode files only without a BOM, in your place I would try to get a newer version of the same program, or, if it's the newest, complain to its maker that the program is not up to industry standards. Or else, it may be that the "headers" you talk about must be in 7-bit US-ASCII: in that case it might be simplest to edit the headers as a separate file with


        :setlocal fileencoding=latin1
or
        :write ++enc=latin1

and to use "cat file1 file2 > file3" (Unix) or "COPY file1+file2 file3" (DOS/Windows) to affix the headers in front of the data. A trick I use on this computer (whose locale is Unicode) to force Vim to recognise Latin1 files as Latin1 and not as Unicode is to put a comment containing high-ASCII characters in them, for instance in HTML:


        <!-- ÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷ -->

In .txt files, underlining the main title with ÷ gives the same result. Of course, this doesn't work for 7-bit ASCII, but there, since it represents all its data exactly the same way as UTF-8 does, the distinction is a moot point.
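
The ÷ trick works because of how Vim walks through 'fileencodings': it tries each entry in order and keeps the first one that reads the file without error. In Latin1, ÷ is the single byte 0xF7, and bytes 0xF5 to 0xFF never occur in valid UTF-8, so the utf-8 attempt fails and latin1 is chosen; pure 7-bit ASCII, on the other hand, decodes identically either way. A minimal Python 3 check of those byte-level facts (it models only the decoding, not Vim itself; `decodes_as` is a hypothetical helper name):

```python
# Byte-level check of the "÷ comment" trick: a Latin1 file containing ÷
# is not valid UTF-8, so a utf-8-then-latin1 fallback ends up on latin1.

def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

marker = "<!-- ÷÷÷÷÷ -->".encode("latin1")   # each ÷ is the single byte 0xF7
ascii_only = "<!-- plain ASCII -->".encode("ascii")

print(decodes_as(marker, "utf-8"))    # False: 0xF7 is never valid in UTF-8
print(decodes_as(marker, "latin1"))   # True: every byte value is valid Latin1
print(ascii_only.decode("utf-8") == ascii_only.decode("latin1"))  # True: ASCII reads the same
```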

Note that for _existing_ Unicode documents, if your 'fileencodings' (with an s at the end) starts with "ucs-bom", Vim will correctly detect the BOM and set the 'bomb' and (if there is a BOM) 'fileencoding' (singular) options accordingly. If your Notepad files don't have a BOM, then if Vim uses the typical setting


        :set fileencodings=ucs-bom,utf-8,default

(for version 7) or

        :set fileencodings=ucs-bom,utf-8,latin1

(for any version), it _won't_ add a BOM to an existing document which doesn't already have one, unless you toggle the 'bomb' option while editing the file.
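
To make that detection order concrete, here is a simplified Python 3 model of reading and writing a file with 'fileencodings' set to ucs-bom,utf-8,latin1: look for a BOM first, otherwise try each encoding in turn, and remember whether a BOM was present (the role 'bomb' plays) so that writing the file back reproduces a BOM only if one was there. It is a sketch of the behaviour described above under those assumptions, not Vim's actual code; `read_file` and `write_file` are hypothetical names, and only the UTF-8 and UTF-16 BOMs are checked here.

```python
import codecs

# Simplified model of Vim reading/writing a file with
#   :set fileencodings=ucs-bom,utf-8,latin1
# Hypothetical helpers; Vim's real detection is in its C source.

BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]

def read_file(data: bytes):
    """Return (text, fileencoding, bomb) following the detection order."""
    # "ucs-bom": a leading BOM decides the encoding outright.
    for bom, enc in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(enc), enc, True
    # Otherwise try the remaining entries in order; latin1 never fails.
    for enc in ("utf-8", "latin1"):
        try:
            return data.decode(enc), enc, False
        except UnicodeDecodeError:
            continue

def write_file(text: str, fileencoding: str, bomb: bool) -> bytes:
    """Re-encode the text, prepending a BOM only if 'bomb' is set."""
    body = text.encode(fileencoding)
    if bomb:
        bom = next(b for b, enc in BOMS if enc == fileencoding)
        return bom + body
    return body

notepad = codecs.BOM_UTF8 + "héllo".encode("utf-8")  # UTF-8 file saved with a BOM
plain = "héllo".encode("utf-8")                      # same text, no BOM

print(read_file(notepad))  # ('héllo', 'utf-8', True)
print(read_file(plain))    # ('héllo', 'utf-8', False)
print(write_file(*read_file(plain)) == plain)  # True: no BOM is added on write
```

In this model, writing with `bomb=False` corresponds to `:setlocal nobomb`: the text and encoding are unchanged, only the leading BOM bytes are dropped.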



See
        :help 'bomb'
        :help 'fileencoding'
        :help 'fileencodings'
        :help ++opt

Best regards,
Tony.
