Péter Zsoldos wrote:
Greetings,
I'm using gVim 6.3 on Windows XP SP2 and I've run into a problem: I
need to edit files with UTF-8 encoding, but I just can't get Vim to do
so. If I create UTF-8 encoded files in Notepad, Vim accepts them, but
places the BOM into the file. This BOM causes me a lot of problems,
since I edit PHP files and quite often I need to send headers - anyone
with PHP experience can tell you BOM makes sending headers
impossible... I tried :help encoding, but there is no 'enc' command
available and :set fileencodings=utf-8 doesn't seem to work either. I'll
continue googling, but help would be appreciated, because I need to
fix this in order to start working.
Thanks in advance,
Peter
Whether Vim writes a BOM is governed by the option 'bomb' (q.v.). The
presence of a BOM makes recognition of Unicode encodings trivial; most
programs which accept Unicode data will accept it, and sometimes it
makes their life easier. For instance, WordPad, whose "Unicode" files are
usually in UTF-16le, will accept UTF-8 files for reading if they have a
BOM. (It cannot write UTF-8 though.) All the Web browsers I've seen
(Internet Explorer, Netscape 6 or later, Firefox, Konqueror, even the
lowly Lynx) will accept HTML pages whose <!DOCTYPE (if present) or <HTML>
tag is preceded by a BOM. And so on and so forth. You _can_ forbid the output
of a BOM, even removing an existing one, by means of
:setlocal nobomb
but I don't recommend it. Rather, if you are sure that some program
accepts Unicode files only without BOM, in your place I would try to get
a newer version of the same program, or if it's the newest, complain to
its maker that the program is not up to industry standards. Or else, it
may be that the "headers" you talk about must be in 7-bit US-ASCII: in
that case it might be simplest to edit the headers as a separate file with
:setlocal fileencoding=latin1
or
:write ++enc=latin1
and to use "cat file1 file2 > file3" (Unix) or "COPY file1+file2 file3"
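(Dos/Windows) to affix the headers in front of the data. A trick I use
on this computer (whose locale is Unicode) to force Vim to recognise
Latin1 files as Latin1 and not as UTF-8 is to put a comment containing
high-bit characters in them (in Latin1, ÷ is the single byte 0xF7, and a
run of those is not valid UTF-8, so the utf-8 check fails and Vim falls
through to latin1), for instance in HTML: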
<!-- ÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷ -->
In .txt files, underlining the main title with ÷ gives the same result.
Of course, this doesn't work for pure 7-bit ASCII files, but since ASCII
data is byte-for-byte identical in Latin1 and in UTF-8, the distinction
there is moot.
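As a rough sketch of the 'bomb' handling described above (not something I'd recommend as a default), these ex commands inspect the current buffer and then write it without a BOM. They assume the file in question is already loaded in the current buffer; nothing here is specific to PHP.

```vim
" Show what Vim detected when it read the file
:setlocal fileencoding? bomb?

" Drop the BOM for this buffer only, then write the file
:setlocal nobomb
:write

" Check again: 'bomb' should now be reported as nobomb
:setlocal bomb?
```

Because :setlocal is used, other buffers keep whatever 'bomb' value they already had.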
Note that for _existing_ Unicode documents, if your 'fileencodings'
(with s at the end) starts with "ucs-bom", Vim will correctly detect the
BOM and set the 'bomb' and (if there is a BOM) 'fileencoding' (singular)
options accordingly. If your Notepad files don't have a BOM, then if Vim
uses the typical setting
:set fileencodings=ucs-bom,utf-8,default
(for version 7) or
:set fileencodings=ucs-bom,utf-8,latin1
(for any version), it _won't_ add a BOM to an existing document which
doesn't already have one -- unless you toggle the 'bomb' option while
editing the file.
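Put together, the detection settings above might look like this in a vimrc. This is only a sketch: the has("multi_byte") guard, the 'termencoding' and 'encoding' lines, and the PHP autocommand (the group name php_nobomb is made up for illustration) are assumptions of mine and not part of the settings quoted above; only the 'fileencodings' value is taken from them.

```vim
if has("multi_byte")
  " Assumption: remember the old encoding for the terminal before changing it
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  " Assumption: use UTF-8 internally so that any Unicode file can be edited
  set encoding=utf-8
  " Detection order on reading: BOM first, then BOM-less UTF-8, then Latin1
  set fileencodings=ucs-bom,utf-8,latin1
endif

" Hypothetical: only if some tool really cannot cope with a BOM,
" turn 'bomb' off just before each PHP file is written
augroup php_nobomb
  autocmd!
  autocmd BufWritePre *.php setlocal nobomb
augroup END
```

After restarting Vim and opening a file, :setlocal fileencoding? bomb? shows what was detected for that buffer.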
See
:help 'bomb'
:help 'fileencoding'
:help 'fileencodings'
:help ++opt
Best regards,
Tony.