Re: How to set utf-8 locally (for a buffer) on loading the file

Tony Mechelynck Wed, 06 Oct 2010 13:37:50 -0700

On 05/10/10 15:07, Simon Ruderich wrote:

On Mon, Oct 04, 2010 at 10:22:44PM -0700, esquifit wrote:

[snip]


     NOTE: Changing this option will not change the encoding of the
     existing text in Vim.  It may cause non-ASCII text to become invalid.

This sentence is somewhat obscure. If I change 'encoding' after having
loaded a buffer, it effectively causes 'some non-ASCII text to become
invalid'. What then does the statement mean, that 'this option will
not change the encoding of the existing text in Vim'?


'encoding' controls how Vim stores the data internally (not when
displaying). If you change it, the stored data is interpreted
differently. I guess for simplicity Vim doesn't recode the stored
data when changing 'encoding' thus causing problems if you change
it (as the data may be invalid or mean something different in the
new encoding).

Hope this helps,
Simon

'encoding' controls how Vim represents the data in its own memory,
whether file text, string variables, mappings, you name it. What you
type on the keyboard is stored using 'encoding' (possibly after
translation from 'termencoding') and what Vim displays is recorded
internally using 'encoding' (though, in console mode only, it is
translated into 'termencoding' for sending to the terminal). If you
change 'encoding', Vim doesn't convert what is already in memory. For
purely 7-bit-ASCII text (assuming you aren't on an EBCDIC machine) this
isn't a problem, but even changing between Latin1, ISO-8859-15 and
Windows-1252 (which are very similar) will mean that some characters
above 0x7F will be interpreted differently, and therefore display
differently. Changing into or out of UTF-8 may make some of the data
invalid, since in that charset anything above U+007F (including the
codepoints U+0080 to U+00FF which have the same "ordinal numeric value"
as in Latin1) is represented by two or more bytes, the first one being
0xC0 or higher, with as many "top one bits" as there are bytes in the
whole sequence, and the other one(s) being in the range 0x80 to 0xBF.
Any byte with the top bit set is invalid in UTF-8 unless it is part of
such a sequence.
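To make the byte-level point concrete, here is a small sketch in Python (not Vim script; it only illustrates the charsets themselves, not Vim's internals). The variable names and the helper is_valid_utf8 are made up for this example.

```python
# One character, two different byte representations.
e_acute = "\u00e9"                         # U+00E9, e with acute accent
latin1_bytes = e_acute.encode("latin-1")   # one byte: 0xE9
utf8_bytes = e_acute.encode("utf-8")       # two bytes: 0xC3 0xA9

# Structure of the UTF-8 sequence: the lead byte is >= 0xC0 and starts
# with two "top one bits" (110xxxxx) because the sequence has two bytes;
# the trailing byte lies in the range 0x80 to 0xBF.
lead, trail = utf8_bytes
lead_ok = lead >= 0xC0 and (lead >> 5) == 0b110
trail_ok = 0x80 <= trail <= 0xBF

# The same byte above 0x7F means different things in similar charsets.
as_latin1 = b"\xa4".decode("latin-1")      # U+00A4, currency sign
as_latin9 = b"\xa4".decode("iso8859-15")   # U+20AC, euro sign

# UTF-8 bytes reinterpreted as Latin1 are not "invalid", just wrong:
# each byte silently becomes its own character.
misread = utf8_bytes.decode("latin-1")     # U+00C3 followed by U+00A9


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Latin1 bytes reinterpreted as UTF-8 can be outright invalid: a lone
# 0xE9 is a lead byte with no continuation bytes after it.
print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

This is the situation Vim is in after 'encoding' is changed with text already in memory: the bytes stay the same and only their interpretation changes.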

'termencoding' tells Vim what the keyboard sends and, in console mode,
what the screen expects. Its default is the empty string, meaning «use
'encoding'», which means that before changing 'encoding' (which the OS
doesn't know about) you should set 'termencoding' (if empty) to the old
'encoding' value.

'fileencoding' (singular) is how the current buffer is encoded on disk.
If nonempty and different from 'encoding' there will be a conversion
operation when reading or writing.

'fileencodings' (plural) defines the heuristics used by a multibyte Vim
to guess an existing file's 'fileencoding' (singular). It is a
comma-separated list of values, and Vim tries them from left to right
until there is one which "fits" (and if nothing fits, Vim falls back on
the empty string, meaning 'encoding' is used without conversion). Two
caveats in this respect:
- ucs-bom (if used) should be first, because otherwise a leading BOM
will not always be recognised (some other encoding, tried before it,
might give a "success" result);
- there should be at most one 8-bit encoding, and it should be last,
because an 8-bit encoding cannot give a "failure" result, which means
that anything after the first 8-bit encoding will never be tried.
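The left-to-right guessing can be imitated in a few lines of Python. This is only a rough model of the heuristic described above, not Vim's actual code: the function guess_fileencoding is hypothetical, Python codec names stand in for Vim's encoding names, and the "ucs-bom" entry here checks for a UTF-8 BOM only.

```python
import codecs


def guess_fileencoding(data, candidates):
    """Try each candidate in order; return the first one that "fits".

    Returns "" if nothing fits, mirroring the fallback to the empty
    string (use 'encoding' without conversion).
    """
    for name in candidates:
        if name == "ucs-bom":
            # Simplification: only the UTF-8 BOM is checked in this sketch.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8"
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate "fails", try the next one
        return name
    return ""


utf8_file = "caf\u00e9".encode("utf-8")
latin1_file = "caf\u00e9".encode("latin-1")

# Sensible order: BOM check first, the single 8-bit encoding last.
good_order = ["ucs-bom", "utf-8", "latin-1"]
print(guess_fileencoding(utf8_file, good_order))    # utf-8
print(guess_fileencoding(latin1_file, good_order))  # latin-1

# Bad order: latin-1 accepts any byte sequence, so utf-8 is never tried.
bad_order = ["latin-1", "utf-8"]
print(guess_fileencoding(utf8_file, bad_order))     # latin-1
```

The last call shows the second caveat: an 8-bit encoding placed early "succeeds" on a UTF-8 file, so the correct answer further down the list is never reached.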



Best regards,
Tony.
--
"I bet the human brain is a kludge."
                -- Marvin Minsky

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
