On 05/10/10 15:07, Simon Ruderich wrote:
On Mon, Oct 04, 2010 at 10:22:44PM -0700, esquifit wrote:
[snip]

     NOTE: Changing this option will not change the encoding of the
     existing text in Vim.  It may cause non-ASCII text to become invalid.

This sentence is somewhat obscure. If I change 'encoding' after having
loaded a buffer, it does indeed cause 'some non-ASCII text to become
invalid'. What, then, does the statement mean that 'this option will
not change the encoding of the existing text in Vim'?

'encoding' controls how Vim stores the data internally (not how it is
displayed). If you change it, the stored data is interpreted
differently. I guess that, for simplicity, Vim doesn't recode the
stored data when 'encoding' changes, which causes problems if you
change it (as the data may be invalid or mean something different in
the new encoding).

Hope this helps,
Simon

'encoding' controls how Vim represents the data in its own memory, whether file text, string variables, mappings, you name it. What you type on the keyboard is stored using 'encoding' (possibly after translation from 'termencoding'), and what Vim displays is held internally in 'encoding' (though, in console mode only, it is translated into 'termencoding' before being sent to the terminal).

If you change 'encoding', Vim doesn't convert what is already in memory. For purely 7-bit-ASCII text (assuming you aren't on an EBCDIC machine) this isn't a problem, but even changing between Latin1, ISO-8859-15 and Windows-1252 (which are very similar) means that some bytes above 0x7F will be interpreted differently, and therefore displayed differently.

Changing into or out of UTF-8 may make some of the data invalid, since in UTF-8 anything above U+007F (including the codepoints U+0080 to U+00FF, which have the same "ordinal numeric value" as in Latin1) is represented by two or more bytes: the first one being 0xC0 or higher (0xC2 to 0xF4 in standard UTF-8), with as many "top one bits" as there are bytes in the whole sequence, and the other one(s) being in the range 0x80 to 0xBF. Any byte with the top bit set is invalid in UTF-8 unless it is part of such a sequence.
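
To make the reinterpretation concrete, here is a rough sketch in Python (chosen only because it makes bytes easy to inspect; this is not how Vim works internally, and the sample text is made up). The same unchanged bytes are read under three different encodings:

```python
# Illustration only: "stored" stands in for bytes already held in memory.

# The text "cafe-acute, space, euro sign" as stored under Windows-1252.
stored = "caf\u00e9 \u20ac".encode("cp1252")   # bytes: c a f 0xE9 0x20 0x80

# Reinterpreting the unchanged bytes as Latin1: 0xE9 is still e-acute,
# but 0x80 is now a C1 control character instead of the euro sign.
as_latin1 = stored.decode("latin-1")

# Reinterpreting them as UTF-8 fails: 0xE9 announces a three-byte
# sequence, but the next byte (0x20) is not in the range 0x80 to 0xBF.
try:
    stored.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# In UTF-8, U+00E9 takes two bytes: a lead byte with two top one bits
# (110xxxxx) followed by one byte in the range 0x80 to 0xBF.
utf8_e_acute = "\u00e9".encode("utf-8")

print(stored, repr(as_latin1), valid_utf8, utf8_e_acute)
```

The bytes never change in this sketch; only the label attached to them does, which is what happens to existing text when 'encoding' is changed.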

'termencoding' tells Vim what the keyboard sends and, in console mode, what the screen expects. Its default is the empty string, meaning «use 'encoding'». The OS doesn't know about 'encoding', so the keyboard and terminal go on using whatever they used before; therefore, before changing 'encoding', you should set 'termencoding' (if empty) to the old 'encoding' value.

'fileencoding' (singular) is how the current buffer is encoded on disk. If it is nonempty and different from 'encoding', Vim converts the text when reading and when writing the file.
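
A minimal sketch of that round trip, in Python rather than Vim script, with latin1 and utf-8 picked purely as example values for the two options:

```python
# Example settings (assumptions for illustration, not Vim defaults).
fileencoding = "latin-1"   # how the buffer is encoded on disk
encoding = "utf-8"         # how the editor holds text in memory

on_disk = b"na\xefve"      # "naive" with i-diaeresis, as Latin1 bytes

# Reading: convert from 'fileencoding' to 'encoding'.
in_memory = on_disk.decode(fileencoding).encode(encoding)

# Writing: convert back from 'encoding' to 'fileencoding'.
written = in_memory.decode(encoding).encode(fileencoding)

print(in_memory, written == on_disk)
```

The in-memory form is one byte longer than the file, yet writing it back reproduces the original bytes.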

'fileencodings' (plural) defines the heuristics used by a multibyte Vim to guess an existing file's 'fileencoding' (singular). It is a comma-separated list of values, and Vim tries them from left to right until there is one which "fits" (and if nothing fits, Vim falls back on the empty string, meaning 'encoding' is used without conversion). Two caveats in this respect:
- ucs-bom (if used) should be first, because otherwise a leading BOM will not always be recognised (some other encoding, tried before it, might give a "success" result);
- there should be at most one 8-bit encoding, and it should be last, because an 8-bit encoding cannot give a "failure" result, which means that anything after the first 8-bit encoding will never be tried.
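
The left-to-right search and the second caveat can be sketched roughly as follows. This is a simplified stand-in written in Python, not Vim's actual detection code: the function name guess_fileencoding is hypothetical, only the UTF-8 BOM is checked for ucs-bom, and the candidate names are assumed to be ones Python's codecs also understand.

```python
import codecs

def guess_fileencoding(data, fileencodings):
    """Return the first candidate that fits, or "" if none does."""
    for name in fileencodings.split(","):
        if name == "ucs-bom":
            # Simplification: only the UTF-8 BOM is looked for here.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8"
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue      # this candidate does not fit, try the next one
        return name
    return ""             # nothing fit: 'encoding' is used unconverted

latin1_bytes = b"caf\xe9"                  # Latin1 text, invalid as UTF-8
utf8_bytes = "caf\u00e9".encode("utf-8")   # the same text as UTF-8

good_order = guess_fileencoding(latin1_bytes, "ucs-bom,utf-8,latin1")
bad_order = guess_fileencoding(utf8_bytes, "latin1,utf-8")
print(good_order, bad_order)
```

With latin1 listed first, the UTF-8 file is also reported as latin1, because an 8-bit encoding accepts any byte sequence and utf-8 is never tried.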


Best regards,
Tony.
--
"I bet the human brain is a kludge."
                -- Marvin Minsky
