On 05/10/10 15:07, Simon Ruderich wrote:
> On Mon, Oct 04, 2010 at 10:22:44PM -0700, esquifit wrote:
> [snip]
>>> NOTE: Changing this option will not change the encoding of the
>>> existing text in Vim. It may cause non-ASCII text to become invalid.
>> This sentence is somewhat obscure. If I change 'encoding' after having
>> loaded a buffer, it effectively causes 'some non-ASCII text to become
>> invalid'. What then does the statement mean, that 'this option will
>> not change the encoding of the existing text in Vim'?
> 'encoding' controls how Vim stores the data internally (not when
> displaying). If you change it, the stored data is interpreted
> differently. I guess for simplicity Vim doesn't recode the stored
> data when changing 'encoding' thus causing problems if you change
> it (as the data may be invalid or mean something different in the
> new encoding).
> Hope this helps,
> Simon

'encoding' controls how Vim represents the data in its own memory,
whether file text, string variables, mappings, you name it. What you
type on the keyboard is stored using 'encoding' (possibly after
translation from 'termencoding') and what Vim displays is recorded
internally using 'encoding' (though, in console mode only, it is
translated into 'termencoding' for sending to the terminal). If you
change 'encoding', Vim doesn't convert what is already in memory. For
purely 7-bit-ASCII text (assuming you aren't on an EBCDIC machine) this
isn't a problem, but even changing between Latin1, ISO-8859-15 and
Windows-1252 (which are very similar) will mean that some characters
above 0x7F will be interpreted differently, and therefore display
differently. Changing into or out of UTF-8 may make some of the data
invalid, since in UTF-8 anything above U+007F (including the
codepoints U+0080 to U+00FF, which have the same "ordinal numeric value"
as in Latin1) is represented by two or more bytes: a lead byte of
0xC0 or higher, with as many leading 1 bits as there are bytes in the
whole sequence, followed by one or more continuation bytes in the range
0x80 to 0xBF. Any byte with the top bit set is invalid in UTF-8 unless
it is part of such a sequence.
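
To make this concrete, here is a small sketch in Python. It is not what Vim does internally; Python's codecs merely stand in for "the same bytes, interpreted under a different encoding". The variable names are mine, and byte 0xA4 and the character U+00E9 are just convenient examples.

```python
# Illustration only: Python's codecs stand in for Vim's internal handling.

# The same single byte read under three similar 8-bit encodings.
readings = {
    name: bytes([0xA4]).decode(name)
    for name in ("latin1", "iso8859-15", "cp1252")
}
# 0xA4 is U+00A4 in Latin1 and Windows-1252, but U+20AC in ISO-8859-15.
print(ascii(readings))

# U+00E9 is one byte in Latin1 but a two-byte sequence in UTF-8.
latin1_bytes = "\u00e9".encode("latin1")  # b'\xe9'
utf8_bytes = "\u00e9".encode("utf-8")     # b'\xc3\xa9': lead byte, then continuation byte

# Reading UTF-8 bytes as Latin1 never fails, but yields two characters
# where there used to be one.
misread = utf8_bytes.decode("latin1")

# Reading the lone Latin1 byte as UTF-8 fails: 0xE9 announces a
# three-byte sequence, but no continuation bytes follow.
try:
    latin1_bytes.decode("utf-8")
    lone_byte_valid = True
except UnicodeDecodeError:
    lone_byte_valid = False

print(ascii(misread), lone_byte_valid)
```

The first case is the "displays differently" situation, the last one is the "becomes invalid" situation.
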
'termencoding' tells Vim what the keyboard sends and, in console mode,
what the screen expects. Its default is the empty string, meaning «use
'encoding'». The OS doesn't know about 'encoding', so before changing
it you should set 'termencoding' (if empty) to the old 'encoding'
value; otherwise Vim will assume that the keyboard and terminal have
switched to the new encoding too.
'fileencoding' (singular) is how the current buffer is encoded on disk.
If nonempty and different from 'encoding' there will be a conversion
operation when reading or writing.
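
As a rough model of that conversion (again in Python, with hypothetical helper names read_buffer and write_buffer that are not Vim functions), reading and writing could be pictured like this. The sketch assumes the encoding names are ones Python understands, and it ignores what happens when a character cannot be converted.

```python
# Simplified model, not Vim's code: convert between the on-disk encoding
# ('fileencoding') and the in-memory encoding ('encoding').

def read_buffer(disk_bytes, fileencoding, encoding):
    """Return the bytes as they would be held in memory after reading."""
    if not fileencoding or fileencoding == encoding:
        return disk_bytes  # empty or identical: no conversion
    return disk_bytes.decode(fileencoding).encode(encoding)

def write_buffer(memory_bytes, fileencoding, encoding):
    """Return the bytes as they would be written back to disk."""
    if not fileencoding or fileencoding == encoding:
        return memory_bytes
    return memory_bytes.decode(encoding).encode(fileencoding)

text = "na\u00efve \u20ac"
on_disk = text.encode("cp1252")                       # 7 bytes on disk
in_memory = read_buffer(on_disk, "cp1252", "utf-8")   # 10 bytes in memory
round_trip = write_buffer(in_memory, "cp1252", "utf-8")
print(len(on_disk), len(in_memory), round_trip == on_disk)
```

The file keeps its own encoding on disk, whatever 'encoding' Vim uses in memory.
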
'fileencodings' (plural) defines the heuristics used by a multibyte Vim
to guess an existing file's 'fileencoding' (singular). It is a
comma-separated list of values, and Vim tries them from left to right
until there is one which "fits" (and if nothing fits, Vim falls back on
the empty string, meaning 'encoding' is used without conversion). Two
caveats in this respect:
- ucs-bom (if used) should be first, because otherwise a leading BOM
will not always be recognised (some other encoding, tried before it,
might give a "success" result);
- there should be at most one 8-bit encoding, and it should be last,
because an 8-bit encoding cannot give a "failure" result, which means
that anything after the first 8-bit encoding will never be tried.
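
The left-to-right search and both caveats can be sketched as follows. This is a simplified model in Python, not Vim's actual detection code: guess_fileencoding is a hypothetical helper, "fits" is reduced to "decodes without error", and the names returned for BOM matches are Python codec names rather than Vim's.

```python
import codecs

# UTF-32 comes before UTF-16 because the UTF-32 LE BOM starts with the
# UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_fileencoding(data, candidates):
    """Try each candidate from left to right; return the first that fits."""
    for name in candidates:
        if name == "ucs-bom":
            for bom, enc in BOMS:
                if data.startswith(bom):
                    return enc
            continue  # no BOM found: move on to the next candidate
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # a "failure" result: try the next candidate
        return name
    return ""  # nothing fits: fall back on 'encoding', no conversion

utf8_file = "caf\u00e9".encode("utf-8")
latin1_file = "caf\u00e9".encode("latin1")
utf16_file = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

good_order = ["ucs-bom", "utf-8", "latin1"]
bad_order = ["latin1", "ucs-bom", "utf-8"]

print(guess_fileencoding(utf8_file, good_order))    # utf-8
print(guess_fileencoding(latin1_file, good_order))  # latin1
print(guess_fileencoding(utf16_file, good_order))   # utf-16-le
print(guess_fileencoding(utf16_file, bad_order))    # latin1
```

With the 8-bit encoding first, every file "fits" it, so neither the BOM check nor UTF-8 is ever reached.
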
Best regards,
Tony.
--
"I bet the human brain is a kludge."
-- Marvin Minsky
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php