Re: Changing encoding of an already loaded buffer

Gabriele F Wed, 09 Dec 2020 11:35:50 -0800

On 08/12/2020 17.47, Bram Moolenaar wrote:

> > This works:
> > :set fencs=utf8
> > :%!cat
> > although "fenc" remains "latin1".
>
> Yeah, for an existing buffer and filtering the first entry in 'fencs' is
> used to read the filter output, but 'fenc' isn't set.  That's a bit
> strange, but I'm not sure what would break if we change this.  It might
> actually be good to fix this, since if you write that file it might get
> messed up.

I performed a couple of tests, writing the result to a file after doing the above (using a correct UTF-8 file as the source):

- if you leave fenc set to latin1, the new file will be in latin1 (with all the characters correctly encoded)
- if you set fenc to utf8 *after* the %!cat (but of course before writing the file), the new file will be in UTF-8 with all the characters correctly encoded
- if you set fenc to utf8 *before* the %!cat (and of course before writing the file), the new file will be... a mess: by all appearances Vim thinks that the individual bytes of the UTF-8 file are individual latin1 characters, and it then converts them to UTF-8, so you'll get a UTF-8 encoded file with the wrong characters. E.g. a "C3 B2" sequence in the original file, which stands for a UTF-8 encoded "ò" (Unicode code point F2), will become a "C3 83 C2 B2" sequence in the written file: "C3" is an "Ã" in latin1 (and yes, in Unicode too), and "Ã" is encoded as "C3 83" in UTF-8; "B2" is a "²" in latin1 (and Unicode), and "²" is encoded as "C2 B2" in UTF-8. (In case someone noticed it, don't let yourself get confused by the fact that C3 and B2 occur both in the source and in the translated sequence; that's largely just an unfortunate coincidence of my example.)
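The byte arithmetic of the three tests can be double-checked outside Vim. The sketch below is only a model of the observed behaviour, not Vim's code: it assumes each test boils down to "decode the bytes with one encoding, encode the text with another", and `reencode` is a hypothetical helper name.

```python
# Model of the three tests as decode/encode round trips.
# Assumption: this reproduces the observed results only; it is not Vim's code.

# The source file contains "ò" (U+00F2) encoded as UTF-8.
original = "\u00f2".encode("utf-8")  # b'\xc3\xb2'


def reencode(raw: bytes, read_as: str, write_as: str) -> bytes:
    """Hypothetical helper: interpret raw as read_as, then encode as write_as."""
    return raw.decode(read_as).encode(write_as)


# Test 1: text read as UTF-8, written with fenc=latin1 -> correct latin1.
test1 = reencode(original, "utf-8", "latin-1")
# Test 2: text read as UTF-8, written with fenc=utf8 -> correct UTF-8.
test2 = reencode(original, "utf-8", "utf-8")
# Test 3: bytes misread as latin1, then converted to UTF-8 on write.
test3 = reencode(original, "latin-1", "utf-8")

for name, data in [("test 1", test1), ("test 2", test2), ("test 3", test3)]:
    print(name, data.hex(" "))
# test 1 f2
# test 2 c3 b2
# test 3 c3 83 c2 b2
```

The third line of output is exactly the "C3 83 C2 B2" sequence found in the written file, i.e. "Ã²" instead of "ò".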

Given that Unicode is identical to latin1 in the first 256 characters, to better confirm what happened I also tried using another charset (cp850) instead of latin1 in the above tests (fencs=cp850 in my vimrc, and setting fenc=cp850 in the second and third tests), still using a correct UTF-8 file as the source. The results are analogous: a correct cp850 file in the first test, a correct UTF-8 one in the second, and in the third a UTF-8 one with the original file's bytes interpreted as cp850 and then converted to UTF-8. The original "ò", "C3 B2", becomes an "E2 94 9C E2 96 93" sequence, given that "C3" is a "├" symbol in cp850 (Unicode code point 251C -> "E2 94 9C" in UTF-8), and "B2" is a "▓" (Unicode code point 2593 -> "E2 96 93" in UTF-8).
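The cp850 variant of the third test can be modelled the same way. This is a sketch under the assumption that Python's `cp850` codec uses the same mapping as the code page Vim converted from; it walks each original byte to its cp850 character and that character's UTF-8 bytes.

```python
# Third test with cp850 instead of latin1: UTF-8 bytes misread as cp850,
# then converted to UTF-8 on write.
# Assumption: Python's "cp850" codec matches the mapping Vim used.

original = "\u00f2".encode("utf-8")  # "ò" -> b'\xc3\xb2'
misread = original.decode("cp850")   # each byte becomes one cp850 character
mangled = misread.encode("utf-8")    # what ends up in the written file

for byte, ch in zip(original, misread):
    # cp850 byte -> Unicode code point -> UTF-8 bytes of that code point
    print(f"{byte:02X} -> U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}")
print(mangled.hex(" "))
# C3 -> U+251C -> e2 94 9c
# B2 -> U+2593 -> e2 96 93
# e2 94 9c e2 96 93
```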


Yes, I... ahem, had a lot of fun this afternoon :D


Cheers

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to vim_use+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/d90f2dd2-ef6a-fb16-0118-4f30dc238aba%40tiscali.it.
