Hi Tony,

On Jan 21, 2008 11:41 AM, Tony Mechelynck <[EMAIL PROTECTED]> wrote:
>
> Linxiao wrote:
> [...]
>
> Tt, tt, tt... If 'encoding' is other than UTF-8 (or GB18030), Vim cannot
> represent all Unicode codepoints in memory; therefore, if you try to edit a
> UTF-8 file you run the risk of losing part of the data. (If you set 'enc' to
> UTF-16, UCS-2 or UCS-4 aka UTF-32, with any endianness, what Vim will use is
> actually UTF-8.)

I'm familiar with the different shapes that malformed characters can take.
In fact, the original poster's problem was not caused by lost code points.
"²âÊÔ" was generated by the following steps:

1. At first, the original poster had "测试" represented in GBK encoding.
2. Then he reset the encoding to UTF-8, so the information about the
   filename's encoding was lost and Vim reinterpreted the filename as
   Latin-1.
3. Vim converted the Latin-1 string to UTF-8.
4. Vim saved the file to disk under the new name. Windows converts the
   UTF-8 string to UCS, of course.

Now the new filename is exactly "²âÊÔ".

Here is an illustration (my system charset is UTF-8):

[EMAIL PROTECTED] ~]$ echo 测试 | iconv -f utf-8 -t gbk | iconv -f latin1 -t utf-8
²âÊÔ

> To edit UTF-8 data you should have both 'encoding' (= memory representation of
> the data) and 'fileencoding' (= disk representation of the data) set to UTF-8.
>
> [...]
>
> Best regards,
> Tony.
> --
>         During a grouse hunt in North Carolina two intrepid sportsmen
> were blasting away at a clump of trees near a stone wall.  Suddenly a
> red-faced country squire popped his head over the wall and shouted,
> "Hey, you almost hit my wife."
>         "Did I?" cried the hunter, aghast.  "Terribly sorry.  Have a
> shot at mine, over there."
>

Regards,
L. F.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
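
P.S. For anyone who wants to reproduce the chain without iconv, here is a minimal Python sketch of the same steps. The function names garble and recover are my own, purely for illustration. It assumes the misreading really goes through Latin-1 (not Windows-1252), as in the iconv line above, and it makes no claim about what Vim does internally.

```python
def garble(name: str) -> str:
    """Steps 1-2: encode the name as GBK, then misread those bytes as Latin-1."""
    return name.encode("gbk").decode("latin-1")


def recover(garbled: str) -> str:
    """Undo the misreading: get the raw bytes back, then decode them as GBK."""
    return garbled.encode("latin-1").decode("gbk")


garbled = garble("测试")
# Step 3: the Latin-1 string is converted to UTF-8 before being written out.
utf8_on_disk = garbled.encode("utf-8")

print(garbled)           # the malformed filename
print(recover(garbled))  # the original filename
```

Because Latin-1 maps every byte value to exactly one code point, no data is lost along the way, so a file that ended up with such a name can in principle be renamed back by reversing the two conversions.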