Yongwei Wu wrote:
Hi Bram,

On 2/27/07, Bram Moolenaar <[EMAIL PROTECTED]> wrote:

Yongwei Wu wrote:

> On 2/27/07, Bram Moolenaar <[EMAIL PROTECTED]> wrote:
> >
> > If I understand it correctly, GB18030 is a multi-byte character set that
> > is mostly the same as cp936, but adds a number of 4-byte characters.
> > Vim does not support those 4-byte characters, thus setting 'encoding' to
> > gb18030 won't work.
> >
> > But conversion between gb18030 and utf-8 should work, thus when
> > 'encoding' is utf-8 it should be possible to use gb18030 in
> > 'fileencodings' and 'fileencoding'.  Perhaps you can check if that
> > works.
>
> No, with Patch 58 Vim regards gb18030 as an alias for cp936, and
> gb18030 does not work at all: this is the major problem.

Please be specific: What do you mean with "does not work at all"?

As I said, Vim regards gb18030 as cp936 after Patch 58, i.e. "e
++enc=gb18030" is now equivalent to "e ++enc=cp936". One can no longer
correctly open a GB18030-encoded file in Vim: the 4-byte encoded
characters come out wrong.
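
For example (a sketch of what I mean, assuming the alias is resolved the
same way as other encoding aliases such as "utf8"):

    :set encoding=gb18030
    :set encoding?              " now reports encoding=cp936
    :e ++enc=gb18030 file.txt   " converts as if the file were cp936, so
                                " the 4-byte sequences come out wrong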

Since Vim doesn't support gb18030 internally and only Unicode has all
the characters, I guess it would only work to edit these files when
'encoding' is "utf-8".

That depends on the purpose. If one uses GB18030 only because it is the
default locale value, setting 'encoding' to cp936 works most of the time.
In some cases, where only a subset of the GB18030 characters is used,
other encodings may work well too. However, to support GB18030 properly,
I believe UTF-8 is the right choice for 'encoding'. I have been using
encoding=utf-8 for a very long time now.
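
Concretely, the setup you suggest above would be something like this in
the vimrc (the exact 'fileencodings' list is only an example):

    set encoding=utf-8
    set fileencodings=ucs-bom,utf-8,gb18030,latin1

Before Patch 58 the gb18030 entry was, as far as I understand, simply
handed to iconv, so detection and conversion worked; with the alias in
place it is converted as cp936 instead, which is exactly the problem
described above.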

However, if gb18030 is used in the environment
that means that console output needs to be converted, thus
'termencoding' also needs to be set.

Not if encoding==gbk. According to the discussion, Edward wants to alias
GB18030 to GBK in the environment; in that case 'encoding' will be GBK by
default, all output characters will be within the GBK range, and no
conversion is needed. If 'encoding' is (manually) set to utf-8 while the
environment is GB18030, the hack Edward uses (which, I believe, is there
to make sure Vim gets a default 'encoding' of CP936 instead of Latin1, so
that Chinese can be processed correctly) has no effect at all, and one
would need to set tenc to gb18030 manually anyhow.
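
(In the vimrc that would be roughly:

    set termencoding=gb18030
    set encoding=utf-8

setting 'termencoding' explicitly first, so that it stops following
'encoding' before the latter is changed.)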

If 'encoding' is manually set to UTF-8, this doesn't change how the keyboard and, in Console Vim, the display process characters. Now it happens that the default for 'termencoding' is empty, meaning "use 'encoding'". This, in turn, means that if we set 'encoding' to UTF-8 manually (by keyboard or in the vimrc) we need to preserve the locale encoding in 'termencoding' to avoid garbling of keyboard input (and, in console Vim, of terminal output). Here is (IIUC) an example of how to set 'encoding' to UTF-8 "prudently" at the start of the vimrc:

if has("multi_lang")
        " preserve menu charset
        let &langmenu = v:lang
        if &langmenu !~ '\.'
                let &langmenu .= '.' . &enc
        endif
endif
if has("multi_byte")
        if &enc !~? '^u'             " if already Unicode, no change necessary
                if &tenc == ""
                        let &tenc = &enc " preserve keyboard & xterm charset
                endif
                set enc=utf-8
        endif
endif
runtime vimrc_example.vim            " set up menus, syntax highlighting, etc.

etc.
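
For what it's worth: under a zh_CN.GB18030 locale, where (if I
understand the Patch 58 change correctly) Vim now starts up with
'encoding' set to cp936, the snippet above should leave you with

    encoding=utf-8
    termencoding=cp936

i.e. terminal I/O is still converted through the cp936 tables, and the
4-byte GB18030 characters remain out of reach at the terminal, which is
consistent with what you describe.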


If I had to choose between not supporting GB18030 text properly and not
supporting the zh_CN.GB18030 locale properly, I would choose the latter.
Yet another "solution" is to move the gb18030 line down to the
UNIX-specific part (say, l.392 in mbyte.c). That would be better than the
current situation, but it is still a hack and could surprise people.

Best regards,

Yongwei


Best regards,
Tony.
--
If the American dream is for Americans only, it will remain our dream
and never be our destiny.
                -- René de Visme Williamson
