Re: Unicode input bug fix

Ben Schmidt Mon, 21 Feb 2011 05:27:29 -0800

On 22/02/11 12:09 AM, Bram Moolenaar wrote:

Phil Carter wrote:

Using ctrl+v U in insert mode, you can enter Unicode characters by
code point. UTF-8 can only encode up to U+7FFFFFFF. Entering any code
point up to that value works fine, but if you type "ctrl+v U
81234567" for example, you get "<t_" followed by some other bytes
instead of the requested code point.
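
For reference, the U+7FFFFFFF ceiling comes from the original 31-bit UTF-8 scheme (RFC 2279), which uses up to six bytes per character. A minimal sketch of the length rule, assuming that scheme; the function name utf8_len_31bit is made up for illustration and is not taken from Vim's source:

```c
/* Hypothetical helper, not Vim code: number of bytes needed to encode
 * code point `cp` under the original 31-bit UTF-8 scheme (RFC 2279).
 * Returns 0 when the value cannot be encoded at all. */
int utf8_len_31bit(unsigned long cp)
{
    if (cp < 0x80UL)         return 1;  /* 7 payload bits  */
    if (cp < 0x800UL)        return 2;  /* 11 payload bits */
    if (cp < 0x10000UL)      return 3;  /* 16 payload bits */
    if (cp < 0x200000UL)     return 4;  /* 21 payload bits */
    if (cp < 0x4000000UL)    return 5;  /* 26 payload bits */
    if (cp <= 0x7FFFFFFFUL)  return 6;  /* 31 payload bits */
    return 0;                           /* above the 31-bit ceiling */
}
```

Under this rule 0x81234567 has no encoding, which is consistent with the garbage seen when it is typed.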

I don't know where to submit bug fixes but since this one is only two
lines long, I'll post it here. In edit.c you can replace this line:

if ((unicode == 'u' && i >= 4) || (unicode == 'U' && i >= 8))

with this:

if ((unicode == 'u' && i >= 4) ||
    (unicode == 'U' && (i == 7 && cc > 0x7FFFFFF || i >= 8)))

This way vim stops reading input after 7 hex digits if an eighth digit
would make the code point higher than what UTF-8 can encode.
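
To illustrate the effect of the new condition, here is a stand-alone sketch of a digit-accumulation loop for the 'U' case. It assumes the surrounding code adds each hex digit to cc and increments i before the test runs; the function names and the string-based input are invented for the example and do not mirror the actual code in edit.c.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if `c` is not one. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical model of CTRL-V U input: consumes hex digits from `digits`
 * using the patched stop condition, stores the accumulated value in *out,
 * and returns how many digits were consumed. */
size_t consume_U_digits(const char *digits, unsigned long *out)
{
    unsigned long cc = 0;
    size_t i = 0;

    while (digits[i] != '\0') {
        int v = hex_digit_value(digits[i]);
        if (v < 0)
            break;                      /* not a hex digit: stop reading */
        cc = cc * 16 + (unsigned long)v;
        ++i;
        /* Patched test: stop after 7 digits when an eighth digit would
         * push the value past 0x7FFFFFFF, otherwise after 8 digits. */
        if ((i == 7 && cc > 0x7FFFFFF) || i >= 8)
            break;
    }
    *out = cc;
    return i;
}
```

With the input "81234567" this model stops after seven digits with the value 0x8123456, while "7FFFFFFF" is still read in full.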


I have been wondering if we should restrict the Unicode characters to
10FFFF.  This is the official limit that was set a couple of years ago.
There won't be valid characters above this limit, so why allow inserting
them?
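
One way such a restriction could be expressed, as a rough sketch only: the same "would another digit overflow?" idea as in the patch above, parameterised by the limit. The names UNICODE_MAX and next_digit_overflows are hypothetical and not taken from Vim.

```c
/* Assumed limit for the sketch: the highest Unicode code point. */
#define UNICODE_MAX 0x10FFFFUL

/* Hypothetical helper: returns nonzero when appending any further hex
 * digit to `cc` is guaranteed to give a value above `limit`.
 * cc * 16 > limit is equivalent to cc > limit / 16 in integer arithmetic.
 * For limits whose last hex digit is F (0x10FFFF, 0x7FFFFFFF) this is
 * also exact: when it returns 0, every next digit stays within `limit`. */
int next_digit_overflows(unsigned long cc, unsigned long limit)
{
    return cc > limit / 16;
}

/* Hypothetical helper: range check for an already accumulated value. */
int within_unicode_range(unsigned long cp)
{
    return cp <= UNICODE_MAX;
}
```

With limit 0x7FFFFFFF the first function reduces to the cc > 0x7FFFFFF test from the patch; with UNICODE_MAX the threshold becomes cc > 0x10FFF.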


Just have to make sure that if this restriction is made, all invalid
byte sequences are correctly displayed as their individual bytes, and
can be entered as such.

Ben.



--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
