Re: Unicode input bug fix

Brian Gernhardt Mon, 21 Feb 2011 02:15:45 -0800

On Feb 21, 2011, at 1:38 AM, Phil Carter wrote:

> Using ctrl+v U in insert mode, you can enter Unicode characters by code 
> point. UTF-8 can only encode up to U+7FFFFFFF. Entering any code point up to 
> that value works fine, but if you type "ctrl+v U 81234567" for example, 
> you get "<t_" followed by some other bytes instead of the requested code 
> point.


Note that while the encoding used for UTF-8 can store up to 0x7FFFFFFF, Unicode 
only goes up to U+10FFFF.  In fact, RFC 3629 explicitly limits UTF-8 to the 
range 0000-10FFFF.  Earlier specifications (such as RFC 2279) do not include 
this range limit, but since Unicode itself ends at 10FFFF, it is unclear what 
the longer byte sequences should mean.
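
To make the two limits concrete, here is a minimal Python sketch of the older 
1-6 byte encoding scheme next to the RFC 3629 range check.  This is not Vim's 
code; the names utf8_encode_legacy and in_rfc3629_range are made up for 
illustration, and the range check deliberately ignores the surrogate block 
(D800-DFFF), which RFC 3629 also excludes.

```python
RFC3629_MAX = 0x10FFFF   # highest Unicode code point, the RFC 3629 limit
LEGACY_MAX = 0x7FFFFFFF  # highest value the older 6-byte scheme can store


def utf8_encode_legacy(cp: int) -> bytes:
    """Encode cp using the older 1-6 byte UTF-8 scheme (illustrative only)."""
    if not 0 <= cp <= LEGACY_MAX:
        # Values such as 0x81234567 need more than 31 bits and cannot be
        # represented by any UTF-8 byte sequence.
        raise ValueError(f"{cp:#x} does not fit in a UTF-8 byte sequence")
    if cp < 0x80:
        return bytes([cp])
    # Each entry: (exclusive upper limit, sequence length, lead-byte marker)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    out = bytearray(length)
    # Fill continuation bytes from the end, six payload bits at a time.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    out[0] = lead | cp  # the remaining high bits go into the lead byte
    return bytes(out)


def in_rfc3629_range(cp: int) -> bool:
    """True if cp lies within 0000-10FFFF (surrogates are not checked)."""
    return 0 <= cp <= RFC3629_MAX
```

With this sketch, utf8_encode_legacy(0x10FFFF) gives the same four bytes as 
Python's own encoder, utf8_encode_legacy(0x7FFFFFFF) gives a six-byte sequence 
that RFC 3629 no longer permits, and 0x81234567 is rejected outright because it 
needs more than 31 bits.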

That said, I'm not sure Vim is really in the "not letting the user shoot 
themselves in the foot" business.  Phil's fix does prevent completely incorrect 
output and (IMNSHO) should be applied even if we don't want to limit output to 
actual UTF-8/Unicode.  :-)

Feeling slightly pedantic,
~~ Brian Gernhardt

-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
