Re: Issue in match() function with multi-byte characters

ZyX Sun, 30 Mar 2014 07:12:32 -0700

> String indexing *must* not be fixed. As I said, there are a number of plugins 
> that need *exactly* bytes: any plugin implementing a hash function. 
> char2nr(s[i]) is guaranteed to return a value between 0x00 and 0xFF 
> (inclusive) (0x00 is returned only if s[i] is an empty string).
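
To make the quoted point concrete, here is a rough sketch in Python rather 
than VimL. The function name byte_hash and the djb2-style constants are only 
illustrative assumptions, not code from any particular plugin. It shows why a 
hash wants bytes: every value fed into it lies in 0x00..0xFF, whatever 
characters the string contains.

```python
def byte_hash(s: str) -> int:
    """djb2-style hash over the UTF-8 bytes of s (illustrative only)."""
    h = 5381
    for b in s.encode("utf-8"):  # each b is an int in 0x00..0xFF
        h = (h * 33 + b) & 0xFFFFFFFF  # keep the result within 32 bits
    return h


text = "naïve"
data = text.encode("utf-8")
# Five characters, but six bytes: the "ï" occupies two bytes in UTF-8.
print(len(text), len(data))
print(hex(byte_hash(text)))
```

If indexing returned whole characters instead, the loop would see values 
above 0xFF and the hash would no longer match one computed over the raw bytes.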


I see basically two ways to mitigate this situation: either a new data type 
like unicode() in python-2* (with old strings staying the same as str() in 
python-2*, plus a function to convert from str() to unicode()), or a set of 
mb*() functions.

The first variant has the advantage that, by using the same representation as 
python-3* str(), you can have O(1) indexing operations (if you keep UTF-8 
strings, indexing by character will be O(N)).
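
A minimal sketch of where the O(N) comes from, again in Python. The helper 
name utf8_char_at is made up for illustration and it assumes well-formed 
UTF-8 input. To find character number i in a UTF-8 buffer you have to walk 
over every preceding character, because each one may be 1 to 4 bytes wide.

```python
def utf8_char_at(buf: bytes, index: int) -> str:
    """Return character number `index` of UTF-8 `buf` by scanning from the start."""
    if index < 0:
        raise IndexError(index)
    pos = 0   # current byte offset
    seen = 0  # characters skipped so far
    while pos < len(buf):
        lead = buf[pos]
        # The lead byte tells how many bytes the character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % pos)
        if seen == index:
            return buf[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError(index)


sample = "aé€😀z"          # characters of 1, 2, 3, 4 and 1 bytes
encoded = sample.encode("utf-8")
print(utf8_char_at(encoded, 3) == sample[3])
```

With a fixed-width representation the same lookup is a single multiplication 
and a memory read, which is what sample[3] does on a python-3* str().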

The second variant has the advantage of being far easier to implement: you 
just add a few function definitions, without having to add a big bunch of 
unicode()->str() conversions in a number of places, support unicode() objects 
in the regex engine (python-3* str() objects use ASCII, latin1, UCS-2 or UCS-4 
storage depending on the highest code point in the string, but vim only 
accepts ASCII-compatible encodings) and so on. To make the first variant 
perform better you need to modify *each* function that uses strings to work 
with unicode(), or waste lots of time on unicode()->str()[->unicode()] 
conversions.
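
To give an idea of what the second variant could look like, here is a sketch 
of two such helpers in Python. The names mb_strlen and mb_byteidx are 
hypothetical, not existing vim functions, and both assume the buffer is valid 
UTF-8. They work directly on the byte string, so nothing else has to change 
and no conversions are needed.

```python
def mb_strlen(buf: bytes) -> int:
    """Number of characters: count bytes that are not UTF-8 continuation bytes."""
    return sum(1 for b in buf if (b & 0xC0) != 0x80)


def mb_byteidx(buf: bytes, char_index: int) -> int:
    """Byte offset where character number `char_index` starts.

    Returns len(buf) when char_index equals the number of characters,
    and -1 when it is out of range.
    """
    if char_index < 0:
        return -1
    seen = 0
    for offset, b in enumerate(buf):
        if (b & 0xC0) != 0x80:  # start of a new character
            if seen == char_index:
                return offset
            seen += 1
    return len(buf) if seen == char_index else -1


raw = "aé€".encode("utf-8")  # 1 + 2 + 3 bytes
print(mb_strlen(raw), [mb_byteidx(raw, i) for i in range(5)])
```

Byte indexing like raw[i] keeps its current meaning, so hash-style plugins 
are unaffected, while character-aware code calls the mb*() helpers explicitly.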

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

