From: 琉璃井 <pharaoh...@163.com>
Sent: 2011-04-04 11:00
Subject: Re: [Vala] how can I get the number of unicode points in a string?
To: vala-list@gnome.org



On 2011/4/3 21:30, Adam Dingle wrote:
 
>  On 04/03/2011 06:08 AM, 琉璃井 wrote: 
>>  From: "琉璃井"<pharaoh...@163.com> 
>>  Date: 2011-04-03 18:15:12 
>>  To: "Luca Bruno"<lethalma...@gmail.com> 
>>  Subject: Re:Re: [Vala] how can I get the number of unicode points in 
>>  a string? 
>> 
>>  At 2011-04-03 16:06:32,"Luca Bruno"<lethalma...@gmail.com>  wrote: 
>> 
>>>  On Sun, Apr 03, 2011 at 03:59:23PM +0800, 琉璃井 wrote: 
>>>>  I see that since 0.11.0 vala string.length returns number of bytes 
>>>>  rather than that of unicode characters, and string[i] returns only 
>>>>  one byte. I wonder how to deal with east Asian character strings. 
>>>  There are other methods in string that deal with utf8. For example 
>>>  char_count() and next_char(). 
>>> 
>>  thank you. 
>>  I find char_count(), get_char() and next_char() in gtk+ document. 
>>  Looks like these methods are not covered in vala tutorial and document. 
>>  Is there something like string[i] for index access to utf8? I didn't 
>>  get it in docs. 
> 
>  To get the i-th character, you could do this: 
> 
>  str.get_char(str.index_of_nth_char(i)); 
> 
>  But the current string methods are designed for iteration by offsets, 
>  not characters. So you should *not* do this, which will be inefficient: 
> 
>  for (int i = 0; i < str.char_count(); ++i) // don't do this
>      str.get_char(str.index_of_nth_char(i));
> 
>  Instead, you want to iterate over the string using get_char() and 
>  next_char(). This is slightly inconvenient since these functions use 
>  pointers rather than integer offsets. In Vala trunk, Jürg has just 
>  committed a new method string.get_next_char() which will make it 
>  easier to iterate over strings: 
> 
>  // in class string 
>  public bool get_next_char (ref int index, out unichar c); 
> 
>  That isn't in any Vala release yet, though. (In the meantime, you 
>  might be able to copy and paste his implementation from glib-2.0.vapi 
>  in Vala trunk.) 
> 
>  adam 
I know get_char and next_char exist to reduce iteration overhead,
but there may be a more convenient way to access a UTF-8 string
efficiently. After all, fetching a single byte from a string by offset
is rarely useful, because people seldom need one byte out of a
multi-byte character.
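
To make the byte/character mismatch concrete, here is a minimal sketch. It assumes a Vala version that already ships string.get_next_char() (the method quoted above); the sample string is only an illustration.

```vala
void main () {
    // Three CJK characters, each encoded as three bytes in UTF-8.
    string s = "琉璃井";

    // length counts bytes; char_count () counts Unicode code points.
    print ("bytes: %d, characters: %d\n", s.length, s.char_count ());
    // prints "bytes: 9, characters: 3"

    // Iterate by code point. index is a byte offset that get_next_char ()
    // advances past each character it decodes.
    int index = 0;
    unichar c;
    while (s.get_next_char (ref index, out c)) {
        print ("%s ends at byte offset %d\n", c.to_string (), index);
    }
}
```

The loop visits each character once, so it avoids the repeated scanning that index_of_nth_char(i) would do inside a counting loop.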
Is it possible to design the string like this: 
class string
{
    private unichar* buffer;
    private int* offset_array;   // offset of each character, by index
    ... ...
    public unichar operator [](const int i)
    {
        int offset = offset_array[i];
        return buffer[offset];
    }
}
offset_array stores the offset of each UTF-8 character, indexed by
character position. It is initialized in the constructor or somewhere
similar. Then we can use string[index] with no iteration overhead.
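
The same idea can be tried today as a wrapper around an ordinary string, without changing the built-in type. The sketch below is only an illustration: IndexedString is a hypothetical name, not part of Vala or GLib, and it again assumes string.get_next_char() is available.

```vala
// Hypothetical wrapper class; not part of Vala or GLib.
class IndexedString {
    private string text;
    // offsets[i] is the byte offset at which the i-th character starts.
    private int[] offsets = {};

    public IndexedString (string text) {
        this.text = text;
        int index = 0;   // byte offset just past the last decoded character
        int start = 0;   // byte offset where the current character begins
        unichar c;
        // One linear pass records where every character starts.
        while (text.get_next_char (ref index, out c)) {
            offsets += start;
            start = index;
        }
    }

    // Number of characters (code points), not bytes.
    public int length {
        get { return offsets.length; }
    }

    // A method named get enables the s[i] syntax in Vala.
    public unichar get (int i) {
        return text.get_char (offsets[i]);
    }
}

void main () {
    var s = new IndexedString ("a琉璃井");
    print ("%d characters, s[2] = %s\n", s.length, s[2].to_string ());
    // prints "4 characters, s[2] = 璃"
}
```

The cost of the constant-time lookup is one int per character plus a linear pass when the object is constructed, which may be one reason not to build this into every string.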
 
_______________________________________________
vala-list mailing list
vala-list@gnome.org
http://mail.gnome.org/mailman/listinfo/vala-list
