Re: [yast-devel] Re: YCP substring() Was: YCP String operator [] and UTF-8

Josef Reidinger Tue, 03 Apr 2012 03:08:28 -0700

On Tue, 3 Apr 2012 11:58:16 +0200
Arvin Schnell <[email protected]> wrote:


> On Tue, Apr 03, 2012 at 11:33:09AM +0200, Klaus Kaempf wrote:
> > * Ladislav Slezak <[email protected]> [Apr 03. 2012 11:10]:
> 
> > > I used substring() to get one character. So the problematic call is 
> > > actually:
> > > 
> > >   substring("áa", 1, 1);
> > > 
> > > which returns "\0xF1" instead of "a" as I expected.
> > > 
> > > The documentation does not tell whether the substring() argument units
> > > are in bytes or characters.
> > > http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/substring-rest.html
> > > 
> > > So any opinions on changing this call? Is the UTF-8 assumption also valid 
> > > here?
> > 
> > Yes. sub_string_ is operating on strings and strings are defined to be
> > UTF-8 encoded.
> 
> Generally I agree that strings in YCP are UTF-8 encoded and
> functions should respect this.
> 
> But simply fixing the functions might require converting from
> UTF-8 to wstring and back in every function and that sounds very
> costly. E.g. the size function in YCP converts the string to
> wstring. When I noticed that and saw how many times
> size(string) == 0 is used, I added an isempty function in YCP.
> 
> Could be that using wstring internally in YCPString is the better
> solution.
> 

I absolutely agree. If every string in YCP is defined to be UTF-8, then not
using wstring internally doesn't make much sense to me. Of course we need to
check what depends on it, but I think it should be mainly the various
bindings. Other parts of the code should not care what the internal
representation is.
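
To illustrate the byte versus character difference behind the substring()
report, here is a minimal C++ sketch. It is not the actual YCP code: the
helper names (is_continuation, utf8_advance, utf8_substring, utf8_length) are
made up for illustration, and the input is assumed to be valid UTF-8. It walks
the bytes directly instead of converting to wstring, which is only one
possible approach.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if b is a UTF-8 continuation byte (10xxxxxx).
static bool is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: starting at byte offset pos, skip up to count code
// points and return the resulting byte offset (at most s.size()).
static std::size_t utf8_advance(const std::string& s, std::size_t pos,
                                std::size_t count)
{
    while (pos < s.size() && count > 0) {
        ++pos;  // step over the lead byte
        while (pos < s.size() && is_continuation(s[pos]))
            ++pos;  // step over its continuation bytes
        --count;
    }
    return pos;
}

// Substring where start and len are counted in code points, not bytes.
// Assumes s is valid UTF-8.
std::string utf8_substring(const std::string& s, std::size_t start,
                           std::size_t len)
{
    std::size_t begin = utf8_advance(s, 0, start);
    std::size_t end = utf8_advance(s, begin, len);
    return s.substr(begin, end - begin);
}

// Number of code points: every byte that is not a continuation byte
// starts a new code point.
std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char b : s)
        if (!is_continuation(b))
            ++n;
    return n;
}
```

With the string "áa" written as the bytes "\xC3\xA1" "a", the byte-based
std::string::substr(1, 1) returns the lone continuation byte 0xA1, while
utf8_substring(s, 1, 1) returns "a". Note that s.empty() needs no decoding at
all, which is the same reason isempty is cheaper than size(string) == 0.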

Josef

> Regards,
>   Arvin
> 

