On Tue, Apr 03, 2012 at 10:06:18AM +0200, Ladislav Slezak wrote:
> On 3.4.2012 09:37, Johannes Meixner wrote:
>
> >Could a YCP expert show the correct way (example code)
> >to remove a UTF-8 substring from a UTF-8 string?
> >I.e. how to remove "Bar" from "FooBarBaz" in a UTF-8-safe way?
> >
> >More generally:
> >Is there documentation on how to work with UTF-8 strings in YCP?
>
> Um, it seems that the documentation does not mention the [] behavior for
> strings (neither
> http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/bracket.html nor
> http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/id_ycp_data_string.html).
>
> I fixed that particular bug by using regexpsub() instead of iterating over
> the string and using the [] operator. So I guess the regexp*() functions
> are UTF-8 safe.
>
> I'm not sure what the correct solution is. Maybe the correct way is to fix
> the [] operator after all... That's why I have opened this discussion,
> because maybe someone is relying on the current "buggy" behavior... (I don't
> expect that, but I'd like to avoid regressions if possible.)
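Unfortunately this may well be the case, since most other string
functions are not UTF-8 aware either. E.g. splitting a string at a
space by using search and substring works correctly precisely because
search is byte-oriented too, so the offset it returns matches what
substring expects. Passing the same offset to the character-oriented
lsubstring gives the wrong result.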
Program:
string s = "schöner Würfel";
integer i = search(s, " ");
y2milestone("substring '%1' '%2'", substring(s, 0, i), substring(s, i + 1));
y2milestone("lsubstring '%1' '%2'", lsubstring(s, 0, i), lsubstring(s, i + 1));
Output:
test1.ycp:6 substring 'schöner' 'Würfel'
test1.ycp:7 lsubstring 'schöner ' 'ürfel'
So if substring is fixed to count characters, all the other string
functions (search in particular) must be fixed as well.
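As a rough illustration of why the offsets have to agree, here is a small
Python model of the two behaviours. This is not YCP: the helper names
byte_search, byte_substring and char_substring are made up for this sketch
and only mimic what the log output above shows, so treat it as an
assumption about the semantics rather than a description of the YCP
implementation.

```python
# Python stand-in for the YCP behaviour in the log output above.
# All helper names are hypothetical; they are not YCP builtins.

def byte_search(s, needle):
    """Byte offset of needle in the UTF-8 encoding of s, or -1 (models search)."""
    return s.encode("utf-8").find(needle.encode("utf-8"))

def byte_substring(s, start, length=None):
    """Slice the UTF-8 bytes of s (models the byte-oriented substring)."""
    data = s.encode("utf-8")
    end = len(data) if length is None else start + length
    return data[start:end].decode("utf-8", errors="replace")

def char_substring(s, start, length=None):
    """Slice s by characters (models the character-oriented lsubstring)."""
    end = len(s) if length is None else start + length
    return s[start:end]

# "schöner Würfel", written with escapes so the code points are unambiguous.
s = "sch\u00f6ner W\u00fcrfel"

i = byte_search(s, " ")        # 8: the 'ö' occupies two bytes
j = s.find(" ")                # 7: the same position counted in characters

# Byte offset with byte-oriented slicing: consistent, so the split is right.
print(byte_substring(s, 0, i), "|", byte_substring(s, i + 1))

# Byte offset with character-oriented slicing: off by one after the 'ö'.
print(char_substring(s, 0, i), "|", char_substring(s, i + 1))

# Character offset with character-oriented slicing: consistent again.
print(char_substring(s, 0, j), "|", char_substring(s, j + 1))
```

The first and third pairs split cleanly at the space; only the mixed pair
is wrong, which is the same mismatch as in the lsubstring line of the log.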
Regards,
Arvin
--
Arvin Schnell, <[email protected]>
Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 16746 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany