Elliotte Rusty Harold wrote:
> A W3C XML Schema Language validator needs a character based API to 
> correctly implement the minLength and maxLength facets on xsd:string 

As far as I understand, xsd:string is a list of "Character"-s, and a
"Character" is an integer which can hold any valid Unicode code point.

In other words, xsd:string is necessarily in UTF-32 (or something close to
it): it cannot be in UTF-8 or UTF-16.

The numbers given by length, minLength and maxLength are the actual,
minimum and maximum number of *list elements* contained in the list; that
is, in the case of xsd:string, the *size* of the string in *encoding units*.

The fact that, in UTF-32, the *size* of the string in encoding units
corresponds to the number of "characters" is coincidental.

In any case, the useful information is always the *size* of the string in
encoding units (octets for UTF-8, 16-bit units for UTF-16, etc.), not the
number of "characters" it contains.
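
To make the distinction concrete, here is a minimal Python sketch that
measures one string in the encoding units of each encoding form. The helper
name `sizes` and the sample string are made up for illustration only; they
do not come from any schema validator or from the XML Schema specification.

```python
def sizes(s: str) -> dict:
    """Return the size of s in encoding units for three encoding forms."""
    return {
        "utf-8": len(s.encode("utf-8")),            # octets
        "utf-16": len(s.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(s.encode("utf-32-le")) // 4,  # 32-bit units
    }

# Hypothetical sample: 'a', U+00E9, and U+1D11E (outside the BMP).
sample = "a\u00e9\U0001D11E"

print(sizes(sample))  # {'utf-8': 7, 'utf-16': 4, 'utf-32': 3}
```

Only in the UTF-32 row does the size in encoding units coincide with the
number of code points (3 for this sample).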

_ Marco




