> Ok, yes if you go through the parser and you get it out in XMLCh format,
> then you could do a trivial truncation to get 8859-1. But, you'd have to
> scan the entire outgoing contents to figure out whether you could do it.
> If there is a lot of source, if you add up that extra overhead, and the
> fact that you now have to do a switch inside every operation inside
> DOMString, that's kind of robbing Peter (performance) to pay Paul
> (memory.)
Yep, there is no free lunch.
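
To make the cost concrete, here is a minimal sketch of the scan-then-truncate step discussed above. It assumes XMLCh is a 16-bit UTF-16 code unit (modelled here as char16_t); narrowToLatin1 is a hypothetical helper name, not an existing Xerces or ICU function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: narrows UTF-16 code units to ISO-8859-1 bytes.
// Returns false (and leaves `out` untouched) if any unit is above 0xFF.
bool narrowToLatin1(const char16_t* src, std::size_t len, std::string& out)
{
    // Pass 1: OR all units together. Any bit set above 0xFF means at
    // least one unit cannot be represented in a single Latin-1 byte.
    unsigned acc = 0;
    for (std::size_t i = 0; i < len; ++i)
        acc |= src[i];
    if (acc > 0xFF)
        return false;

    // Pass 2: the "trivial truncation" itself.
    out.assign(len, '\0');
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<char>(src[i] & 0xFF);
    return true;
}
```

The first pass is the extra overhead mentioned in the quoted text: every outgoing buffer is read once in full before it is known whether truncation is safe.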
> Also, if you did that on a per-callback basis, you might find that on
> some you have to store it in UTF-8 because it has code points above 255,
> and on others you don't. So you'd have some DOMStrings that are UTF-8
> and some in 8859-1. What happens when you have to compare them and such,
> as you'd probably do a LOT of in a complicated XSL transformation on a
> large file?
Actually, DStringPool's implementation already makes heavy (if not
excessive) use of DOMString.compare(). Mixed encodings do increase the
complexity of DOMString.compare(), since it has to support comparison of
mixed representations, but they don't have a significant performance
effect, since the same-encoding case gets executed the vast majority of
the time.
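
A rough sketch of what such a comparison could look like, with the same-encoding case handled first. StrView, Rep and compareMixed are hypothetical names for illustration only, not the actual DOMString internals, and the ordering is by code unit value. It relies on the fact that ISO-8859-1 byte values equal the Unicode code points U+0000 to U+00FF.

```cpp
#include <cstddef>
#include <cstring>

enum class Rep { Latin1, Utf16 };   // hypothetical internal encodings

struct StrView {                    // hypothetical view of a stored string
    Rep rep;
    const void* data;
    std::size_t len;                // length in code units, not bytes
};

// Reads one code unit, widening Latin-1 bytes to their Unicode value.
static unsigned unitAt(const StrView& s, std::size_t i)
{
    return s.rep == Rep::Latin1
        ? static_cast<const unsigned char*>(s.data)[i]
        : static_cast<const char16_t*>(s.data)[i];
}

// Returns -1, 0 or 1, ordering by code unit value and then by length.
int compareMixed(const StrView& a, const StrView& b)
{
    const std::size_t n = a.len < b.len ? a.len : b.len;
    if (a.rep == Rep::Latin1 && b.rep == Rep::Latin1) {
        // Common case: same encoding, one memcmp over the shared prefix.
        const int r = std::memcmp(a.data, b.data, n);
        if (r != 0) return r < 0 ? -1 : 1;
    } else {
        // Mixed (or UTF-16/UTF-16) case: compare unit by unit.
        for (std::size_t i = 0; i < n; ++i) {
            const unsigned x = unitAt(a, i), y = unitAt(b, i);
            if (x != y) return x < y ? -1 : 1;
        }
    }
    if (a.len == b.len) return 0;
    return a.len < b.len ? -1 : 1;
}
```

The only cost added to the common path is the single test on the two representation tags.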
> Wouldn't you be better off just unconditionally using UTF-8, at least in
> the scheme you are trying to implement, which I do not at all advocate
> for the DOM in general.
UTF-8 has its drawbacks: it is fatter than ISO-8859-1 for European-centric
files, and UTF-16 character offsets can't be directly converted to byte
offsets. Definitely, the more potential encodings used internally by
DOMString(), the more code paths need to be written and checked. Actually,
if I had to get rid of one within the palette of encodings used within my
DOMString(), it would probably be UTF-8, since the only time it adds value
is when you have long strings that are primarily US-ASCII or ISO-8859-1
with a few code points higher than 255.
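
Both drawbacks show up in a small sketch. utf8OffsetOf is a hypothetical helper, and it assumes well-formed UTF-16 where the offset does not split a surrogate pair. Mapping a UTF-16 offset to a UTF-8 byte offset needs a scan from the start of the string, and every character from 0x80 to 0xFF costs two bytes instead of the one it takes in ISO-8859-1.

```cpp
#include <cstddef>

// Hypothetical helper: number of UTF-8 bytes occupied by the first
// `units` UTF-16 code units of `s`, i.e. the byte offset of that position.
std::size_t utf8OffsetOf(const char16_t* s, std::size_t units)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < units; ++i) {
        const unsigned c = s[i];
        if (c < 0x80) {
            bytes += 1;                 // US-ASCII: one byte
        } else if (c < 0x800) {
            bytes += 2;                 // includes Latin-1 0x80..0xFF
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            bytes += 4;                 // surrogate pair: four bytes total
            ++i;                        // skip the low surrogate
        } else {
            bytes += 3;                 // rest of the BMP
        }
    }
    return bytes;
}
```

With a fixed-width internal encoding (ISO-8859-1, or UTF-16 without surrogates) the same offset is a single multiplication.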
Of course, that would reinforce my original interest in having ICU helper
functions that convert from wchar_t to ISO-8859-1 char* efficiently.