On Mon, Sep 26, 2005 at 02:18:28PM -0700, Loren Osborn wrote:
> Daniel Veillard wrote:
> > > UTF-8 makes certain assertions about how multi-byte characters are
> > > represented. This code change doesn't check all of those
> > > assumptions, but it does ensure that all the non-first bytes have
> > > their high bits set correctly. This is likely to catch similar
> > > errors, at least regarding Latin characters. If you are feeling
> > > ambitious, feel free to check for the assertion that code-points
> > > are encoded in the fewest number of bytes possible. This patch is
> > > untested, but I prefer that a developer more familiar with the
> > > libxml2 library give it a more thorough once-over.
> >
> > The problem is that you add this check in one API. I am not sure
> > it makes sense to do this on one entry point and not all the others.
> > I am not sure it makes sense to add the checking to all tree APIs;
> > this could be extremely costly at runtime.
>
> Yes, I was expecting such a reaction, but I felt justified putting the
> check where I did because there was already a correctness check there.
> I simply refined it a bit. Whether this type of correctness check
> should be enforced on all entry points is certainly an efficiency
> concern that should be considered by libxml2's architects, but I simply
> wanted to submit a code sample to demonstrate how this could be done.

  Yes, I appreciate that. There is something half-baked in that function,
so it makes sense to fix it; on the other hand, fixing only that one entry
point is asymmetric :-)
  I'm still uncertain about how best to do this.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
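
For readers following the thread, the two checks discussed above (every
non-first byte must look like 10xxxxxx, and each code point must use the
fewest bytes possible) can be sketched roughly as follows. This is only an
illustration under stated assumptions: utf8_is_well_formed is a hypothetical
helper name, it is not the patch from this thread and not a libxml2 API, and
it assumes a NUL-terminated input buffer.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of libxml2 and NOT the patch under
 * discussion. Returns 1 if the NUL-terminated buffer is well-formed
 * UTF-8, 0 otherwise. */
static int utf8_is_well_formed(const unsigned char *s)
{
    while (*s != 0) {
        unsigned int cp;   /* code point decoded so far */
        unsigned int min;  /* smallest code point allowed for this length */
        int trailing;      /* number of continuation bytes expected */

        if (*s < 0x80) {                 /* plain ASCII, one byte */
            s++;
            continue;
        } else if ((*s & 0xE0) == 0xC0) {  /* 110xxxxx: 2-byte sequence */
            cp = *s & 0x1F; trailing = 1; min = 0x80;
        } else if ((*s & 0xF0) == 0xE0) {  /* 1110xxxx: 3-byte sequence */
            cp = *s & 0x0F; trailing = 2; min = 0x800;
        } else if ((*s & 0xF8) == 0xF0) {  /* 11110xxx: 4-byte sequence */
            cp = *s & 0x07; trailing = 3; min = 0x10000;
        } else {
            return 0;  /* stray continuation byte or invalid lead byte */
        }
        s++;

        /* Every non-first byte must have its high bits set to 10.
         * A NUL terminator fails this test, so truncated sequences
         * are rejected without reading past the end of the string. */
        while (trailing-- > 0) {
            if ((*s & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*s & 0x3F);
            s++;
        }

        if (cp < min)
            return 0;  /* overlong: not the shortest possible encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;  /* outside Unicode range, or a surrogate */
    }
    return 1;
}
```

A Latin-1 byte such as 0xE9 followed by an ASCII letter fails the
continuation-byte test, which is the kind of error the quoted patch was
aiming to catch. The loop touches every byte of the input, which is why
running such a check on every tree API entry point could be costly.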
