On Mon, Sep 26, 2005 at 02:18:28PM -0700, Loren Osborn wrote:
> Daniel Veillard wrote:
> > > UTF-8 makes certain assertions about how multi-byte characters are
> > > represented.  This code change doesn't check all of those
> > > assertions, but it does ensure that all the non-first bytes have
> > > their high bits set correctly.  This is likely to catch similar
> > > errors, at least regarding Latin characters.  If you are feeling
> > > ambitious, feel free to check for the assertion that code points are
> > > encoded in the fewest number of bytes possible.  This patch is
> > > untested, so I would prefer that a developer more familiar with the
> > > libxml2 library give it a more thorough once-over.
> >
> >   The problem is that you add this check in only one API. I am not sure
> > it makes sense to do this on one entry point and not all the others.
> > I am also not sure it makes sense to add the checking to all tree APIs;
> > this could be extremely costly at runtime.
> 
> Yes, I was expecting such a reaction, but I felt justified putting the
> check where I did because there was already a correctness check there. I
> simply refined it a bit.  Whether this type of correctness check should
> be enforced on all entry points is certainly an efficiency concern
> that should be considered by libxml2's architects; I simply wanted
> to submit a code sample to demonstrate how this could be done.

  Yes, I appreciate that. There is something half-baked in that function,
so it makes sense to fix it; on the other hand, it's asymmetric :-)
I'm still uncertain about how best to do this.
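
For concreteness, here is a minimal sketch in C of the kind of check under
discussion: every non-first byte must have the form 10xxxxxx, and each code
point must use the fewest bytes possible. This is not the submitted patch and
not libxml2 code; the function name sketch_check_utf8 is hypothetical, and the
sketch deliberately ignores other UTF-8 rules such as the ban on surrogate
code points.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a libxml2 API): returns 1 if buf[0..len) passes
 * the checks discussed above, 0 otherwise.
 */
static int
sketch_check_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;       /* continuation bytes expected after c */
        size_t k;
        unsigned long min;  /* smallest code point valid at this length */
        unsigned long cp;   /* decoded code point */

        if (c < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 110xxxxx: 2-byte sequence */
            extra = 1; min = 0x80;    cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {    /* 1110xxxx: 3-byte sequence */
            extra = 2; min = 0x800;   cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {    /* 11110xxx: 4-byte sequence */
            extra = 3; min = 0x10000; cp = c & 0x07;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < extra)
            return 0;   /* sequence truncated at end of buffer */

        for (k = 1; k <= extra; k++) {
            unsigned char t = buf[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;   /* non-first byte lacks the 10xxxxxx form */
            cp = (cp << 6) | (t & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF)
            return 0;   /* overlong encoding or out of Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

The loop touches every byte once, so the cost is linear in the input length,
which is exactly why running it on every tree API entry point could add up.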

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml