I'm not sure I understand everything, but the given link doesn't cover the case where an entity should be translated to two Unicode characters instead of only one, as is the case with the current hash table system.
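
To make the two-character case concrete, here is a minimal Python sketch (not WebKit code) of a prefix tree whose entries hold a sequence of code points rather than a single one. The names EntityTrie, build_demo_trie and expand are hypothetical, and the few sample entries are only illustrative; their code points should be checked against the W3C entity list before relying on them.

```python
class EntityTrie:
    """Prefix tree mapping entity names to one OR more code points."""

    def __init__(self):
        self.children = {}       # next character -> child node
        self.code_points = None  # tuple of ints if a name ends at this node

    def insert(self, name, code_points):
        node = self
        for ch in name:
            node = node.children.setdefault(ch, EntityTrie())
        # Storing a tuple lets one entity expand to several characters.
        node.code_points = tuple(code_points)

    def longest_match(self, text, start=0):
        """Return (consumed_length, code_points) for the longest entity
        name found at text[start:], or None if no name matches."""
        node = self
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.code_points is not None:
                best = (i + 1 - start, node.code_points)
        return best


def build_demo_trie():
    # Illustrative sample only (assumed values, not a complete table).
    trie = EntityTrie()
    trie.insert("lt", [0x003C])                        # legacy form, no ';'
    trie.insert("lt;", [0x003C])
    trie.insert("ne;", [0x2260])
    trie.insert("nvlt;", [0x003C, 0x20D2])             # two code points
    trie.insert("NotEqualTilde;", [0x2242, 0x0338])    # two code points
    return trie


def expand(code_points):
    """Turn the stored code points into the replacement string."""
    return "".join(chr(cp) for cp in code_points)
```

With a table like this, the lookup result for a combined entity is simply a longer tuple, so the caller does not need a special case: expand() returns a two-character string for "NotEqualTilde;" and a one-character string for "ne;".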
Such two-character entities don't exist in the HTML 5 entity list, but some are present in the list used by MathML 3 (link in my previous message).

François Sausset

On 10 Jul 2010, at 21:17, Adam Barth wrote:

> On Sat, Jul 10, 2010 at 11:10 AM, Sausset François wrote:
>> I just saw that when looking at the code by myself.
>> What do you exactly mean by a prefix tree?
>
> http://en.wikipedia.org/wiki/Trie
>
>> I also noticed that the entity parser does not take into account combined
>> Unicode characters (see §A.3 in: http://www.w3.org/TR/xml-entity-names/).
>> In addition, even without entities, combined characters are displayed as
>> separate ones.
>
> My understanding is that is the correct behavior w.r.t. the HTML5
> specification of entity parsing. Our entity processing aims for
> perfect compliance with this algorithm:
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references
>
> My belief is the only things we're missing for perfect compliance is
> the expanded list of entity names:
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references
>
> and the prefix tree.
>
> Adam

