[ http://issues.apache.org/jira/browse/XALANJ-2271?page=comments#action_12367692 ]
Brian Minchau commented on XALANJ-2271:
---------------------------------------

This issue was more difficult than I thought. The character expansion code in the serializer has been getting better over time but is still complicated.

The changes in the patch do the following:

1. Changes to CharInfo.

   Previously the CharInfo objects for HTML, TEXT and XML were all cached in a static Hashtable. That seems good for performance, but the downside was that CharInfo's getOutputStringForChar(char) method, which returns the entity for a given char (e.g. maps '<' to "&lt;"), was synchronized. When generating HTML, which has lots of entities coming from the HTMLEntities.properties file, this can be a bottleneck on a busy web server.

   The changes make each CharInfo object returned to the caller a mutable copy, so synchronization is no longer required. Some Hashtables were changed to HashMap for performance.

   Changes to isSpecialAttrChar() and isSpecialTextChar(). Previously isSpecialAttrChar() said that a lot of other characters were special, but now it is related only to entities. Basically these routines return true if there is an entity for the character. However, there is some internal tweaking to:
     - output a literal tab as "&#9;" in XML attribute values
     - output a quote in an XML attribute as "&quot;"
     - leave a literal quote as-is in HTML or XML text nodes
     - output a less-than sign as-is in HTML attribute values

2. Changes to the ToStream method characters(final char chars[], final int start, final int length). It is reworked in an efficient way so that characters in the C0 and C1 ranges are written out as character references (except for tab, newline and carriage return). The line separator 0x2028 is also written out as a character reference. This processing is done regardless of the XML version (1.0 or 1.1); it is good for XML 1.0 output too, in case that output is included as a general parsed entity in an XML 1.1 file.

3. Changes to the ToStream method writeAttrString().

4. Minor changes to ToXMLStream and ToHTMLStream to make the CharInfo object used to check for entities non-static and owned by that serializer, which drops the need for synchronization when looking up entities.

> XML 1.1 Serialization, char in attribute value not escaped
> -----------------------------------------------------------
>
>          Key: XALANJ-2271
>          URL: http://issues.apache.org/jira/browse/XALANJ-2271
>      Project: XalanJ2
>         Type: Bug
>     Reporter: Brian Minchau
>  Attachments: character.expansion.patch1.txt
>
> This issue was found by Henry Zongaro.
> If you try the following stylesheet, you'll see that the character x8C, which
> is not permitted in literal form in XML 1.1, is escaped when it appears in an
> element's character content, but it's not escaped when it is part of an
> attribute value.
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="1.0">
>   <xsl:output method="xml" version="1.1"/>
>   <xsl:template match="/">
>     <out att="&#x8C;">&#x8C;</out>
>   </xsl:template>
> </xsl:stylesheet>
> When the serialized XML produced by this stylesheet is parsed by Xerces
> (depending perhaps on the version of Xerces) it goes into an infinite loop
> when it attempts to parse an attribute that contains an invalid character.
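
For illustration only, the escaping rule described in item 2 of the comment (C0 and C1 characters, except tab, newline and carriage return, plus the line separator 0x2028, written out as character references) could be sketched roughly as below. This is not the code from the patch: the class and method names (ControlCharEscapeSketch, needsCharRef, escape) are hypothetical, and the decimal "&#NNN;" form is an assumption. The comment does not say how 0x7F is treated, so the sketch leaves it untouched.

```java
// Hypothetical sketch of the rule described in the comment; not the Xalan code.
class ControlCharEscapeSketch {

    /** Returns true if, per the comment, ch should be written as a character reference. */
    static boolean needsCharRef(char ch) {
        if (ch == '\t' || ch == '\n' || ch == '\r') {
            return false; // tab, newline and carriage return are explicitly excluded
        }
        boolean inC0 = ch <= 0x1F;               // C0 control range
        boolean inC1 = ch >= 0x80 && ch <= 0x9F; // C1 control range (includes 0x8C)
        boolean isLineSeparator = ch == 0x2028;  // Unicode LINE SEPARATOR
        return inC0 || inC1 || isLineSeparator;
    }

    /** Copies text, replacing each flagged char with a decimal character reference (assumed form). */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (needsCharRef(ch)) {
                out.append("&#").append((int) ch).append(';');
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The same rule is applied whether the text is element content or an attribute value.
        System.out.println(escape("att\u008Cvalue"));
    }
}
```

In this sketch the x8C character from the test stylesheet would come out as a character reference in both places, which is the behaviour the issue asks for in attribute values.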
