[ http://issues.apache.org/jira/browse/XALANJ-2070?page=comments#action_60853 ]

Brian Minchau commented on XALANJ-2070:
---------------------------------------
Yash and I were discussing my modification that bumps end_chunk up by one when the last character in the chunk is the first char of a high/low Unicode surrogate pair, so that both halves of the pair are handled in the same chunk. The idea is correct. I had suggested stress testing by setting BYTES_MAX to 6. This is also correct.

However, Yash found that the code had a stack overflow. I found that this was caused by code in WriterToUTF8Buffered that has this check in two places:

    if (lengthx3 >= BYTES_MAX)

as its decision for splitting into chunks. The solution to get rid of the stack overflow is to be a little less pessimistic and replace those checks with:

    if (lengthx3 > BYTES_MAX)

The idea of stress testing with a BYTES_MAX of 3 is wrong. We should always process a surrogate pair together, and a pair needs 4 bytes, so the logic of the code gets a bit twisted. Just stress test with a BYTES_MAX of 6; that is good enough.

Two more comments on the code and the patch:
- WriterToUTF8Buffered.write(int c) needs Yash's surrogate pair code in it, as the patch already has for the other write methods.
- WriterToUTF8Buffered.write(String s) may need to check whether the last char in its chunk is the high half of a high/low surrogate pair. The splitting into chunks here should look more like the code in the other methods that split, except that if the last character is the high one of a high/low pair we can't just bump up end_chunk; rather we should reduce it by one and catch the pair on the next iteration.
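The two ways of keeping a surrogate pair inside one chunk can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the code from the patch or from WriterToUTF8Buffered: the class SurrogateChunkSketch, its method names, and the bytesMax parameter (standing in for BYTES_MAX) are hypothetical, and the loop is a plain iteration rather than the writer's real structure.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the WriterToUTF8Buffered code or the patch. */
class SurrogateChunkSketch {

    /** Variant 1: bump end_chunk up by one so a high/low pair stays in one chunk. */
    static List<String> splitBumpUp(char[] chars, int bytesMax) {
        int maxChars = bytesMax / 3; // pessimistic estimate: 3 UTF-8 bytes per char
        if (maxChars < 1) {
            throw new IllegalArgumentException("bytesMax must be at least 3");
        }
        List<String> chunks = new ArrayList<>();
        int pos = 0;
        while (pos < chars.length) {
            int endChunk = Math.min(pos + maxChars, chars.length);
            if (endsOnSplitPair(chars, endChunk)) {
                endChunk++; // pull the low surrogate into this chunk
            }
            chunks.add(new String(chars, pos, endChunk - pos));
            pos = endChunk;
        }
        return chunks;
    }

    /** Variant 2: reduce end_chunk by one and catch the pair on the next iteration. */
    static List<String> splitShrink(char[] chars, int bytesMax) {
        int maxChars = bytesMax / 3;
        if (maxChars < 2) {
            // With one char per chunk, shrinking would leave an empty chunk
            // and the loop would never advance.
            throw new IllegalArgumentException("bytesMax must be at least 6");
        }
        List<String> chunks = new ArrayList<>();
        int pos = 0;
        while (pos < chars.length) {
            int endChunk = Math.min(pos + maxChars, chars.length);
            if (endsOnSplitPair(chars, endChunk)) {
                endChunk--; // leave the whole pair for the next chunk
            }
            chunks.add(new String(chars, pos, endChunk - pos));
            pos = endChunk;
        }
        return chunks;
    }

    /** True if the chunk ending at endChunk (exclusive) would cut a pair in half. */
    static boolean endsOnSplitPair(char[] chars, int endChunk) {
        return endChunk < chars.length
                && Character.isHighSurrogate(chars[endChunk - 1])
                && Character.isLowSurrogate(chars[endChunk]);
    }

    /** UTF-8 length of a well-formed string, used to check the byte budget. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

With bytesMax set to 6, the four chars 'a', high, low, 'b' come out as two chunks ("a" plus the pair, then "b") under the first variant and as three chunks ("a", the pair, "b") under the second. In this sketch a bumped chunk can need one byte more than the 3-bytes-per-char estimate allows (for example a 3-byte char followed by a 4-byte pair), so whatever buffer receives it would need that much slack. The second variant cannot make progress when only one char fits per chunk, which is consistent with the point above that a BYTES_MAX of 3 is not a sensible stress value.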
> Xalan should support XML 1.1 for input/output XML and stylesheets themselves
> -----------------------------------------------------------------------------
>
>          Key: XALANJ-2070
>          URL: http://issues.apache.org/jira/browse/XALANJ-2070
>      Project: XalanJ2
>         Type: New Feature
>   Components: Serialization, XSLTC, Xalan-interpretive
>     Versions: CurrentCVS
>     Reporter: Brian Minchau
>     Assignee: Yash Talwar
>      Fix For: CurrentCVS
>  Attachments: XML11SupportPatch.txt, XML11SupportPatch2.txt
>
> Xalan should have support for input XML documents that are XML 1.1,
> for output XML documents that are 1.1, and for stylesheets that are
> themselves XML 1.1 documents.
> The serialization parameters should support XML 1.1:
>   <xsl:output method="xml" version="1.1" />
> An input XML document to a transformation should be supported:
>   <?xml version="1.1" ?>
> Having a stylesheet that is itself an XML 1.1 document should be supported:
>   <?xml version="1.1" ?>
> which means:
> - write out XML 1.1 writing out NEL LSEP as the end-of-line sequence
> - IRI support in namespaces, namespace URIs can now include characters that
>   are according to the specification for an IRI (this is already there because
>   we aren't doing any checking).
> - C0 and C1 range characters are now output as numeric character references.
> However:
> - undeclaration of namespaces shouldn't be done
> - don't have character normalization like NEL or LSEP normalized to whitespace
