[ http://issues.apache.org/jira/browse/XALANJ-2070?page=comments#action_60853 ]

Brian Minchau commented on XALANJ-2070:
---------------------------------------

Yash and I were discussing my modification of bumping end_chunk up by one if
the last character in the chunk is the first char of a high/low Unicode
surrogate pair, so that a high/low pair is handled in the same chunk.
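A minimal sketch of the bump, assuming end_chunk is an exclusive index into a char[] and that the bump must not run past the end of the range being written. ChunkBoundary and adjustEndChunk are hypothetical names, not the actual WriterToUTF8Buffered code.

```java
/**
 * Hypothetical helper (not Xalan's actual code): move a proposed chunk end
 * forward by one char when the chunk would otherwise end on the high half of
 * a surrogate pair, so the pair stays in one chunk.
 */
class ChunkBoundary {
    /** endChunk is exclusive; limit is the end of the whole input range. */
    static int adjustEndChunk(char[] chars, int endChunk, int limit) {
        if (endChunk > 0 && endChunk < limit
                && Character.isHighSurrogate(chars[endChunk - 1])) {
            return endChunk + 1; // pull the low half into this chunk
        }
        return endChunk;
    }

    public static void main(String[] args) {
        // 'a', 'b', then U+1F600 as the pair D83D/DE00, then 'c', 'd'
        char[] text = "ab\uD83D\uDE00cd".toCharArray();
        System.out.println(adjustEndChunk(text, 3, text.length)); // 4: split moved past the pair
        System.out.println(adjustEndChunk(text, 2, text.length)); // 2: 'b' is not a surrogate
    }
}
```

When the chunk end is already at the end of the range, the sketch leaves it alone; a trailing high surrogate with no low half in this range would need separate handling.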

The idea is correct. I had suggested stress testing by setting BYTES_MAX to 6.
This is also correct. However, Yash found that the code had a stack overflow. I
found that this was caused by code in WriterToUTF8Buffered that has this check
in two places:
  if (lengthx3 >= BYTES_MAX)
as its decision for splitting into chunks. The solution to get rid of the stack
overflow is to be a little less pessimistic and replace those checks with:
  if (lengthx3 > BYTES_MAX)
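One way to see how the >= check can overflow the stack is a simplified, hypothetical model of the recursive split. It assumes each chunk holds at most BYTES_MAX/3 chars and is written by a recursive call, tracks only chunk lengths, and ignores bytes already sitting in the buffer; ChunkSplitModel and its methods are illustrative names. Under those assumptions a chunk of exactly BYTES_MAX/3 chars has lengthx3 == BYTES_MAX, so with >= it is "split" into one chunk of the same size and the recursion never ends.

```java
/**
 * Simplified, hypothetical model of a recursive chunked write. Only chunk
 * lengths are tracked; nothing is encoded. The names are illustrative.
 */
class ChunkSplitModel {
    static final int BYTES_MAX = 6;             // stress-test buffer size
    static final int CHARS_MAX = BYTES_MAX / 3; // worst case: 3 UTF-8 bytes per char
    static final int DEPTH_LIMIT = 1000;        // stands in for the JVM stack limit

    /** Returns false if the recursion would never terminate. */
    static boolean write(int length, boolean useGreaterOrEqual, int depth) {
        if (depth > DEPTH_LIMIT) {
            return false; // the real code would hit StackOverflowError instead
        }
        int lengthx3 = 3 * length;
        boolean split = useGreaterOrEqual ? lengthx3 >= BYTES_MAX : lengthx3 > BYTES_MAX;
        if (!split) {
            return true; // small enough to encode directly
        }
        int chunks = (length + CHARS_MAX - 1) / CHARS_MAX; // ceiling division
        int endChunk = 0;
        for (int chunk = 1; chunk <= chunks; chunk++) {
            int startChunk = endChunk;
            endChunk = (int) ((long) length * chunk / chunks);
            if (!write(endChunk - startChunk, useGreaterOrEqual, depth + 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(write(10, true, 0));  // false: each 2-char chunk re-splits into itself
        System.out.println(write(10, false, 0)); // true
    }
}
```

With >, a chunk of exactly CHARS_MAX chars counts as fitting, because 3 * CHARS_MAX never exceeds BYTES_MAX.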

The idea of stress testing with BYTES_MAX of 3 is wrong. We should always
process a surrogate pair together, and a pair needs 4 bytes, so the logic of
the code gets a bit twisted. Just stress test with BYTES_MAX of 6; that is good
enough.
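The same kind of hypothetical model, extended with the end_chunk bump, suggests why BYTES_MAX of 3 cannot work: a chunk then holds at most one char, but a surrogate pair is two chars (4 bytes in UTF-8), so the bump recreates the same two-char chunk on every recursive call. SurrogateChunkModel is an illustrative name; the fit test uses the relaxed > comparison, and the model is not Xalan's code.

```java
/**
 * Hypothetical model (not Xalan's code): recursively split
 * chars[start, start + length) into chunks of at most bytesMax / 3 chars,
 * bumping a chunk end that lands right after a high surrogate.
 * Returns false if the recursion would never terminate.
 */
class SurrogateChunkModel {
    static final int DEPTH_LIMIT = 1000; // stands in for the JVM stack limit

    static boolean write(char[] chars, int start, int length, int bytesMax, int depth) {
        if (bytesMax < 3) {
            throw new IllegalArgumentException("bytesMax must hold at least one char");
        }
        if (depth > DEPTH_LIMIT) {
            return false;
        }
        if (3 * length <= bytesMax) {
            return true; // fits, using the relaxed "lengthx3 > BYTES_MAX" split test
        }
        int charsMax = bytesMax / 3;
        int chunks = (length + charsMax - 1) / charsMax;
        int endChunk = start;
        for (int chunk = 1; chunk <= chunks; chunk++) {
            int startChunk = endChunk;
            endChunk = start + (int) ((long) length * chunk / chunks);
            // keep a high/low surrogate pair together in one chunk
            if (endChunk < start + length && Character.isHighSurrogate(chars[endChunk - 1])) {
                endChunk++;
            }
            if (!write(chars, startChunk, endChunk - startChunk, bytesMax, depth + 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] pair = "\uD83D\uDE00".toCharArray(); // one supplementary char (U+1F600)
        System.out.println(write(pair, 0, 2, 3, 0)); // false: the bump recreates the same chunk
        System.out.println(write(pair, 0, 2, 6, 0)); // true: the pair fits in 6 bytes
    }
}
```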

Two more comments on the code and the patch:

- WriterToUTF8Buffered.write(int c) needs Yash's surrogate pair code in it,
like the code in the patch for the other write methods.
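Because write(int c) sees only one char per call, one plausible approach is to remember a high surrogate until its low half arrives in the next call and then emit the pair as a single 4-byte UTF-8 sequence. The sketch below assumes that approach; Utf8CharWriterSketch, its ByteArrayOutputStream target, and the choice to throw on unpaired surrogates are all hypothetical, not the behavior of the patch.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch (not Xalan's WriterToUTF8Buffered): a write(int c)
 * that remembers a high surrogate until its low half arrives, then emits the
 * pair as one 4-byte UTF-8 sequence.
 */
class Utf8CharWriterSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private char pendingHigh = 0; // 0 means no high surrogate is pending

    public void write(int c) {
        char ch = (char) c;
        if (pendingHigh != 0) {
            if (!Character.isLowSurrogate(ch)) {
                // error policy for unpaired surrogates is an assumption
                throw new IllegalArgumentException("unpaired high surrogate");
            }
            int cp = Character.toCodePoint(pendingHigh, ch);
            pendingHigh = 0;
            out.write(0xF0 | (cp >> 18));          // 4 bytes for U+10000..U+10FFFF
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        } else if (Character.isHighSurrogate(ch)) {
            pendingHigh = ch; // wait for the low half in the next call
        } else if (Character.isLowSurrogate(ch)) {
            throw new IllegalArgumentException("unpaired low surrogate");
        } else if (ch < 0x80) {
            out.write(ch);                          // 1 byte
        } else if (ch < 0x800) {
            out.write(0xC0 | (ch >> 6));            // 2 bytes
            out.write(0x80 | (ch & 0x3F));
        } else {
            out.write(0xE0 | (ch >> 12));           // 3 bytes
            out.write(0x80 | ((ch >> 6) & 0x3F));
            out.write(0x80 | (ch & 0x3F));
        }
    }

    public byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

Feeding it the chars of a string one at a time should produce the same bytes as encoding the whole string to UTF-8 at once.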

- WriterToUTF8Buffered.write(String s) may need to check whether the last char
in its chunk is the high half of a high/low surrogate pair. The splitting into
chunks here should look more like the splitting code in the other methods,
except that if the last character is the high one of a high/low pair, we can't
just bump up end_chunk; rather, we should reduce it by one and catch the pair
on the next iteration.
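The "reduce it by one" variant can be sketched as a loop over a String, assuming chunks of at most charsMax chars. StringChunker and chunks are hypothetical names; the sketch returns the chunks instead of encoding them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch for a write(String s)-style loop: split s into chunks
 * of at most charsMax chars without ever ending a chunk on a high surrogate.
 * Instead of bumping the end up, it is pulled back by one so the next
 * iteration picks up the whole pair.
 */
class StringChunker {
    static List<String> chunks(String s, int charsMax) {
        if (charsMax < 2) {
            // a 1-char chunk holding a high surrogate would shrink to 0 chars and stall
            throw new IllegalArgumentException("charsMax must be at least 2");
        }
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            int end = Math.min(start + charsMax, s.length());
            if (end < s.length() && Character.isHighSurrogate(s.charAt(end - 1))) {
                end--; // leave the high surrogate for the next chunk
            }
            result.add(s.substring(start, end));
            start = end;
        }
        return result;
    }
}
```

For example, chunks("ab\uD83D\uDE00c", 3) yields "ab" and then the pair followed by "c". Pulling the end back keeps every chunk within charsMax chars, so a fixed-size chunk buffer never needs an extra slot; the cost is that some chunks are one char short of the maximum.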

> Xalan should support XML 1.1 for input/output XML and stylesheets themselves
> -----------------------------------------------------------------------------
>
>          Key: XALANJ-2070
>          URL: http://issues.apache.org/jira/browse/XALANJ-2070
>      Project: XalanJ2
>         Type: New Feature
>   Components: Serialization, XSLTC, Xalan-interpretive
>     Versions: CurrentCVS
>     Reporter: Brian Minchau
>     Assignee: Yash Talwar
>      Fix For: CurrentCVS
>  Attachments: XML11SupportPatch.txt, XML11SupportPatch2.txt
>
> Xalan should have support for input XML documents that are XML 1.1,
> for output XML documents that are 1.1, and for stylesheets that are 
> themselves XML 1.1 documents.
> The serialization parameters should support XML 1.1:
> <xsl:output method="xml" version="1.1" />
> An input XML document to a transformation should be supported:
> <?xml version="1.1" ?>
> Having a stylesheet that is itself an XML 1.1 document should be supported:
> <?xml version="1.1" ?>
> which means:
> - when writing out XML 1.1, write NEL or LSEP as the end-of-line sequence
> - IRI support in namespaces: namespace URIs can now include characters that
> are allowed by the specification for an IRI (this is already there because
> we aren't doing any checking).
> - C0 and C1 range characters are now output as numeric character references.
> However:
> - undeclaration of namespaces shouldn't be done
> - don't normalize characters such as NEL or LSEP to whitespace
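The C0/C1 rule in the quoted description can be sketched as follows, assuming the set to escape is the RestrictedChar production of the XML 1.1 Recommendation: C0 controls other than tab, LF, and CR, plus DEL and the C1 controls other than NEL. Xml11CharRefs is a hypothetical name, and escaping of markup characters such as & and < is deliberately left out.

```java
import java.util.Locale;

/** Hypothetical sketch: write XML 1.1 restricted characters as numeric character references. */
class Xml11CharRefs {
    // RestrictedChar: [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
    static boolean isRestricted(char c) {
        return (c >= 0x1 && c <= 0x8) || c == 0xB || c == 0xC
                || (c >= 0xE && c <= 0x1F)
                || (c >= 0x7F && c <= 0x84)
                || (c >= 0x86 && c <= 0x9F);
    }

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isRestricted(c)) {
                // e.g. U+0001 becomes &#x1;
                sb.append("&#x").append(Integer.toHexString(c).toUpperCase(Locale.ROOT)).append(';');
            } else {
                sb.append(c); // NEL (#x85), tab, LF, CR and ordinary chars pass through
            }
        }
        return sb.toString();
    }
}
```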

-- 
This message is automatically generated by JIRA.