Terrible performance from ToStream.startPrefixMapping calling flush() 
repeatedly while serializing XML
------------------------------------------------------------------------------------------------------

                 Key: XALANJ-2500
                 URL: https://issues.apache.org/jira/browse/XALANJ-2500
             Project: XalanJ2
          Issue Type: Bug
      Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects.  Anybody can view the issue.)
          Components: Serialization
    Affects Versions: 2.7.1
         Environment: N/A (Any)
            Reporter: Mark A. Ziesemer


As discussed in XALANJ-78, flush() should only be called from endDocument().  
However, ToStream.startPrefixMapping always calls flushPending(), which, 
among other things, calls m_writer.flush().

Here is the relevant portion of the stack trace, with fully-qualified class names:

org.apache.xml.serializer.WriterToUTF8Buffered.flush(WriterToUTF8Buffered.java:467)
        at org.apache.xml.serializer.ToStream.flushPending(ToStream.java:2975)
        at org.apache.xml.serializer.ToStream.startPrefixMapping(ToStream.java:2340)
        at org.apache.xml.serializer.ToStream.startPrefixMapping(ToStream.java:2299)
        at org.apache.xalan.transformer.TransformerIdentityImpl.startPrefixMapping(TransformerIdentityImpl.java:985)
        at org.apache.xml.serializer.TreeWalker.startNode(TreeWalker.java:317)
        at org.apache.xml.serializer.TreeWalker.traverse(TreeWalker.java:145)
        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:390)
        at ...

Note that some use of XML namespaces seems to be required for this to be an 
issue.  However, this does not necessarily mean that there are XML namespaces 
in the output document.  I first ran into this with an XSL stylesheet that 
uses XML namespaces for parameter names, while the generated document is 
entirely within the default namespace.

When writing to high-latency outputs, e.g. a remote web client, the result is 
a severe performance issue.  Below is a sample test case that demonstrates the 
issue, in which flush() is called 103 times: once for each of the 100 elements 
serialized with an XML namespace, and 3 times at the end of the document.

import java.io.IOException;
import java.io.OutputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class JavaTest{
        public static void main(String[] args) throws Exception{
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder db = dbf.newDocumentBuilder();
                Document doc = db.newDocument();
                
                Element root = doc.createElement("Root");
                doc.appendChild(root);
                for(int i=0; i<100; i++){
                        Element child = doc.createElementNS("http://test.example.com", "Child" + i);
                        root.appendChild(child);
                }
                
                TransformerFactory tf = TransformerFactory.newInstance();
                Transformer t = tf.newTransformer();
                
                OutputStream os = new OutputStream(){
                        protected int flushCount = 0;
                        
                        @Override
                        public void write(int b) throws IOException{
                                // Do nothing - this is just a minimal test case.
                        }
                        
                        @Override
                        public void flush() throws IOException{
                                new Throwable("flushed #" + (++flushCount)).printStackTrace();
                        }
                };
                
                t.transform(new DOMSource(doc), new StreamResult(os));
        }
}

Using a Writer instead of an OutputStream results in the same issue, where 
flush() is called repeatedly on the Writer instead.

The only known work-around is to write and use an overridden implementation of 
the OutputStream or Writer where flush() is effectively caught and ignored.
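
A minimal sketch of such a work-around for the OutputStream case follows.  The 
class name FlushIgnoringOutputStream and the method flushUnderlying() are 
hypothetical names chosen for illustration; they are not part of Xalan or the 
JDK.  The sketch assumes the caller is willing to flush the underlying stream 
explicitly once the transform has finished.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical wrapper (not part of Xalan or the JDK): passes all writes
 * through to the wrapped stream, but ignores flush() so that repeated
 * flush() calls from the serializer never reach the underlying output.
 */
class FlushIgnoringOutputStream extends FilterOutputStream {

    FlushIgnoringOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate the whole array at once; FilterOutputStream's default
        // implementation would otherwise write one byte at a time.
        out.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        // Intentionally ignored: this is the work-around.
    }

    /** Flushes the wrapped stream; call this once, after the transform. */
    public void flushUnderlying() throws IOException {
        out.flush();
    }
}
```

In the test case above, this would be used by passing 
new StreamResult(new FlushIgnoringOutputStream(os)) to t.transform(...) and 
calling flushUnderlying() once afterwards.  The same idea should carry over to 
a Writer by extending java.io.FilterWriter instead.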
