Terrible performance from ToStream.startPrefixMapping calling flush()
repeatedly while serializing XML
------------------------------------------------------------------------------------------------------
Key: XALANJ-2500
URL: https://issues.apache.org/jira/browse/XALANJ-2500
Project: XalanJ2
Issue Type: Bug
Security Level: No security risk; visible to anyone (Ordinary problems in
Xalan projects. Anybody can view the issue.)
Components: Serialization
Affects Versions: 2.7.1
Environment: N/A (Any)
Reporter: Mark A. Ziesemer
As discussed in XALANJ-78, flush() should be called only from endDocument().
However, ToStream.startPrefixMapping() always calls flushPending(), which,
among other things, calls m_writer.flush().
Here is the relevant stack trace, with fully qualified class names:
    at org.apache.xml.serializer.WriterToUTF8Buffered.flush(WriterToUTF8Buffered.java:467)
    at org.apache.xml.serializer.ToStream.flushPending(ToStream.java:2975)
    at org.apache.xml.serializer.ToStream.startPrefixMapping(ToStream.java:2340)
    at org.apache.xml.serializer.ToStream.startPrefixMapping(ToStream.java:2299)
    at org.apache.xalan.transformer.TransformerIdentityImpl.startPrefixMapping(TransformerIdentityImpl.java:985)
    at org.apache.xml.serializer.TreeWalker.startNode(TreeWalker.java:317)
    at org.apache.xml.serializer.TreeWalker.traverse(TreeWalker.java:145)
    at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:390)
    at ...
Note that some use of XML namespaces seems to be required to trigger the
issue. However, this does not necessarily mean that the output document
contains any XML namespaces. I first ran into this with an XSL that uses XML
namespaces for parameter names, while the generated document is completely
within the default namespace.
Below is a sample test-case that demonstrates the issue, in which flush() is
called 103 times: once for each of the 100 serialized elements that carry an
XML namespace, and 3 times at the end of the document. When writing to
high-latency outputs, e.g. a remote web client, the result is a severe
performance issue.
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class JavaTest{
    public static void main(String[] args) throws Exception{
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.newDocument();
        Element root = doc.createElement("Root");
        doc.appendChild(root);
        // 100 namespaced children: each one triggers startPrefixMapping().
        for(int i=0; i<100; i++){
            Element child =
                doc.createElementNS("http://test.example.com", "Child" + i);
            root.appendChild(child);
        }
        // Identity transform: serializes the DOM as-is.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        OutputStream os = new OutputStream(){
            protected int flushCount = 0;
            @Override
            public void write(int b) throws IOException{
                // Do nothing - this is just a minimal test case.
            }
            @Override
            public void flush() throws IOException{
                // Print a stack trace on every flush to show who called it.
                new Throwable("flushed #" + (++flushCount)).printStackTrace();
            }
        };
        t.transform(new DOMSource(doc), new StreamResult(os));
    }
}
Using a Writer instead of an OutputStream results in the same issue, where
flush() is called repeatedly on the Writer instead.
The only known work-around is to write and use an overridden implementation of
the OutputStream or Writer where flush() is effectively caught and ignored.
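
For illustration, here is a minimal sketch of that work-around for the
OutputStream case. The class name NoFlushOutputStream and the methods
reallyFlush() and getIgnoredFlushes() are hypothetical names chosen for this
example; they are not part of Xalan or the JDK. It assumes the caller is
willing to flush (or close) the stream explicitly once the transform is done.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical wrapper (not part of Xalan or the JDK): ignores flush() so
 * that repeated flushes issued by the serializer never reach the wrapped
 * stream. The caller must invoke reallyFlush() or close() when finished.
 */
final class NoFlushOutputStream extends FilterOutputStream {
    private int ignoredFlushes = 0;

    NoFlushOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate the whole array; FilterOutputStream's default
        // implementation would write it one byte at a time.
        out.write(b, off, len);
    }

    @Override
    public void flush() {
        // Intentionally ignored; only counted, for diagnostics.
        ignoredFlushes++;
    }

    /** Flushes the wrapped stream; call once after the transform. */
    public void reallyFlush() throws IOException {
        out.flush();
    }

    /** Number of flush() calls that were ignored so far. */
    public int getIgnoredFlushes() {
        return ignoredFlushes;
    }

    @Override
    public void close() throws IOException {
        // Flush the wrapped stream explicitly before closing it.
        try {
            out.flush();
        } finally {
            out.close();
        }
    }
}
```

In the test-case above, this would mean passing
new StreamResult(new NoFlushOutputStream(os)) to transform() and calling
reallyFlush() afterwards. The trade-off is that the flush from endDocument()
is swallowed as well, so forgetting the explicit flush can leave output
buffered. A Writer variant would presumably follow the same pattern on top of
java.io.FilterWriter.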