I have a server app that parses millions of smallish documents.
Performance has already improved a lot by reusing XMLReaders. It's pretty good now, but could perhaps get better, judging by the (perhaps dubious?) hints in the java -server -Xprof snippet below (JDK 1.5 RC, Xerces CVS head; not the JDK-internal Xerces, which appears to be twice as slow in this case, for whatever reason).
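For reference, the reuse trick looks roughly like this. It is a minimal sketch that uses only the standard JAXP/SAX API, so it gets whichever SAX parser the JDK resolves, not necessarily Apache Xerces. One XMLReader is created once and then fed every document in turn. The class and method names (ReuseReaderSketch, countElements) are made up for illustration. The SAX contract allows reusing an XMLReader once a previous parse() call has returned.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ReuseReaderSketch {
    /** Counts start-element events; reset before each document. */
    static final class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses every document with one XMLReader instead of creating a new one per document. */
    static int[] countElements(String[] docs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader(); // created once, reused below
        ElementCounter handler = new ElementCounter();
        reader.setContentHandler(handler);

        int[] counts = new int[docs.length];
        for (int i = 0; i < docs.length; i++) {
            handler.count = 0;
            reader.parse(new InputSource(new StringReader(docs[i])));
            counts[i] = handler.count;
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countElements(new String[] {"<a/>", "<a><b/><c/></a>"});
        System.out.println(counts[0] + " " + counts[1]); // prints "1 3"
    }
}
```

In a server the reader would typically be kept per thread, since an XMLReader is not safe for concurrent use.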
According to the profile, throwing an (artificial) EOFException in XMLEntityScanner.load() at the end of each document consumes some 25% of the total execution time, probably because exceptions are heavyweight, in particular Throwable.fillInStackTrace(). Would it be possible (and correct) to avoid raising artificial exceptions for what appears to be normal program control flow (the documents and streams are fine)?
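The fillInStackTrace theory can be checked in isolation with something like the following rough sketch. It is not a rigorous benchmark: JIT warm-up and the depth of the call stack both affect the numbers. It times throwing and catching a plain EOFException against a hypothetical StacklessEOFException subclass whose fillInStackTrace() override skips the stack walk. Overriding fillInStackTrace() to return this is a common idiom for cheap control-flow exceptions. The trade-off is that such an exception carries an empty stack trace. (Newer JDKs also offer a protected Exception constructor with a writableStackTrace flag, but EOFException does not expose it, so overriding is the route that works here.)

```java
import java.io.EOFException;

public class ExceptionCostSketch {
    /** Hypothetical subclass: identical to EOFException except it records no stack trace. */
    static final class StacklessEOFException extends EOFException {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip the (expensive) native stack walk
        }
    }

    /** Throws and catches n exceptions; returns the elapsed time in nanoseconds. */
    static long timeThrows(int n, boolean stackless) {
        int caught = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            try {
                throw stackless ? new StacklessEOFException() : new EOFException();
            } catch (EOFException e) {
                caught++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (caught != n) {
            throw new IllegalStateException("lost an exception");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        timeThrows(n, false); // warm-up passes so both paths get JIT-compiled
        timeThrows(n, true);
        System.out.printf("EOFException:          %d ms%n", timeThrows(n, false) / 1_000_000);
        System.out.printf("StacklessEOFException: %d ms%n", timeThrows(n, true) / 1_000_000);
    }
}
```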
Here is the trace snippet:
     Stub + native   Method
 28.6%     0  +   487    java.lang.Throwable.fillInStackTrace
 28.6%     0  +   487    Total stub

  Thread-local ticks:
   0.1%     1             Blocked (of total)
   0.1%     2             Class loader
   0.1%     2             Compilation
   0.2%     3             Unknown: thread_state

Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
 100.0%     1             Blocked (of total)

Global summary of 35.44 seconds:
 100.0%  1718             Received ticks
   0.7%    12             Received GC ticks
   9.7%   167             Compilation
   0.1%     2             Class loader
   0.2%     3             Unknown code

real    0m35.715s
user    0m34.170s
sys     0m0.190s
TRACE 300347:
    java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
    java.lang.Throwable.<init>(Throwable.java:181)
    java.lang.Exception.<init>(Exception.java:29)
    java.io.IOException.<init>(IOException.java:28)
    java.io.EOFException.<init>(EOFException.java:32)
    org.apache.xerces.impl.XMLEntityScanner.load(<Unknown Source>:Unknown line)
    org.apache.xerces.impl.XMLEntityScanner.skipSpaces(<Unknown Source>:Unknown line)
    org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.dispatch(<Unknown Source>:Unknown line)
    org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(<Unknown Source>:Unknown line)
    org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
    org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
    org.apache.xerces.parsers.XMLParser.parse(<Unknown Source>:Unknown line)
    org.apache.xerces.parsers.AbstractSAXParser.parse(<Unknown Source>:Unknown line)
    nu.xom.Builder.build(Builder.java:786)
    nu.xom.Builder.build(Builder.java:569)
    gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62)
I guess the relevant block is
XMLEntityScanner.load(...):
    ...
    if (changeEntity) {
        fEntityManager.endEntity();
        if (fCurrentEntity == null) {
            throw new EOFException();
        }
        // handle the trailing edges
        if (fCurrentEntity.position == fCurrentEntity.count) {
            load(0, true);
        }
    }
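Another option besides making the exception cheaper is to not throw one at all on the normal path. Here is a hypothetical, heavily simplified scanner to illustrate the control-flow pattern. ToyEntityScanner is not Xerces' XMLEntityScanner, and its load() takes no arguments, unlike the load(0, true) call quoted above. Here load() reports end of input by returning false, and callers such as skipSpaces() and read() treat that as an ordinary outcome. Whether Xerces' callers could be restructured this way, and whether anything relies on the EOFException, is exactly the open question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical, heavily simplified scanner; not Xerces' XMLEntityScanner. */
public class ToyEntityScanner {
    private final Reader in;
    private final char[] buf = new char[8]; // deliberately tiny so the demo crosses refill boundaries
    private int position;
    private int count;

    ToyEntityScanner(Reader in) {
        this.in = in;
    }

    /** Refills the buffer; returns false at end of input instead of throwing EOFException. */
    boolean load() throws IOException {
        int n = in.read(buf, 0, buf.length); // -1 signals end of stream
        if (n <= 0) {
            position = 0;
            count = 0;
            return false;
        }
        position = 0;
        count = n;
        return true;
    }

    /** Skips whitespace; returns true if any was skipped. */
    boolean skipSpaces() throws IOException {
        boolean skipped = false;
        while (true) {
            if (position == count && !load()) {
                return skipped; // end of input: an ordinary return, no exception
            }
            char c = buf[position];
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return skipped;
            }
            position++;
            skipped = true;
        }
    }

    /** Returns the next char, or -1 at end of input. */
    int read() throws IOException {
        if (position == count && !load()) {
            return -1;
        }
        return buf[position++];
    }

    public static void main(String[] args) throws IOException {
        ToyEntityScanner s = new ToyEntityScanner(new StringReader("<a/>   \n  "));
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            text.append((char) s.read());
        }
        boolean skipped = s.skipSpaces(); // consumes the trailing whitespace
        int next = s.read();              // -1: end of input, no exception thrown
        System.out.println(text + " " + skipped + " " + next); // prints "<a/> true -1"
    }
}
```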
