http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7689

Performance problem with Xerces parser

           Summary: Performance problem with Xerces parser
           Product: Tomcat 4
           Version: 4.0.4 Beta 2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Minor
          Priority: Other
         Component: Unknown
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


I noticed a severe performance problem with the Xerces parser.  Tomcat was
taking longer to initialize after I started using Sun XMLPack/Spring02 for
XML parsing (which includes Xerces 2.0.0).  Startup time went up by about
1.5 s on a 1500 MHz machine (Windows 2000, JDK 1.4.0, Tomcat 4.0.4-beta2).
I investigated it with OptimizeIt! and found that Xerces was spending
virtually all of this additional time in calls to
java.util.zip.InflaterInputStream.read(); that is, it reads data from a
compressed file (I guess Tomcat stores XML config files in jars) one byte
at a time instead of using buffered reads.  The culprit is
XMLEntityManager.RewindableInputStream.read(), which reads data without any
buffering.  This is slow with any stream, but particularly slow with zipped
streams.  There is also a lot of memory management / GC overhead, as
InflaterInputStream.read() allocates 600,000 'byte[1]' objects.
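
To illustrate the call pattern described above, here is a small, self-contained Java sketch (not Tomcat or Xerces code).  It deflate-compresses some XML-like text in memory as a stand-in for a file stored in a jar, then drains it with single-byte read() calls and counts how many calls reach the java.util.zip.InflaterInputStream, once directly and once through a java.io.BufferedInputStream.  The names UnbufferedReadDemo, CountingInputStream and inflaterCalls are hypothetical, and the sketch counts calls rather than measuring time or allocations.

```java
import java.io.*;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class UnbufferedReadDemo {
    // Wrapper that counts every read call that reaches the stream it wraps.
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Deflate-compresses 'size' bytes of XML-like text, standing in for a file stored in a jar.
    public static byte[] compressed(int size) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            for (int i = 0; i < size; i++) {
                out.write("<a/>".charAt(i % 4));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Drains the decompressed data with single-byte read() calls and reports how many
    // calls reached the InflaterInputStream, with or without a BufferedInputStream in between.
    public static long inflaterCalls(byte[] data, boolean buffered) {
        CountingInputStream counter =
            new CountingInputStream(new InflaterInputStream(new ByteArrayInputStream(data)));
        try (InputStream in = buffered ? new BufferedInputStream(counter, 8192) : counter) {
            while (in.read() != -1) {
                // consume one byte per call, like an unbuffered parser front end
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }

    public static void main(String[] args) {
        byte[] data = compressed(100_000);
        System.out.println("unbuffered: " + inflaterCalls(data, false) + " calls");
        System.out.println("buffered:   " + inflaterCalls(data, true) + " calls");
    }
}
```

Read directly, every byte costs one call into the inflater (100,001 calls for 100,000 bytes, counting the final end-of-stream call).  With the 8 KB buffer in between, the inflater is only asked for a block when the buffer runs dry, so the count falls to roughly one call per buffer refill.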
Switching back to JDK 1.4.0's built-in parser (Crimson), the performance is
OK again and the abnormal time and allocation behavior disappears.  Xerces
2.0.1 behaves the same way as 2.0.0.  The root of all evil seems to be
places like org.apache.catalina.startup.ContextConfig.defaultConfig(XMLMapper),
which clearly builds an unbuffered FileInputStream and starts parsing the
XML data from this stream.  Adding a buffered stream to this pipeline
shouldn't hurt, and would help a lot when using Xerces.  I suspect the time
savings could be substantial for complex Tomcat setups (my configuration is
extremely simple, basically the default Tomcat install).
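
The buffering suggested above could look roughly like this.  Tomcat's XMLMapper API is not shown in this report, so the sketch uses the standard JAXP SAX API (SAXParserFactory and SAXParser.parse(InputStream, DefaultHandler)) as a stand-in; JAXP picks whichever parser is installed, which would be Xerces in the setup described.  The only point is the BufferedInputStream wrapped around the FileInputStream; the class name BufferedParseDemo, the element-counting handler and the temp-file helper are hypothetical illustrations.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedParseDemo {
    // Parses an XML file and returns how many elements it contains. The FileInputStream is
    // wrapped in a BufferedInputStream, so the parser's small reads are served from memory.
    public static int countElements(File file) {
        final int[] count = {0};
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) {
                    count[0]++;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    // Convenience wrapper for trying it out: writes 'xml' to a temporary file and parses it.
    public static int countElementsOf(String xml) {
        try {
            Path tmp = Files.createTempFile("config", ".xml");
            try {
                Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
                return countElements(tmp.toFile());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElementsOf("<web-app><servlet/><servlet/></web-app>"));
    }
}
```

countElementsOf("<web-app><servlet/><servlet/></web-app>") returns 3.  Wrapping the stream where it is opened keeps the fix outside the parser, so it helps whichever parser is plugged in.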

I believe the problem doesn't happen with older parsers like Crimson
because they may extract the data from compressed files before parsing
it, or they may add buffered streams to the pipeline, whereas Xerces2
clearly adds none: each byte is read individually and pays the full I/O
pipeline overhead.
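
For comparison, a stream that needs to support rewinding (as RewindableInputStream's name suggests, presumably so the parser can re-read the start of a document after detecting its encoding) can still fetch its input in blocks.  The sketch below is hypothetical and is not the actual Xerces code: BufferedRewindableStream, rewind(), disableRewind() and underlyingCalls() are invented names.  It keeps every fetched byte while rewinding is allowed, and afterwards reuses a single buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch, not Xerces's actual RewindableInputStream: it keeps the bytes it has
// returned so rewind() can replay them (e.g. after sniffing the encoding), but it pulls
// data from the wrapped stream in blocks rather than with one read() per byte.
public class BufferedRewindableStream extends InputStream {
    private final InputStream in;
    private byte[] buf = new byte[64];  // bytes fetched from 'in'
    private int count = 0;              // number of valid bytes in buf
    private int pos = 0;                // index of the next byte to return
    private boolean rewindable = true;  // while true, every fetched byte is retained
    private long underlyingCalls = 0;   // read calls made on the wrapped stream

    public BufferedRewindableStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        if (pos == count && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    // Called only when buf is exhausted: fetches the next block, returning false at end of stream.
    private boolean fill() throws IOException {
        if (!rewindable) {
            pos = 0;  // nothing needs replaying, so reuse the buffer from the start
            count = 0;
        } else if (count == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2);  // grow so earlier bytes stay replayable
        }
        underlyingCalls++;
        int n = in.read(buf, count, buf.length - count);
        if (n <= 0) {
            return false;
        }
        count += n;
        return true;
    }

    // Replays the stream from its first byte.
    public void rewind() {
        if (!rewindable) {
            throw new IllegalStateException("rewind disabled");
        }
        pos = 0;
    }

    // Stops retaining consumed bytes once rewinding is no longer needed.
    public void disableRewind() {
        rewindable = false;
    }

    public long underlyingCalls() {
        return underlyingCalls;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Reading 1,000 bytes from a ByteArrayInputStream, rewinding after the first four and then reading everything again takes six calls on the wrapped stream, where one read() per byte would need over a thousand.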
