http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7689
Performance problem with Xerces parser

Summary: Performance problem with Xerces parser
Product: Tomcat 4
Version: 4.0.4 Beta 2
Platform: All
OS/Version: All
Status: NEW
Severity: Minor
Priority: Other
Component: Unknown
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

I noticed a severe performance problem with the Xerces parser. Tomcat took longer to initialize after I started using Sun XMLPack/Spring02 for XML parsing (it includes Xerces 2.0.0). Startup time went up by about 1.5 s on a 1500 MHz machine (Windows 2000, JDK 1.4.0, Tomcat 4.0.4-beta2).

I investigated it with OptimizeIt! and found that Xerces spends virtually all of this additional time in calls to java.util.zip.InflaterInputStream.read(). In other words, it reads data from a compressed file (I assume Tomcat stores its XML configuration files in jars) one byte at a time instead of using buffered reads. The culprit is XMLEntityManager.RewindableInputStream.read(), which reads data without any buffering. Unbuffered single-byte reads are slow on any stream, but particularly slow on compressed streams. There is also a lot of memory-management and GC overhead, because InflaterInputStream.read() allocates 600,000 byte[1] objects.

After switching back to JDK 1.4.0's built-in parser (Crimson), performance is fine again and the abnormal timing and allocation behavior disappears. Xerces 2.0.1 behaves the same way as 2.0.0.

The root of the problem seems to be places like org.apache.catalina.startup.ContextConfig.defaultConfig(XMLMapper), which builds an unbuffered FileInputStream and starts parsing the XML data from that stream. Adding a buffered stream to this pipeline should not hurt here, and it would help a lot when using Xerces. I suspect the time savings could be significant for complex Tomcat setups (my configuration is extremely simple, basically the default Tomcat install).
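A minimal sketch of the suggested fix follows. Tomcat's XMLMapper API is not shown in this report, so the sketch uses the standard JAXP SAX API as a stand-in. The class and method names (BufferedParseSketch, countElements) are hypothetical. The only point it illustrates is wrapping the raw stream in a BufferedInputStream before it reaches the parser, so that the parser's single-byte read() calls are served from an in-memory buffer rather than from the underlying stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tomcat or Xerces.
public class BufferedParseSketch {

    /**
     * Parses XML from a raw (possibly unbuffered) stream after wrapping it
     * in a BufferedInputStream, and returns the number of elements seen.
     */
    public static int countElements(InputStream raw)
            throws ParserConfigurationException, SAXException, IOException {
        // Add buffering unless the caller already did; 8192 is the
        // BufferedInputStream default size, chosen arbitrarily here.
        InputStream in = (raw instanceof BufferedInputStream)
                ? raw
                : new BufferedInputStream(raw, 8192);
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    count[0]++; // one callback per start tag
                }
            });
        } finally {
            in.close(); // closing an already-closed stream is harmless
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // In Tomcat this would be something like a FileInputStream on a
        // config file; an in-memory document keeps the sketch self-contained.
        byte[] xml = "<web-app><servlet/><servlet/></web-app>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(new ByteArrayInputStream(xml))); // prints 3
    }
}
```

In the Tomcat case, the argument would be the FileInputStream that ContextConfig.defaultConfig(XMLMapper) builds. The parser itself does not change; only the stream handed to it does.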
I believe the problem does not occur with older parsers such as Crimson because they may decompress the data before parsing it, or may add buffered streams to the pipeline themselves. Xerces 2 evidently does neither: every single byte is read individually and pays the full overhead of the I/O pipeline.
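To make the per-byte cost concrete, the sketch below counts how many read calls actually reach an InflaterInputStream when data is consumed one byte at a time, with and without a BufferedInputStream in between. The class and helper names (ReadCallCounter, CountingInputStream, drainByteByByte) are hypothetical. The deflate-compressed byte array stands in for an XML file stored in a jar. The sketch counts calls only; it does not reproduce the JDK 1.4.0 allocation behavior described above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration class; not part of Tomcat or Xerces.
public class ReadCallCounter {

    /** Counts every read call that is forwarded to the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Deflate-compresses data, mimicking a compressed jar entry. */
    public static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    /**
     * Drains the decompressed data one byte at a time (the access pattern
     * reported for Xerces) and returns how many read calls reached the
     * inflating stream.
     */
    public static long drainByteByByte(byte[] compressed, boolean buffered)
            throws IOException {
        CountingInputStream counted = new CountingInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)));
        InputStream in = buffered ? new BufferedInputStream(counted, 8192) : counted;
        try {
            while (in.read() != -1) {
                // discard the byte; only the number of underlying calls matters
            }
        } finally {
            in.close();
        }
        return counted.calls;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for an XML file: 100,000 repetitive bytes.
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ('a' + i % 26);
        }
        byte[] compressed = deflate(data);
        System.out.println("unbuffered calls: " + drainByteByByte(compressed, false));
        System.out.println("buffered calls:   " + drainByteByByte(compressed, true));
    }
}
```

Without buffering, the inflating stream is called once per byte plus once for the final end-of-stream read (100,001 calls here). With an 8 KB buffer, it is called only a handful of times, each call returning a large chunk.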