PLEASE DO NOT REPLY TO THIS MESSAGE. TO FURTHER COMMENT ON THE STATUS OF THIS
BUG PLEASE FOLLOW THE LINK BELOW AND USE THE ON-LINE APPLICATION. REPLYING TO
THIS MESSAGE DOES NOT UPDATE THE DATABASE, AND SO YOUR COMMENT WILL BE LOST
SOMEWHERE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2882

*** shadow/2882 Mon Jul 30 03:06:23 2001
--- shadow/2882.tmp.21193 Mon Jul 30 03:06:23 2001
***************
*** 0 ****
--- 1,165 ----
+ +============================================================================+
+ | [PERFORMANCE] Why SAX parser needs DOM classes?                            |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 2882                         Product: Xerces-J               |
+ |       Status: NEW                          Version: 1.3.0                  |
+ |   Resolution:                             Platform: Other                  |
+ |     Severity: Normal                    OS/Version: Other                  |
+ |     Priority: Other                      Component: Core                   |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                            |
+ |  Reported By: [EMAIL PROTECTED]                                            |
+ |      CC list: Cc:                                                          |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                                DESCRIPTION                                 |
+ Petr Kuzel wrote:
+
+ >
+ > Sandeep Randhawa wrote:
+ >
+
+ > > You don't have to be parser dependent.
+ > >
+ > > Set the feature by calling
+ > > XMLReader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
+
+ >
+ > Other implementations may throw an exception because they do not support
+ > the feature by that name. It can be caught and hidden.
+ >
+ > Anyway, I dislike the idea of turning off validation. Xerces should come
+ > with cached grammars in the near future, eliminating reparsing of known DTDs.
+
+
+ What the hell would you want validation for, when you're reading a file
+ from inside a distribution jar, one that never changes, again and again?
+
+ And even cached grammars won't help Xerces read the little documents
+ faster, because it still needs to prepare an instance of the grammar and
+ then check the elements against the grammar.
+
+ Try telling Dafe to use validation for his window management stuff,
+ where he is using files with about two elements in them and even
+ the DOCTYPE declaration is bigger than the content.
+
+ The problem with Xerces is that it uses a part of DOM (even in SAX
+ parsing) whenever it encounters a DOCTYPE declaration. I understand that
+ it needs it in case the document has an internal DTD, but how many
+ of our XMLs use this feature? I think it could wait with creating
+ the Document until it finds '[' inside the DOCTYPE; otherwise it is
+ overkill for lightweight SAX parsing.
+
+ I just looked at what Xerces loads when it encounters a DOCTYPE.
+ I'm not saying that the loads themselves are bad (the classes will be
+ loaded a bit later anyway), but to me it is a sign that something is
+ wrong:
+ [Loaded org.openide.filesystems.StreamPool]
+ [Loaded org.openide.filesystems.StreamPool$NotifyInputStream]
+ [Loaded org.xml.sax.InputSource]
+ [Loaded org.openide.filesystems.FileURL]
+ [Loaded org.openide.filesystems.FileURL$1]
+ [Loaded org.netbeans.core.modules.ModuleList$1]
+ [Loaded org.apache.xerces.readers.DefaultEntityHandler$ReaderState]
+ [Loaded org.apache.xerces.readers.XMLEntityHandler$EntityReader]
+ [Loaded org.apache.xerces.readers.DefaultEntityHandler$NullReader]
+ [Loaded org.apache.xerces.utils.URI]
+ [Loaded sun.io.ByteToCharUTF8 from /pn/Jdks/jdk1.3.1-fcs/jre/lib/rt.jar]
+ [Loaded org.apache.xerces.readers.XMLEntityReader]
+ [Loaded org.apache.xerces.readers.AbstractCharReader]
+ [Loaded org.apache.xerces.readers.CharReader]
+ [Loaded org.apache.xerces.utils.XMLCharacterProperties]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$PrologDispatcher]
+ [Loaded org.apache.xerces.framework.XMLDTDScanner]
+ [Loaded org.apache.xerces.framework.XMLContentSpec$Provider]
+ [Loaded org.apache.xerces.validators.common.Grammar]
+ [Loaded org.apache.xerces.framework.XMLDTDScanner$EventHandler]
+ [Loaded org.apache.xerces.validators.dtd.DTDGrammar]
+ [Loaded org.apache.xerces.validators.datatype.DatatypeValidator]
+ [Loaded org.apache.xerces.validators.common.XMLContentModel]
+ [Loaded org.apache.xerces.utils.Hash2intTable]
+ [Loaded org.apache.xerces.framework.XMLContentSpec]
+ [Loaded org.w3c.dom.Node]
+ [Loaded org.w3c.dom.NodeList]
+ [Loaded org.w3c.dom.events.EventTarget]
+ [Loaded org.apache.xerces.dom.NodeImpl]
+ [Loaded org.apache.xerces.dom.ChildNode]
+ [Loaded org.apache.xerces.dom.ParentNode]
+ [Loaded org.w3c.dom.Document]
+ [Loaded org.w3c.dom.traversal.DocumentTraversal]
+ [Loaded org.w3c.dom.events.DocumentEvent]
+ [Loaded org.w3c.dom.ranges.DocumentRange]
+ [Loaded org.apache.xerces.dom.DocumentImpl]
+ [Loaded org.w3c.dom.Element]
+ [Loaded org.apache.xerces.dom.ElementImpl]
+ [Loaded org.w3c.dom.Attr]
+ [Loaded org.apache.xerces.dom.AttrImpl]
+ [Loaded org.w3c.dom.NamedNodeMap]
+ [Loaded org.apache.xerces.dom.NamedNodeMapImpl]
+ [Loaded org.apache.xerces.dom.AttributeMap]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher]
+ [Loaded org.apache.xerces.utils.StringPool$CharArrayRange]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$TrailingMiscDispatcher]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$EndOfInputDispatcher]
+ [Loaded org.netbeans.core.modules.ModuleHistory]
+ [Loaded org.netbeans.core.modules.ModuleList$DiskStatus]
+
+ and the file parsed:
+ <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+ <!DOCTYPE module PUBLIC "-//NetBeans//DTD Module Status 1.0//EN"
+                         "http://www.netbeans.org/dtds/module-status-1_0.dtd">
+ <module name="org.netbeans.modules.apisupport.lite">
+     <param name="autoload">false</param>
+     <param name="enabled">true</param>
+     <param name="jar">apisupport-lite.jar</param>
+     <param name="origin">installation</param>
+     <param name="release">1</param>
+     <param name="reloadable">false</param>
+     <param name="specversion">0.2</param>
+ </module>
+
+ I was parsing with a simple SAX parser (not the DOM one in your sources of
+ ModuleList), with the above feature set to false and validation also
+ turned off.
+
+ The second thing I don't understand is why I have to switch some
+ proprietary feature to prevent Xerces from asking for the DTD when the
+ document itself specifies that it is standalone and I'm not validating.
+ Can somebody explain this behaviour to me?
+
+ By the way, changing the switch didn't make a difference in speed,
+ because the additional time was not spent analyzing the DTD (we already
+ returned an empty stream when not validating), but rather
+ creating the instance of the grammar.
+
+
+
+ > Then we have to set up the feature by calling something similar anyway.
+ >
+ > Petr, you can try Crimson; however, expect compatibility problems, as Xerces
+ > is exposed on the IDE classpath and someone may be implementation dependent.
+ > It is a real pain to have widely visible implementations on the IDE
+ > classpath. (We cannot remove parser.jar for the same reason.)
+ > Cc.
+ >
+ > --
+ > <address>
+ > <a href="mailto:[EMAIL PROTECTED]">Petr Kuzel</a>, Sun Microsystems
+ > : <a href="http://www.sun.com/forte/ffj/ie/">Forte Tools</a>
+ > : XML and <a href="http://jini.netbeans.org/">Jini</a> modules</address>
+
+
+ -- Petr Nejedly NetBeans/Sun Microsystems http://www.netbeans.org

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
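
For reference, the approach discussed in the thread (set the Xerces feature by name, and catch the exception on parsers that do not know it) could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from NetBeans or Xerces: the class name LightSaxParse and its methods are hypothetical, it runs against whatever JAXP SAX parser the JDK provides rather than Xerces-J 1.3.0, and the empty-stream entity resolver is an assumed fallback modeled on the "returned an empty stream" remark in the description.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named here only for illustration.
class LightSaxParse {
    // Xerces-specific feature name quoted in the thread.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Tries to switch off external DTD loading. Returns false when the
    // parser does not recognize or support the feature, so the caller can
    // hide the failure instead of depending on one implementation.
    static boolean disableExternalDtd(XMLReader reader) {
        try {
            reader.setFeature(LOAD_EXTERNAL_DTD, false);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    // Parses the document without validation and counts its elements.
    static int countElements(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumed fallback: hand back an empty stream so that a
                // parser ignoring the feature still does not fetch the DTD.
                return new InputSource(new StringReader(""));
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            disableExternalDtd(reader);
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" standalone=\"yes\"?>"
            + "<!DOCTYPE module PUBLIC \"-//NetBeans//DTD Module Status 1.0//EN\" "
            + "\"http://www.netbeans.org/dtds/module-status-1_0.dtd\">"
            + "<module name=\"example\"><param name=\"enabled\">true</param></module>";
        System.out.println("elements: " + countElements(doc));
    }
}
```

Note that this only avoids fetching and reading the external DTD. It does not address the cost raised in the description, namely the grammar instance and DOM classes that Xerces sets up as soon as it sees a DOCTYPE.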
