[Bug 2882] New: - [PERFORMANCE] Why SAX parser needs DOM classes?

bugzilla Mon, 30 Jul 2001 02:40:42 -0700
PLEASE DO NOT REPLY TO THIS MESSAGE. TO FURTHER COMMENT
ON THE STATUS OF THIS BUG PLEASE FOLLOW THE LINK BELOW
AND USE THE ON-LINE APPLICATION. REPLYING TO THIS MESSAGE
DOES NOT UPDATE THE DATABASE, AND SO YOUR COMMENT WILL
BE LOST SOMEWHERE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2882

*** shadow/2882 Mon Jul 30 03:06:23 2001
--- shadow/2882.tmp.21193       Mon Jul 30 03:06:23 2001
***************
*** 0 ****
--- 1,165 ----
+ +============================================================================+
+ | [PERFORMANCE] Why SAX parser needs DOM classes?                            |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 2882                        Product: Xerces-J                |
+ |       Status: NEW                         Version: 1.3.0                   |
+ |   Resolution:                            Platform: Other                   |
+ |     Severity: Normal                   OS/Version: Other                   |
+ |     Priority: Other                     Component: Core                    |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                  |
+ |  Reported By: [EMAIL PROTECTED]                                               |
+ |      CC list: Cc:                                                          |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ Petr Kuzel wrote:
+ 
+ > 
+ > Sandeep Randhawa wrote:
+ > 
+ 
+ > > You don't have to be parser dependent.
+ > >
+ > > Set the feature by calling
+ > > XMLReader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
+ 
+ > 
+ > Other implementations may throw an exception because they do not recognize
+ > the feature by that name. It can be caught and ignored.
+ > 
+ > Anyway, I dislike the idea of turning off validation. Xerces should soon
+ > come with cached grammars, eliminating the reparsing of known DTDs.
+ 
+ 
+ What the hell would you want validation for when you're reading, again
+ and again, a file from inside a distribution jar that never changes?
+ 
+ And even cached grammars won't help Xerces read the little documents
+ faster, because it still needs to prepare an instance of the grammar
+ and then check the elements against it.
+ 
+ Try telling Dafe to use validation for his window management stuff,
+ where he is using files with about two elements in them and even
+ the DOCTYPE declaration is bigger than the content.
+ 
+ The problem with Xerces is that it uses part of DOM (even in SAX
+ parsing) whenever it encounters a DOCTYPE declaration. I understand
+ that it needs it when the document has an internal DTD subset, but how
+ many of our XML files use this feature? I think it could wait to create
+ the Document until it finds '[' inside the DOCTYPE; otherwise it is
+ overkill for lightweight SAX parsing.
+ 
+ I just looked at what Xerces loads when it encounters a DOCTYPE.
+ I'm not saying the loads themselves are bad (the classes will be loaded
+ a bit later anyway), but to me it is a sign that something is wrong
+ with it:
+ [Loaded org.openide.filesystems.StreamPool]
+ [Loaded org.openide.filesystems.StreamPool$NotifyInputStream]
+ [Loaded org.xml.sax.InputSource]
+ [Loaded org.openide.filesystems.FileURL]
+ [Loaded org.openide.filesystems.FileURL$1]
+ [Loaded org.netbeans.core.modules.ModuleList$1]
+ [Loaded org.apache.xerces.readers.DefaultEntityHandler$ReaderState]
+ [Loaded org.apache.xerces.readers.XMLEntityHandler$EntityReader]
+ [Loaded org.apache.xerces.readers.DefaultEntityHandler$NullReader]
+ [Loaded org.apache.xerces.utils.URI]
+ [Loaded sun.io.ByteToCharUTF8 from /pn/Jdks/jdk1.3.1-fcs/jre/lib/rt.jar]
+ [Loaded org.apache.xerces.readers.XMLEntityReader]
+ [Loaded org.apache.xerces.readers.AbstractCharReader]
+ [Loaded org.apache.xerces.readers.CharReader]
+ [Loaded org.apache.xerces.utils.XMLCharacterProperties]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$PrologDispatcher]
+ [Loaded org.apache.xerces.framework.XMLDTDScanner]
+ [Loaded org.apache.xerces.framework.XMLContentSpec$Provider]
+ [Loaded org.apache.xerces.validators.common.Grammar]
+ [Loaded org.apache.xerces.framework.XMLDTDScanner$EventHandler]
+ [Loaded org.apache.xerces.validators.dtd.DTDGrammar]
+ [Loaded org.apache.xerces.validators.datatype.DatatypeValidator]
+ [Loaded org.apache.xerces.validators.common.XMLContentModel]
+ [Loaded org.apache.xerces.utils.Hash2intTable]
+ [Loaded org.apache.xerces.framework.XMLContentSpec]
+ [Loaded org.w3c.dom.Node]
+ [Loaded org.w3c.dom.NodeList]
+ [Loaded org.w3c.dom.events.EventTarget]
+ [Loaded org.apache.xerces.dom.NodeImpl]
+ [Loaded org.apache.xerces.dom.ChildNode]
+ [Loaded org.apache.xerces.dom.ParentNode]
+ [Loaded org.w3c.dom.Document]
+ [Loaded org.w3c.dom.traversal.DocumentTraversal]
+ [Loaded org.w3c.dom.events.DocumentEvent]
+ [Loaded org.w3c.dom.ranges.DocumentRange]
+ [Loaded org.apache.xerces.dom.DocumentImpl]
+ [Loaded org.w3c.dom.Element]
+ [Loaded org.apache.xerces.dom.ElementImpl]
+ [Loaded org.w3c.dom.Attr]
+ [Loaded org.apache.xerces.dom.AttrImpl]
+ [Loaded org.w3c.dom.NamedNodeMap]
+ [Loaded org.apache.xerces.dom.NamedNodeMapImpl]
+ [Loaded org.apache.xerces.dom.AttributeMap]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher]
+ [Loaded org.apache.xerces.utils.StringPool$CharArrayRange]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$TrailingMiscDispatcher]
+ [Loaded org.apache.xerces.framework.XMLDocumentScanner$EndOfInputDispatcher]
+ [Loaded org.netbeans.core.modules.ModuleHistory]
+ [Loaded org.netbeans.core.modules.ModuleList$DiskStatus]
+ 
+ and the file parsed:
+ <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+ <!DOCTYPE module PUBLIC "-//NetBeans//DTD Module Status 1.0//EN"
+                         "http://www.netbeans.org/dtds/module-status-1_0.dtd">
+ <module name="org.netbeans.modules.apisupport.lite">
+ <param name="autoload">false</param>
+ <param name="enabled">true</param>
+ <param name="jar">apisupport-lite.jar</param>
+ <param name="origin">installation</param>
+ <param name="release">1</param>
+ <param name="reloadable">false</param>
+ <param name="specversion">0.2</param>
+ </module>
+ 
+ I was parsing with a simple SAX parser (not the DOM used in your
+ ModuleList sources), with the above feature set to false and
+ validation also turned off.
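
For readers who want to reproduce that setup, here is a minimal sketch of a non-validating SAX parse that tries to switch the Xerces feature off and tolerates parsers that reject it. It assumes a JDK with JAXP on the classpath; the class name NonValidatingSax and its helper methods are hypothetical, and only the feature URI comes from this thread.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of Xerces or NetBeans.
class NonValidatingSax {
    // Xerces-specific feature name quoted earlier in this thread.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Tries to switch the feature off; returns false if the parser refuses.
    static boolean disableExternalDtd(XMLReader reader) {
        try {
            reader.setFeature(LOAD_EXTERNAL_DTD, false);
            return true;
        } catch (SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException both
            // extend SAXException, so parsers without the feature end up here.
            return false;
        }
    }

    // Parses the XML text without validation and counts start tags.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        disableExternalDtd(reader);

        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<module name=\"demo\"><param name=\"enabled\">true</param></module>";
        System.out.println(countElements(xml));
    }
}
```

Catching the exception keeps the code parser independent, at the cost of silently falling back to whatever the other parser does with external DTDs.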
+ 
+ The second thing I don't understand is why I have to switch off some
+ proprietary feature to prevent Xerces from asking for the DTD when the
+ document itself declares that it is standalone and I'm not validating.
+ Can somebody explain this behaviour to me?
+ 
+ By the way, changing the switch made no difference in speed,
+ because the additional time was not spent analyzing the DTD (we
+ already return an empty stream when not validating), but rather
+ creating the instance of the grammar.
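
The "empty stream" trick mentioned above can be sketched with a standard SAX EntityResolver. This is an illustration only, not the actual NetBeans code: the class name EmptyDtdResolver, the request counter, and the example.invalid system ID are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: answers every external entity request with an
// empty stream, so the parser never touches the network or the disk.
class EmptyDtdResolver implements EntityResolver {
    int requests = 0; // how many times the parser asked for an entity

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requests++;
        InputSource empty = new InputSource(new StringReader(""));
        empty.setPublicId(publicId);
        empty.setSystemId(systemId);
        return empty;
    }

    // Parses without validation and reports how often the resolver was used.
    static int parseAndCountRequests(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        EmptyDtdResolver resolver = new EmptyDtdResolver();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return resolver.requests;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE module SYSTEM \"http://example.invalid/module.dtd\">"
            + "<module/>";
        System.out.println(parseAndCountRequests(xml));
    }
}
```

As the message notes, this only avoids fetching and scanning the DTD; the parser may still build its grammar structures when it sees the DOCTYPE.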
+ 
+ 
+ 
+ > Then we have to setup feature by calling something similar anyway.
+ > 
+ > Petr, you can try Crimson; however, expect compatibility problems, as Xerces
+ > is exposed on the IDE classpath and someone may be implementation dependent.
+ > It is a real pain to have widely visible implementations on the IDE
+ > classpath. (We cannot remove parser.jar for the same reason.)
+ > Cc.
+ > 
+ > --
+ > <address>
+ > <a href="mailto:[EMAIL PROTECTED]">Petr Kuzel</a>, Sun Microsystems
+ > : <a href="http://www.sun.com/forte/ffj/ie/">Forte Tools</a>
+ > : XML and <a href="http://jini.netbeans.org/">Jini</a> modules</address>
+ 
+ 
+ -- Petr Nejedly NetBeans/Sun Microsystems http://www.netbeans.org
