Re: Performance of Xerces & Xalan

Bhushan_Bhangale Wed, 15 May 2002 00:43:21 -0700

DOM vs. SAX

There are two major types of XML APIs, namely tree-based APIs and event-based APIs.


A tree-based API (e.g. DOM) translates an XML document into an internal tree structure 
and then allows an application to navigate that tree structure. So, DOM gives access 
to the information stored in an XML document as a hierarchical object model. That is to 
say, a DOM parser will process XML data and create an object-oriented hierarchical 
representation of it that can be traversed and/or manipulated at run-time.
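
As a rough illustration of the tree-based approach, here is a minimal sketch using the 
standard JAXP and org.w3c.dom interfaces. The class name DomTraversalSketch, the helper 
methods and the sample XML are made up for this example; any JAXP-compliant parser 
(Xerces included) should behave the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
public class DomTraversalSketch {

    // Parse an XML string into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Recursively count the element nodes at or below the given node.
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><title>XML</title><chapter><para>Hello</para></chapter></book>";
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();
        System.out.println("root element: " + root.getTagName());
        System.out.println("element count: " + countElements(doc));

        // Because the whole tree is in memory, we can walk upwards as well as downwards.
        Node para = doc.getElementsByTagName("para").item(0);
        System.out.println("parent of para: " + para.getParentNode().getNodeName());
    }
}
```

The point of the sketch is that the application only starts working after the whole 
document has been turned into a tree, and from then on it can move in any direction.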

An event-based API (e.g. SAX), in contrast, reports parsing events (e.g. the start and 
end of elements) to the application through callbacks, and does not usually build an 
internal tree. So, SAX defines an approach whereby parsers scan through XML data, 
calling handler functions whenever certain events are triggered (e.g. when parts of 
the document like text nodes or processing instructions are found).
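
To make the callback model concrete, here is a minimal sketch using the standard JAXP 
SAX interfaces. The class SaxEventSketch and its EventLogger handler are hypothetical 
names invented for this example; the handler simply records each event it is told about.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of any library.
public class SaxEventSketch {

    // Records each callback as a string so the order of events is visible.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one run of text in several chunks.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }
    }

    // Run the parser over the XML and return the events in the order they arrived.
    static List<String> logEvents(String xml) throws Exception {
        EventLogger handler = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : logEvents("<note><to>Ann</to></note>")) {
            System.out.println(event);
        }
    }
}
```

Note that nothing is kept unless the handler chooses to keep it; here the only state is 
the list of event strings.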

At runtime an entire DOM tree stays in memory until released, and is available for 
manipulation using functions that return the parent, child or otherwise defined 
nodes located anywhere in the tree. Querying and manipulating these nodes is fairly 
straightforward in DOM APIs.

In SAX no internal tree/representation of the XML is created, so memory is much less of 
a consideration, whatever the size of the document. The parser just calls certain 
handler functions when defined events (like the start/end of the document, the start of 
a child element or the discovery of text) take place. This design generally makes SAX 
parsing faster.

 

First consideration (resources & coding complexity):

The main advantage of SAX is its ability to scan and parse mammoth XML documents without 
much strain on memory resources. It is also the API of choice when a custom object model 
(as against DOM) for the XML document is desired. The main disadvantage is a possible 
complexity of coding/development and an approach which is less intuitive and modular 
for most programmers. The lack of a document representation leaves the challenge of 
manipulating, serializing, and traversing the XML document, as required, to the coder.

The main advantage of DOM is that it makes intensive manipulation easy to implement, as 
the parser does almost everything: from reading the XML document in, to creating a Java 
object model on top of it, and then giving a reference to this object model (a Document 
object) so that it can be manipulated. The principal disadvantage is a possibly heavy 
overhead on memory because the entire XML structure is loaded in one go.
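
The manipulation case can be sketched as follows, assuming the standard JAXP parser and 
an identity Transformer for writing the tree back out. The class DomEditSketch, the 
addItem helper and the <list>/<item> element names are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
public class DomEditSketch {

    // Parse, append a new <item> child to the root element, and serialize back to text.
    static String addItem(String xml, String itemText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // The Document object acts as a factory for new nodes.
        Element item = doc.createElement("item");
        item.setTextContent(itemText);
        doc.getDocumentElement().appendChild(item);

        // A Transformer created without a stylesheet copies the tree to the output as-is.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addItem("<list><item>one</item></list>", "two"));
    }
}
```

The edit itself is three lines; the cost is that the whole document had to be held in 
memory to make it.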

 

Second consideration (nature of XML to be parsed):

If an XML document contains document data (e.g., PDF documents stored in XML format) 
then DOM is a completely natural fit. As a typical example, for a document information 
management system (e.g. the Datachannel RIO product) DOM is well suited to giving 
programs access to information stored in documents such as Word, Excel or PDF files.

On the other hand, if the information stored in an XML document is structured, 
machine-readable (and machine-generated) data then SAX is the right API for access to 
this information. Machine-readable and machine-generated data include things like Java 
object properties stored in XML format. An example is an address book XML file, which is 
not like a word processor document; rather it is a document that contains pure data, 
which has been encoded into text using XML. When data is of this kind, creation of 
tailored data structures and classes (object models) is needed anyway in order to manage, 
manipulate and/or persist this data. SAX allows quick creation of a handler class 
which can create instances of these object models based on data stored in XML. SAX 
would also be the choice for, say, simply reading text elements sequentially from an 
XML document, or locating a particular element inside one, where loading the entire 
structure into memory (as in DOM) would be overkill.
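
For the address book case, a handler that builds a tailored object model might look 
roughly like this. It is only a sketch: the Contact class, the ContactHandler class and 
the <contact>, <name> and <email> element names are assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of any library.
public class AddressBookSaxSketch {

    // Hypothetical tailored object model for one address book entry.
    static class Contact {
        String name;
        String email;
    }

    // Builds Contact objects directly from SAX events; no DOM tree is created.
    static class ContactHandler extends DefaultHandler {
        final List<Contact> contacts = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Contact current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
            if ("contact".equals(qName)) {
                current = new Contact();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // text may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // not inside a <contact>
            }
            if ("name".equals(qName)) {
                current.name = text.toString().trim();
            } else if ("email".equals(qName)) {
                current.email = text.toString().trim();
            } else if ("contact".equals(qName)) {
                contacts.add(current);
                current = null;
            }
        }
    }

    static List<Contact> read(String xml) throws Exception {
        ContactHandler handler = new ContactHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.contacts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<addressbook>"
                + "<contact><name>Ann</name><email>ann@example.com</email></contact>"
                + "<contact><name>Bob</name><email>bob@example.com</email></contact>"
                + "</addressbook>";
        for (Contact c : read(xml)) {
            System.out.println(c.name + " <" + c.email + ">");
        }
    }
}
```

The handler has to track where it is in the document by itself (the current field), 
which is the extra coding effort mentioned earlier, but the only objects left at the end 
are the ones the application actually wants.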

 

Comparison table:

 

Consideration                      | DOM                          | SAX
-----------------------------------+------------------------------+-----------------------------
Memory requirement                 | high                         | very low
Coding complexity                  | very low                     | medium - high
Speed                              | slower, in general, than SAX | high
Ease of XML manipulation           | very high                    | medium - low
-----------------------------------+------------------------------+-----------------------------
Requirement that lends itself to   | SAX is the optimum choice,   | highly suited
element-object mapping, or simple  | though DOM can do the job as |
parsing such as reading            | well                         |
sequentially from an XML file      |                              |
-----------------------------------+------------------------------+-----------------------------
Manipulation-intensive parsing, or | highly suited                | DOM is the optimum choice,
data much better represented as a  |                              | though SAX can do the job as
tree (as against a custom object   |                              | well
model), or ease of coding desired  |                              |
and resource overhead not a        |                              |
concern                            |                              |

 

 

___________________________________________________________________________
To unsubscribe, send email to [EMAIL PROTECTED] and include in the body
of the message "signoff SERVLET-INTEREST".

Archives: http://archives.java.sun.com/archives/servlet-interest.html
Resources: http://java.sun.com/products/servlet/external-resources.html
LISTSERV Help: http://www.lsoft.com/manuals/user/user.html
