DOM vs. SAX
There are two major types of XML APIs: tree-based APIs and event-based APIs.
A tree-based API (e.g. DOM) translates an XML document into an internal tree structure
and then allows an application to navigate that tree. DOM thus gives access to the
information stored in an XML document as a hierarchical object model: a DOM parser
processes the XML data and creates an object-oriented, hierarchical representation of
it that can be traversed and/or manipulated at run time.
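As a minimal sketch of the tree-based approach, the following uses the JAXP classes that ship with the JDK (an assumption, since no particular parser is named here). The class name DomParseSketch and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class: parse XML text into a DOM tree and inspect its root.
class DomParseSketch {
    // Builds the whole document tree in memory and returns the root element's name.
    static String rootName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return root.getNodeName();
    }
}
```

Calling DomParseSketch.rootName("<book><title>XML</title></book>") returns "book". The point to notice is that the parser hands back a complete Document before the application looks at any node.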
An event-based API (e.g. SAX), in contrast, reports parsing events (e.g. the start and
end of elements) to the application through callbacks and does not usually build an
internal tree. SAX thus defines an approach in which the parser scans through the XML
data and calls handler functions whenever certain events are triggered (e.g. when
parts of the document such as text nodes or processing instructions are found).
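The callback style described above can be sketched with the JDK's SAX classes (again an assumption about the parser in use). The class name SaxEventsSketch and the event labels it records are hypothetical; the handler simply logs each event in the order the parser reports it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: record the SAX events reported for a document.
class SaxEventsSketch {
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("start:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text node in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("text:" + text);
            }
            @Override public void endDocument() { log.add("endDocument"); }
        };
        // No tree is built; the parser only invokes the callbacks above.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }
}
```

For a small input such as "<a><b>hi</b></a>", the log begins with startDocument, then start:a and start:b, and ends with endDocument. Nothing from the document is retained unless the handler chooses to keep it.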
At run time the entire DOM tree stays in memory until it is released, and is available
for manipulation through functions that return the parent, the children or other
related nodes of any node in the tree. Querying and manipulating these nodes is
therefore fairly straightforward with DOM APIs.
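To illustrate navigating by parent, child and sibling, here is a short sketch using the standard org.w3c.dom interfaces. The class DomNavigateSketch, its method names and the element names in the usage note are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: walk and modify an in-memory DOM tree.
class DomNavigateSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Walks the children of a node via first-child / next-sibling links
    // and collects the names of those that are elements.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<String>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) names.add(n.getNodeName());
        }
        return names;
    }

    // Finds the first element with the given tag anywhere in the tree,
    // replaces its text, and returns the name of its parent node.
    static String retext(Document doc, String tag, String newText) {
        Node target = doc.getElementsByTagName(tag).item(0);
        target.setTextContent(newText);
        return target.getParentNode().getNodeName();
    }
}
```

Given a document parsed from "<book><title>Old</title><author>A</author></book>", childElementNames on the root yields title and author, and retext(doc, "title", "New") changes the title in place and returns "book". Both operations rely on the tree still being in memory.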
In SAX no internal tree or other representation of the XML is created, so the parser
itself adds little memory overhead. The parser just calls certain handler functions
when defined events (such as the start/end of the document, the start of a child
element, or the discovery of text) take place. This design generally makes SAX
parsing faster.
First consideration (resources & coding complexity):
The main advantage of SAX is its ability to scan and parse mammoth XML documents
without straining memory resources. It is also the API of choice when a custom object
model (as against DOM) for the XML document is desired. The main disadvantage is a
possible complexity of coding/development and an approach that is less intuitive and
modular for most programmers. The lack of a document representation leaves the
challenge of manipulating, serializing and traversing the XML document, as required,
to the coder.
The main advantage of DOM is that it makes intensive manipulation easy to implement, as
the parser does almost everything: it reads the XML document in, creates a Java object
model on top of it, and returns a reference to this object model (a Document object)
so that it can be manipulated. The principal disadvantage is a possibly heavy memory
overhead, because the entire XML structure is loaded in one go.
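The read, manipulate and write-back cycle that DOM makes easy might look like the sketch below. It assumes the JAXP parser and the identity Transformer bundled with the JDK; the class DomEditSketch and the "price" element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class: load a document, edit it, serialize it again.
class DomEditSketch {
    static String addPrice(String xml, String price) throws Exception {
        // The parser reads the XML and returns a Document object.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Manipulate the tree: append a new child element to the root.
        Element priceEl = doc.createElement("price");
        priceEl.setTextContent(price);
        doc.getDocumentElement().appendChild(priceEl);

        // Serialize the modified tree with an identity transform.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

With SAX, each of these three steps (holding the structure, changing it, writing it out) would have to be written by hand, which is the coding burden described above.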
Second consideration (nature of the XML to be parsed):
If an XML document contains document data (e.g. PDF documents stored in XML format)
then DOM is a completely natural fit. As a typical example, for a document information
management system (e.g. the Datachannel RIO product) DOM is well suited to giving
programs access to information stored in documents (Word, Excel, PDF and the like).
On the other hand, if the information stored in an XML document is structured,
machine-readable (and machine-generated) data, then SAX is the right API for accessing
it. Machine-readable and generated data includes things like Java object properties
stored in XML format. An example is an address book XML file, which is not like a word
processor document; rather, it is a document that contains pure data encoded into text
using XML. When data is of this kind, tailored data structures and classes (object
models) have to be created anyway in order to manage, manipulate and/or persist it.
SAX allows quick creation of a handler class that creates instances of these object
models from the data stored in XML. SAX would also be the choice for, say, simply
reading text elements sequentially from an XML document, or locating a particular
element inside one, where loading the entire structure into memory (as in DOM) would
be overkill.
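The address book case can be sketched as a SAX handler that builds application objects directly. Everything specific here is an assumption made for illustration: the class AddressBookSketch, the Contact type, and the element names contact, name and email are hypothetical, not a real address book format.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: map XML elements onto a custom object model.
class AddressBookSketch {
    // Tailored object model for one address book entry.
    static final class Contact {
        final String name;
        final String email;
        Contact(String name, String email) { this.name = name; this.email = email; }
    }

    static List<Contact> load(String xml) throws Exception {
        final List<Contact> contacts = new ArrayList<Contact>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String name;
            private String email;

            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                text.setLength(0);                       // start collecting fresh text
                if (qName.equals("contact")) { name = null; email = null; }
            }
            @Override public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);          // text may arrive in chunks
            }
            @Override public void endElement(String uri, String localName, String qName) {
                if (qName.equals("name")) name = text.toString().trim();
                else if (qName.equals("email")) email = text.toString().trim();
                else if (qName.equals("contact")) contacts.add(new Contact(name, email));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return contacts;
    }
}
```

Only the Contact objects survive the parse; no generic tree is ever allocated. The cost is that the handler has to track parsing state itself (here, the text buffer and the fields of the contact being read).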
Comparison table:

Consideration                       DOM                         SAX
----------------------------------  --------------------------  --------------------------
Memory requirement                  high                        very low
Coding complexity                   very low                    medium to high
Speed                               generally slower than SAX   high
Ease of XML manipulation            very high                   medium to low

Parsing that lends itself to        can do the job, though SAX  highly suited
element-object mapping, or simple   is the optimum choice
parsing such as reading
sequentially from an XML file

Manipulation-intensive parsing, or  highly suited               can do the job, though DOM
data much better represented as a                               is the optimum choice
tree (as against a custom object
model), or ease of coding desired
and resource overhead not a
concern