>o) What is your internal DOM - stree or dtm or both?
DTM has replaced STree in the most recently checked-in code.
> o) Where can I find some detailed documentation on it?
I think the javadoc and inline comments and existing code are most of what
we've got at the moment. We should probably write a principles-of-design
sketch.
Here's an _extremely_ brief overview, to give you some sense of how this is
intended to work:
The DTM Manager keeps track of the DTMs currently active, assigning each a
document ID number. Nodes are referred to by a "node handle" integer made
up of subfields specifying the particular DTM tree (document ID) and the
node's document-order location within that tree. To access the node, we ask
the DTMManager which DTM it's part of, then ask that DTM to retrieve the
desired data.
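
To make the node-handle idea concrete, here is a minimal sketch of packing a document ID and a document-order node index into one int, then resolving the handle back to data. The class name, the 20-bit split, and the use of a String array as a stand-in for a DTM are all assumptions made for illustration; they are not the actual Xalan field widths or classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the bit split and all names here are assumptions,
// not the layout used by the real DTM code.
final class NodeHandleSketch {
    static final int NODE_BITS = 20;                    // assumed width of the node-index field
    static final int NODE_MASK = (1 << NODE_BITS) - 1;  // low bits hold the node index

    /** Pack a document ID and a document-order node index into one handle. */
    static int makeHandle(int documentId, int nodeIndex) {
        return (documentId << NODE_BITS) | (nodeIndex & NODE_MASK);
    }

    static int documentIdOf(int handle) { return handle >>> NODE_BITS; }

    static int nodeIndexOf(int handle) { return handle & NODE_MASK; }

    // Stand-in for the manager's table of active documents: each "DTM" is
    // just an array of node names in document order.
    private static final List<String[]> documents = new ArrayList<>();

    /** Register a document and return the ID assigned to it. */
    static int register(String[] nodeNamesInDocumentOrder) {
        documents.add(nodeNamesInDocumentOrder);
        return documents.size() - 1;
    }

    /** Resolve a handle: first find the owning document, then ask it for the data. */
    static String nodeName(int handle) {
        String[] doc = documents.get(documentIdOf(handle));
        return doc[nodeIndexOf(handle)];
    }

    public static void main(String[] args) {
        int docId = register(new String[] {"root", "chapter", "para"});
        int handle = makeHandle(docId, 2);
        System.out.println(nodeName(handle));
    }
}
```

The point of the split is that a handle stays a plain int, so it can be stored and compared cheaply, while the lookup still goes through the manager rather than through object references.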
Node names are similarly referred to by integer index into tables, or more
often by a single "extended node type" integer that combines node type,
namespace (if applicable) and local name (if applicable). Stylesheets will
try to avoid comparing names as strings; when they start up they'll build a
mapping from extended types referenced in the stylesheet to extended types
used in this particular document and thereafter use the table to do those
comparisons. I believe we adapted extended types from XSLTC.
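
As a rough illustration of that startup mapping, the sketch below gives the stylesheet and the document each their own pool of extended-type IDs and builds an int table from one to the other, so later name tests are integer comparisons. The TypePool class, the string key format, and the -1 "not present" marker are assumptions for the example, not the real Xalan data structures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: class names and the key encoding are assumptions.
final class ExtendedTypeSketch {

    /** Assigns a small integer to each distinct (node type, namespace, local name) triple. */
    static final class TypePool {
        private final Map<String, Integer> ids = new HashMap<>();
        private final List<String> keys = new ArrayList<>();

        int idFor(int nodeType, String namespace, String localName) {
            // Simplistic key; assumes '|' does not occur in the namespace or name.
            String key = nodeType + "|" + namespace + "|" + localName;
            Integer id = ids.get(key);
            if (id == null) {
                id = keys.size();
                ids.put(key, id);
                keys.add(key);
            }
            return id;
        }

        /** Existing ID for a key, or -1 if this pool has never seen it. */
        int lookup(String key) {
            Integer id = ids.get(key);
            return id == null ? -1 : id;
        }

        String keyOf(int id) { return keys.get(id); }

        int size() { return keys.size(); }
    }

    /** One-time table: stylesheet type ID -> document type ID (or -1 if absent). */
    static int[] buildMapping(TypePool stylesheet, TypePool document) {
        int[] map = new int[stylesheet.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = document.lookup(stylesheet.keyOf(i));
        }
        return map;
    }

    /** After the table is built, a name test is a single int comparison. */
    static boolean matches(int[] map, int stylesheetType, int documentNodeType) {
        return map[stylesheetType] == documentNodeType;
    }
}
```

A stylesheet type that maps to -1 can never match any node in that document, which lets the processor rule out such a test without touching a string.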
SAX2DTM's text content is stored in a packed char array, which the
underlying representations of nodes know how to index into. Text can be
retrieved as a String (requires creating an Object and copying data at that
time), or an XString (defers copying data), or in some cases can be pumped
directly out to a SAX handler when that's most efficient. This
FastStringBuffer approach was carried over from STree.
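
The three retrieval styles can be sketched roughly as follows. This is not FastStringBuffer itself; the class, its growth policy, and the use of CharBuffer as a stand-in for the deferred-copy XString are assumptions chosen to show the idea with standard JDK classes.

```java
import java.nio.CharBuffer;
import java.util.Arrays;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Illustrative only: a simplified packed text store, not the real FastStringBuffer.
final class PackedTextSketch {
    private char[] chars = new char[16];
    private int used = 0;

    /** Append text to the packed array and return the offset where it starts. */
    int append(String text) {
        int offset = used;
        int needed = used + text.length();
        if (needed > chars.length) {
            chars = Arrays.copyOf(chars, Math.max(needed, chars.length * 2));
        }
        text.getChars(0, text.length(), chars, offset);
        used = needed;
        return offset;
    }

    /** Eager retrieval: allocates a String and copies the characters now. */
    String getString(int offset, int length) {
        return new String(chars, offset, length);
    }

    /** Deferred retrieval: a view over the array; nothing is copied until a caller asks. */
    CharSequence view(int offset, int length) {
        return CharBuffer.wrap(chars, offset, length);
    }

    /** Direct output: hand the array slice straight to a SAX handler. */
    void sendTo(ContentHandler handler, int offset, int length) throws SAXException {
        handler.characters(chars, offset, length);
    }
}
```

A node then only needs to remember an (offset, length) pair, which is what lets the node representation stay a handful of ints.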
DOM2DTM, of course, leaves most of the document's content in its source
DOM, retrieving it as needed... though it does build DTM-level navigational
references and extended type information, for faster access.
DOM2DTM's data structures are built incrementally as needed. SAX2DTM has
provision for interacting with a CoroutineSAXParser, which presents a
"throttled" SAX stream, to achieve similar incrementality.
CoroutineSAXParser has a mode in which it can act as a SAX filter, so you can
get incremental behavior even when using existing SAX sources. This
incremental SAX mode normally requires some threading, but there's also a
special subclass which is aware of Xerces' custom incremental-parse
features and permits incremental construction in a single thread.
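
The "build only as much as has been asked for" behavior can be shown with a much simpler single-threaded sketch. It does not model the coroutine or threading machinery at all: an Iterator of node names stands in for the throttled event stream, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: an Iterator stands in for the throttled SAX stream.
final class IncrementalBuildSketch {
    private final Iterator<String> source;               // events not yet consumed
    private final List<String> built = new ArrayList<>(); // nodes built so far, in document order

    IncrementalBuildSketch(Iterator<String> source) {
        this.source = source;
    }

    /**
     * Return the node at the given document-order index, pulling just enough
     * events from the source to reach it. Returns null if the document ends first.
     */
    String getNode(int index) {
        while (built.size() <= index && source.hasNext()) {
            built.add(source.next());
        }
        return index < built.size() ? built.get(index) : null;
    }

    /** How many nodes have actually been constructed so far. */
    int builtSoFar() {
        return built.size();
    }
}
```

Asking for an early node leaves the rest of the input unread, which is the property the throttled stream is meant to provide for real parsers.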
It's possible to obtain a DOM node from a DTM node. In the case of DOM2DTM,
that will return the first (or only) source DOM node corresponding to this
DTM node; SAX2DTM will return a "proxy" object which provides DOM access to
the DTM's data. Note that DTM is intended to be a read-only model;
attempting to modify these DOM nodes is Not Supported.
One important point: DTM is not just a different set of bindings for the
DOM. It presents an XPath view of the data rather than a DOM view, and some
details that the DOM retains but which are irrelevant to XPath are
suppressed in DTM. Entity Reference nodes will not appear. Contiguous text
will coalesce into a single node. Some filtering of the document may be
performed based on information provided when the DTM is built; in
particular, xsl:strip-space may be processed while building the DTM.
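
To illustrate the text-coalescing point, here is a small sketch that walks a DOM element's children and reports them the way an XPath-style view would, merging adjacent Text and CDATA nodes into one logical text node. It uses only the standard JAXP and DOM APIs; the class and the "text:" output format are made up for the example, and it does not attempt entity-reference handling or whitespace stripping.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative only: shows coalescing of contiguous text, nothing more.
final class XPathViewSketch {

    /** List the children of a node, merging each run of Text/CDATA nodes into one entry. */
    static List<String> childView(Node parent) {
        List<String> view = new ArrayList<>();
        StringBuilder text = null; // non-null while inside a run of text nodes
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                if (text == null) {
                    text = new StringBuilder();
                }
                text.append(n.getNodeValue());
            } else {
                if (text != null) {
                    view.add("text:" + text);
                    text = null;
                }
                view.add(n.getNodeName());
            }
        }
        if (text != null) {
            view.add("text:" + text);
        }
        return view;
    }

    /** Convenience parser for the example; wraps checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<a>foo<![CDATA[bar]]>baz<b/>tail</a>");
        System.out.println(childView(d.getDocumentElement()));
    }
}
```

In the DOM the text before the b element may be several sibling nodes, while the XPath view treats it as one, so code written against DTM should not expect to see the split.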
> o) What is the interface between Xalan and the internal tree?
DOM2DTM and SAX2DTM are reference implementations of the DTM model, and
know how to build data structures from the appropriate sources. The
DTM.java interface (which those implement) and DTMManager.java (which
instantiates and manages DTM instances) are the APIs which the rest of
Xalan uses to read data from the model.
> o) Is the tree made up of arrays
Yes, and/or vectors. (DOM2DTM will also reference the DOM objects it's
wrapped around for some of its information.) But since this is encapsulated
under an API, you should never access those arrays directly unless
implementing a new flavor of DTM or debugging one of our implementations.
> o) Is the database connectivity stuff a part of Xalan or
> does it belong with a different project (Cocoon, etc.) ?
Database access is currently implemented as an Extension Function. Since
that's being invoked from the XSLT stylesheet, and since it's a good
illustration of how to write extensions, I'd say we should keep it in
Xalan.