"Themmen, Joel" <[EMAIL PROTECTED]> wrote:
> I need a cross-OS XML parser and XPath evaluator. We originally used the
> MSXML parser but that obviously was not available on non-Microsoft OSes.
> We have used Xerces internally for a number of other projects and are
> quite pleased with it. The only thing lacking has been that we had to
> write an internal XPath parser. This was not fun since the specification
> is relatively large and complex. We chose to implement a small section of
> the specification. It is OK but that is all.
>
> I would really like to use Xalan to do the XPath evaluation (due to
> completeness and (hopefully) robustness). In essence, we need a DOM
> document since we are doing many manual modifications of the document. We
> use XPath to find nodes and then we may modify the node(s), add nodes,
> remove nodes, etc. In other words, we really need to use the DOM
> representation of the XML document since we do so much manual work on it.
> Can I do this successfully with Xerces/Xalan? Is this the correct tool
> set?
There is an implementation of a wrapper around the Xerces DOM in Xalan. It
does allow you to evaluate XPath expressions using the Xerces DOM. The
biggest problem is that, for large trees, it can be inefficient. Your
processing model adds more inefficiency, because you want to modify the
DOM. That means the wrapper cannot cache any information about a node's
child, parent, or sibling nodes, and that the document cannot be indexed.
Depending on the kinds of expressions you're going to evaluate, and the
size of the document, that can be very slow, since node-sets must be
returned in document order.
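
To make the cost concrete, here is a rough sketch of why an index matters when a node-set has to be put into document order. It is a toy model only: Node, positionByWalking, buildIndex and the rest are hypothetical names made up for illustration, not Xerces or Xalan types, and it does not show how the wrapper actually works. Without an index, every comparison has to walk the tree; with one, a comparison is a lookup, but the index is only valid as long as nobody modifies the tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy node standing in for a DOM node (not a Xerces/Xalan type).
struct Node {
    std::string name;
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;

    // Appends a new child and returns a non-owning pointer to it.
    Node* append(const std::string& childName) {
        children.push_back(std::make_unique<Node>());
        Node* child = children.back().get();
        child->name = childName;
        child->parent = this;
        return child;
    }
};

// Preorder walk that counts nodes visited before reaching target.
static bool findPosition(const Node* current, const Node* target,
                         std::size_t& counter) {
    if (current == target) return true;
    ++counter;
    for (const auto& child : current->children)
        if (findPosition(child.get(), target, counter)) return true;
    return false;
}

// No index available: each lookup costs a walk over the tree.
std::size_t positionByWalking(const Node* root, const Node* target) {
    std::size_t counter = 0;
    findPosition(root, target, counter);
    return counter;
}

// Sorting this way performs a tree walk for every comparison.
void sortByWalking(const Node* root, std::vector<const Node*>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [root](const Node* a, const Node* b) {
                  return positionByWalking(root, a) < positionByWalking(root, b);
              });
}

using Index = std::unordered_map<const Node*, std::size_t>;

// One preorder walk records every node's position. Any later insertion or
// removal in the tree makes these positions stale.
void buildIndex(const Node* current, Index& index) {
    const std::size_t position = index.size();
    index[current] = position;
    for (const auto& child : current->children)
        buildIndex(child.get(), index);
}

// With an index, each comparison is a pair of hash lookups.
void sortByIndex(const Index& index, std::vector<const Node*>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [&index](const Node* a, const Node* b) {
                  return index.at(a) < index.at(b);
              });
}
```

Both sorts give the same order; the difference is only in how much of the tree each comparison has to touch, which is what hurts on large documents that keep changing.
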
There will be an interim build of Xalan coming in the next week or two,
which will have some fixes to the wrapper layer, and a more efficient
mapping of nodes between the Xerces DOM and Xalan's internal interfaces.
You might want to look for that and give it a test run.
> I cannot use the native Xerces/Xalan data structures since we have
> an internal wrapper that I must comply with. We wrap the Xerces/Xalan data
> structures in structures that are already in use within our code base. Is
> this going to cause problems (it should not but perhaps someone has more
> experience in this than I do) due to destructors that must be called?
I have no idea -- can you give more information about which native data
structures you cannot use and what you're replacing them with?
> Memory leaks - I have been using the Xerces code plus my XPath
> parser and I am experiencing quite a few memory leaks. I believe that the
> vast majority of leaks are of my own making; however, I am unclear on a
> few of the basic premises of the Xerces/Xalan documents. In particular,
> parsers and documents:
>
> 1.) I will need to create approximately 15 documents in any one
> instantiation. Should I be using the same parser for each and
> every document and then freeing the parser as I leave the code?
>
> Should I be calling doc->release as soon as I am done with a
> document?
I suspect you should. The memory a document uses is never released until
you release the document itself.
> 2.) What does parser->resetDocumentPool() really do?
> Does it free all documents created by that parser
> (even if doc->release() has not been called)?
You should ask this on the Xerces list, but I think it destroys any
documents the parser has created, unless you've specifically adopted them
yourself, through XercesDOMParser::adoptDocument(). You could also take a
look at the source code to confirm this.
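
If it helps to picture the ownership rule I am describing, here is a small sketch of it. This is not Xerces code: ToyParser and ToyDocument are hypothetical stand-ins, and the sketch simply assumes the behavior is as I described above (the parser owns what it creates until you adopt it), which you should still confirm on the Xerces list or in the source.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a document; counts live instances so that
// leaks are visible. This is NOT the Xerces DOMDocument class.
struct ToyDocument {
    static int liveCount;
    ToyDocument() { ++liveCount; }
    ~ToyDocument() { --liveCount; }
    ToyDocument(const ToyDocument&) = delete;
    ToyDocument& operator=(const ToyDocument&) = delete;
};
int ToyDocument::liveCount = 0;

// Hypothetical stand-in for a parser that pools the documents it creates.
// This is NOT the Xerces XercesDOMParser class.
class ToyParser {
public:
    // Each parse creates a new document owned by the parser's pool.
    ToyDocument* parse() {
        pool_.push_back(std::make_unique<ToyDocument>());
        hasCurrent_ = true;
        return pool_.back().get();
    }

    // Transfers the most recently parsed document to the caller, who is
    // then responsible for destroying it. Returns null if there is none.
    std::unique_ptr<ToyDocument> adoptDocument() {
        if (!hasCurrent_) return nullptr;
        std::unique_ptr<ToyDocument> adopted = std::move(pool_.back());
        pool_.pop_back();
        hasCurrent_ = false;
        return adopted;
    }

    // Destroys every document still owned by the pool; adopted documents
    // are no longer in the pool and are left alone.
    void resetDocumentPool() {
        pool_.clear();
        hasCurrent_ = false;
    }

private:
    std::vector<std::unique_ptr<ToyDocument>> pool_;
    bool hasCurrent_ = false;
};
```

In this model, reusing one parser for all 15 documents is fine as long as each document is either left to the pool (and cleaned up by a reset or by destroying the parser) or adopted and then released by you. A leak appears only when something is adopted and never released.
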
> Can I create a Xalan document and modify it as I would a Xerces
> document? Or should I be creating a Xerces document?
No, you cannot modify a Xalan document. Since XSLT views the source tree
as immutable, Xalan's default implementation is read-only. This makes it
more efficient for many things, but precludes modification.
Dave