The traversers are used any time an XPath takes a step from one node to
the next. (Mostly in select patterns; match patterns are handled a bit
differently.)
The first() and next() methods of traversers come in two flavors. One is
used when we don't care what kind of node it will be -- e.g., when the
XPath step is node(), when we're skipping nodes with //, or when we're
processing all children of the current node.
The other flavor of first()/next() is used for any XPath step which cares
about the node type and/or node name. If an index is available, this will
try to take advantage of it to quickly jump to the next matching node;
otherwise it walks all nodes in that axis, testing every one until it
either finds one that matches or runs off the end of the axis.
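
The two flavors can be sketched roughly as follows. This is a minimal
illustration for the child axis only, assuming the document is stored as
parallel int arrays; the class name, the field names, and the integer type
codes are all made up for the example and are not the real traverser API.
It shows only the unindexed fallback, where every node on the axis is
tested in turn.

```java
// Hypothetical sketch: names and table layout are illustrative only.
final class ChildTraverserSketch {
    static final int NULL = -1;       // "no such node" marker

    private final int[] firstChild;   // firstChild[n]: first child of n, or NULL
    private final int[] nextSibling;  // nextSibling[n]: next sibling of n, or NULL
    private final int[] typeId;       // typeId[n]: integer code for node kind/name

    ChildTraverserSketch(int[] firstChild, int[] nextSibling, int[] typeId) {
        this.firstChild = firstChild;
        this.nextSibling = nextSibling;
        this.typeId = typeId;
    }

    // Flavor 1: any node on the axis will do.
    int first(int context) {
        return firstChild[context];
    }

    int next(int context, int current) {
        return nextSibling[current];
    }

    // Flavor 2: only nodes whose type code matches are returned.
    int first(int context, int wantedType) {
        return skipToMatch(firstChild[context], wantedType);
    }

    int next(int context, int current, int wantedType) {
        return skipToMatch(nextSibling[current], wantedType);
    }

    // Walk the axis, testing every node, until one matches
    // or we run off the end (NULL).
    private int skipToMatch(int node, int wantedType) {
        while (node != NULL && typeId[node] != wantedType) {
            node = nextSibling[node];
        }
        return node;
    }

    public static void main(String[] args) {
        final int DOC = 0, CHAPTER = 10, IMG = 20;  // made-up type codes
        // Node 0 is the root; its children are 1 (chapter), 2 (img), 3 (chapter).
        ChildTraverserSketch t = new ChildTraverserSketch(
            new int[] {1, NULL, NULL, NULL},
            new int[] {NULL, 2, 3, NULL},
            new int[] {DOC, CHAPTER, IMG, CHAPTER});
        System.out.println(t.first(0));              // 1
        System.out.println(t.first(0, IMG));         // 2
        System.out.println(t.next(0, 1, CHAPTER));   // 3
    }
}
```

Keeping the untyped flavor separate means the common "give me everything"
case never pays for a type comparison at all.
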
The traversers are used by the iterators, and the iterators are the core
of how we've implemented XPath -- so this code can get pounded on pretty
heavily.
If your stylesheet has relatively few, simple, general, and local selects,
the indexes may not be buying you much. If you're doing a lot of longer
jumps around your document, the indexes may improve performance
considerably. Consider the case of a document such as:

  <doc>
    <chapter>containing many sections, paragraphs, images,
      etc.</chapter>
    ... many of these chapters ...
  </doc>

Now let's say you want to build a table of illustrations, an index, etc.
These involve finding a bunch of like-named elements which are scattered
pretty widely through the document. If you don't have indexing, you have
to search the document to find them. If you do have indexing, finding the
next <img/> element (for example) goes a lot faster. Whether the cost of
building the index is worth the gain depends on how often you do this sort
of search and how far you have to search (on average) before finding what
you're looking for.
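
To make the trade-off concrete, here is a rough sketch of the two ways of
finding the next node of a given type in document order. It assumes nodes
are numbered in document order and that each has an integer type code; the
class, the method names, and the idea of a sorted per-type position list
are illustrative assumptions, not a description of how the real index is
laid out.

```java
// Hypothetical sketch: one possible shape for a per-type index.
final class TypeIndexSketch {
    static final int NULL = -1;   // "no such node" marker

    // Without an index: test every node after 'current' in document order.
    // Cost grows with the distance to the next match.
    static int nextByScan(int[] typeId, int current, int wantedType) {
        for (int n = current + 1; n < typeId.length; n++) {
            if (typeId[n] == wantedType) {
                return n;
            }
        }
        return NULL;
    }

    // Building the index is the up-front cost: one pass per indexed type here.
    // The result lists, in ascending order, every node of the wanted type.
    static int[] buildIndex(int[] typeId, int wantedType) {
        int count = 0;
        for (int t : typeId) {
            if (t == wantedType) count++;
        }
        int[] positions = new int[count];
        int k = 0;
        for (int n = 0; n < typeId.length; n++) {
            if (typeId[n] == wantedType) positions[k++] = n;
        }
        return positions;
    }

    // With an index: binary-search for the first indexed node after 'current'.
    static int nextByIndex(int[] positions, int current) {
        int i = java.util.Arrays.binarySearch(positions, current + 1);
        if (i < 0) {
            i = -i - 1;           // not found: convert to the insertion point
        }
        return i < positions.length ? positions[i] : NULL;
    }

    public static void main(String[] args) {
        final int IMG = 20;       // made-up type code
        int[] typeId = {0, 10, 11, 11, 20, 10, 11, 20};
        int[] imgIndex = buildIndex(typeId, IMG);
        System.out.println(nextByScan(typeId, 0, IMG));   // 4
        System.out.println(nextByIndex(imgIndex, 0));     // 4
        System.out.println(nextByIndex(imgIndex, 4));     // 7
    }
}
```

The scan pays nothing up front but touches every intervening node; the
index pays once to build and then jumps directly, which only wins if the
jumps are long or frequent enough.
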
Letting users turn indexing off, so they can try it both ways and see what
works best, might not be a bad idea. But the trade-off may change as your
documents change, or as your stylesheet is refined, so if you do this,
you're going to have to commit to understanding the trade-offs and/or
retesting periodically to see if your assumptions are still valid.
On the other hand, it would also be good to periodically check whether
_our_ assumptions are still valid.
On the other other hand, testing a flag can consume some cycles itself.
______________________________________
Joe Kesselman / IBM Research