RE: Why do indexing?

Joseph Kesselman Tue, 17 Sep 2002 10:20:37 -0700

The traversers are used any time an XPath takes a step from one node to 
the next. (Mostly in select patterns; match patterns are handled a bit 
differently.)


The first() and next() methods of traversers come in two flavors. One is 
used when we don't care what kind of node it will be -- e.g., when the 
XPath step is node(), when we're skipping nodes with //, or when we're 
processing all children of the current node.

The other flavor of first()/next() is used for any XPath step which cares 
about the node type and/or node name. If an index is available, this will 
try to take advantage of it to quickly jump to the next matching node; 
otherwise it walks all nodes in that axis, testing every one until it 
either finds one that matches or runs off the end of the axis.
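
To make the two flavors concrete, here is a minimal sketch in Java. It is 
not the real traverser code: the class SketchTraverser, the flat array of 
node names in document order, and the name-to-handles map standing in for 
the index are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified traverser; nodes are array positions in document order. */
final class SketchTraverser {
    static final int NULL = -1;              // "ran off the end of the axis"

    private final String[] names;            // node name for each node handle
    private final Map<String, int[]> index;  // name -> ascending handles, or null if not indexed

    SketchTraverser(String[] names, boolean doIndexing) {
        this.names = names;
        this.index = doIndexing ? buildIndex(names) : null;
    }

    /** One pass over the document, grouping node handles by name. */
    private static Map<String, int[]> buildIndex(String[] names) {
        Map<String, List<Integer>> byName = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            byName.computeIfAbsent(names[i], k -> new ArrayList<>()).add(i);
        }
        Map<String, int[]> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : byName.entrySet()) {
            result.put(e.getKey(),
                       e.getValue().stream().mapToInt(Integer::intValue).toArray());
        }
        return result;
    }

    /** Flavor 1: any node will do, so just step to the next one. */
    int first() { return next(NULL); }

    int next(int current) {
        int n = current + 1;
        return n < names.length ? n : NULL;
    }

    /** Flavor 2: the step cares about the node name. */
    int first(String name) { return next(NULL, name); }

    int next(int current, String name) {
        if (index != null) {
            // Indexed: jump straight to the first matching handle after current.
            int[] handles = index.get(name);
            if (handles == null) return NULL;
            int pos = Arrays.binarySearch(handles, current + 1);
            if (pos < 0) pos = -pos - 1;     // not found: use the insertion point
            return pos < handles.length ? handles[pos] : NULL;
        }
        // Not indexed: walk the axis, testing every node along the way.
        for (int n = current + 1; n < names.length; n++) {
            if (names[n].equals(name)) return n;
        }
        return NULL;
    }
}
```

Both paths return the same answers; the only difference is how many nodes 
get tested on the way there.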


The traversers are used by the iterators, and the iterators are the core 
of how we've implemented XPath -- so this code can get pounded on pretty 
heavily. 

If your stylesheet has relatively few, simple, general, and local selects, 
the indexes may not be buying you much. If you're doing a lot of longer 
jumps around your document, the indexes may improve performance 
considerably. Consider the case of a document such as:
        <doc>
                <chapter>containing many sections, paragraphs, images,
                etc.</chapter>
                ... many of these chapters ...
        </doc>

Now let's say you want to build a table of illustrations, an index, etc. 
These involve finding a bunch of like-named elements which are scattered 
pretty widely through the document. If you don't have indexing, you have 
to search the document to find them. If you do have indexing, finding the 
next <img/> element (for example) goes a lot faster. Whether the cost is 
worth the gain depends on how often you do this sort of search and how far 
you have to search (on average) before finding what you're looking for.
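
As a rough way to picture that trade-off, here is a toy cost model in 
Java. Everything in it is an assumption for illustration: the class 
IndexTradeoffSketch, the synthetic document shape, and especially the 
idea that building the index costs one pass over the document and that 
an indexed search touches only the matching nodes. It counts node tests, 
not real time, and ignores the memory the index takes.

```java
/** Hypothetical cost model: counts node tests, not real time. */
final class IndexTradeoffSketch {

    /** Element names in document order: doc, then per chapter: chapter, paras, one img. */
    static String[] buildDoc(int chapters, int parasPerChapter) {
        String[] names = new String[1 + chapters * (parasPerChapter + 2)];
        int i = 0;
        names[i++] = "doc";
        for (int c = 0; c < chapters; c++) {
            names[i++] = "chapter";
            for (int p = 0; p < parasPerChapter; p++) {
                names[i++] = "para";
            }
            names[i++] = "img";
        }
        return names;
    }

    /** Without an index, every search for all matches walks the whole document. */
    static long costWithoutIndex(String[] names, int searches) {
        return (long) searches * names.length;
    }

    /** Assumed: one full pass to build the index, then each search touches only the matches. */
    static long costWithIndex(String[] names, String wanted, int searches) {
        long matches = 0;
        for (String n : names) {
            if (n.equals(wanted)) matches++;
        }
        return names.length + searches * matches;
    }

    public static void main(String[] args) {
        String[] doc = buildDoc(100, 50);   // 100 chapters, 50 paragraphs and 1 image each
        for (int searches : new int[] {1, 10}) {
            System.out.println(searches + " search(es): no index = "
                + costWithoutIndex(doc, searches)
                + ", index = " + costWithIndex(doc, "img", searches));
        }
    }
}
```

Under these assumptions a single search comes out slightly cheaper 
without the index, and ten searches come out much cheaper with it, which 
is the "how often" half of the question. The "how far" half shows up in 
how sparse the matches are relative to the document.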

Letting users turn it off, so they can try it both ways and see what works 
best, might not be a bad idea. But the trade-off may change as your 
documents change, or as your stylesheet is refined... so if you do this, 
you're going to have to commit to understanding the trade-offs and/or 
retesting periodically to see if your assumptions are still valid...

... On the other hand, it would also be good to periodically check whether 
_our_ assumptions are still valid.

... On the other other hand, testing a flag can consume some cycles itself.



______________________________________
Joe Kesselman  / IBM Research
