XSLTC performance problems with xsl:key in Muenchian grouping
-------------------------------------------------------------
Key: XALANJ-2295
URL: http://issues.apache.org/jira/browse/XALANJ-2295
Project: XalanJ2
Type: Bug
Components: XSLTC
Versions: 2.7
Reporter: Henry Zongaro
Assigned to: Henry Zongaro
Performance scales poorly for Muenchian grouping in XSLTC. For an instruction
like the following, the user would like only the first node in the node set
returned by each reference to the key function to be processed in the
predicate, which tests for node identity.
<xsl:for-each select="item[generate-id() = generate-id(key('k',
value)[1])]">
However, in XSLTC all nodes associated with each key value are copied into a
temporary IntegerArray by the KeyIndex object used to represent the result of
the key function, and then that result is always put through a
DupFilterIterator, which sorts the nodes returned by the key function, even
though the nodes produced by KeyIndex are always in document order; the
DupFilterIterator seems to be used just as a mechanism for copying nodes from
the KeyIndex object that was used to evaluate the result of the key function.
So every node in the result of the key function is always copied at least twice.
Using a heap mechanism like that used in UnionPathIterator to sort the nodes
would be more efficient, as it would only process each node once, and only as
each node was needed. Each node in the heap would be the set of nodes (which
are already in document order because that's how KeyIndex is built to begin
with) associated with a single key value for that reference to the key
function, so pulling a single node would require at most O(log2(num_kv))
comparisons, where num_kv is the number of key values.
On top of that, a reference to key that has a positional predicate does not
take advantage of the NthIterator class to directly retrieve the node. This
means that in an expression like
key('k',keyvalues)[1]
the position of every node produced by the key is tested for equality to the
value one. Using NthIterator would mean as few nodes as possible would be
retrieved - in this case, just one.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]