[ http://issues.apache.org/jira/browse/XALANJ-2295?page=all ]
Henry Zongaro resolved XALANJ-2295:
-----------------------------------
Fix Version: Latest Development Code
Resolution: Fixed
Applied patch in source repository.
> XSLTC performance problems with xsl:key in Muenchian grouping
> -------------------------------------------------------------
>
> Key: XALANJ-2295
> URL: http://issues.apache.org/jira/browse/XALANJ-2295
> Project: XalanJ2
> Type: Bug
> Components: XSLTC
> Versions: 2.7
> Reporter: Henry Zongaro
> Assignee: Henry Zongaro
> Fix For: Latest Development Code
> Attachments: gendata2295.xml, gendata2295.xsl, j2295.xsl, patch.j2295.txt,
> patch.j2295.v2.txt
>
> Performance scales poorly for Muenchian grouping in XSLTC. For an
> instruction like the following, the user would like only the first node in
> the node set returned by each reference to the key function to be processed
> in the predicate, which tests for node identity.
> <xsl:for-each select="item[generate-id() = generate-id(key('k',
> value)[1])]">
> However, in XSLTC all nodes associated with each key value are copied into a
> temporary IntegerArray by the KeyIndex object used to represent the result of
> the key function, and then that result is always put through a
> DupFilterIterator, which sorts the nodes returned by the key function, even
> though the nodes produced by KeyIndex are always in document order; the
> DupFilterIterator seems to be used just as a mechanism for copying nodes from
> the KeyIndex object that was used to evaluate the result of the key function.
> So every node in the result of the key function is always copied at least
> twice.
> Using a heap mechanism like that used in UnionPathIterator to sort the nodes
> would be more efficient, as it would only process each node once, and only as
> each node was needed. Each node in the heap would be the set of nodes (which
> are already in document order because that's how KeyIndex is built to begin
> with) associated with a single key value for that reference to the key
> function, so pulling a single node would require at most O(log2(num_kv))
> comparisons, where num_kv is the number of key values.
> On top of that, a reference to key that has a positional predicate does not
> take advantage of the NthIterator class to directly retrieve the node. This
> means that in an expression like
> key('k',keyvalues)[1]
> the position of every node produced by the key is tested for equality to the
> value one. Using NthIterator would mean as few nodes as possible would be
> retrieved - in this case, just one.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]