http://nagoya.apache.org/bugzilla/show_bug.cgi?id=8612

Performance Enhancement to Xalan distinct function

           Summary: Performance Enhancement to Xalan distinct function
           Product: XalanJ2
           Version: 2.3
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: org.apache.xalan.lib
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


In Extensions.java, the distinct function uses a Hashtable to track unique 
nodes.  Hashtable synchronizes every access to its instances.  In Xalan 2.3.1, 
the current code is as follows:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the hashtable we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    Hashtable stringTable = new Hashtable();

    Node currNode = ni.nextNode();

    while (currNode != null)
    {
      String key = myContext.toString(currNode);

      if (!stringTable.containsKey(key))
      {
        stringTable.put(key, currNode);
        dist.addElement(currNode);
      }
      currNode = ni.nextNode();
    }

    return dist;
  }

Since the Hashtable instance is used only locally within the method, there is 
no need for an object that synchronizes access to its instance.  To improve 
performance, a java.util.HashSet should be used instead.  Furthermore, it is a 
good idea to manually clear the HashSet at the end of the method to ensure the 
HashSet instance is garbage collected.  The enhanced code is as follows:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the set we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    HashSet stringSet = new HashSet();

    Node currNode = ni.nextNode();

    while (currNode != null)
    {
      String key = myContext.toString(currNode);

      if (stringSet.add(key))
      {
        dist.addElement(currNode);
      }
      currNode = ni.nextNode();
    }

    stringSet.clear();

    return dist;
  }
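
The deduplication pattern in the enhanced code can be tried outside Xalan. 
The following is a minimal sketch, not Xalan code: the class DistinctSketch 
and the method distinctByStringValue are hypothetical names, and 
String.valueOf stands in for myContext.toString(currNode).  It relies only on 
the fact that Set.add returns true when the key was not already present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of Xalan.
class DistinctSketch {

    // Returns the items whose string value has not been seen before,
    // in first-occurrence order.
    static List<Object> distinctByStringValue(List<?> items) {
        List<Object> result = new ArrayList<Object>();
        Set<String> seen = new HashSet<String>();

        for (Object item : items) {
            // Assumption: String.valueOf plays the role of
            // ExpressionContext.toString(Node) in the report.
            String key = String.valueOf(item);

            // Set.add returns true only if the key was not already present,
            // so a separate containsKey/put pair is unnecessary.
            if (seen.add(key)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> input =
            Arrays.<Object>asList("a", "b", "a", Integer.valueOf(1), "1", "c", "b");
        // Integer 1 and the string "1" share a string value, so only the
        // first of the two is kept.
        System.out.println(distinctByStringValue(input)); // prints [a, b, 1, c]
    }
}
```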


If you want to "completely" ensure the HashSet is cleared and garbage collected 
even when a TransformerException is thrown, the following enhanced code could 
be used instead of the above enhanced code:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the set we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    HashSet stringSet = new HashSet();

    try
    {
      Node currNode = ni.nextNode();

      while (currNode != null)
      {
        String key = myContext.toString(currNode);

        if (stringSet.add(key))
        {
          dist.addElement(currNode);
        }
        currNode = ni.nextNode();
      }
    }
    finally
    {
      stringSet.clear();
    }

    return dist;
  }
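
To see the try/finally behaviour in isolation, here is a minimal sketch under 
stated assumptions.  FinallyClearSketch and countDistinct are hypothetical 
names, an IllegalStateException stands in for the TransformerException, and 
the set is passed in as a parameter (unlike in the method above) purely so 
that the caller can observe its state after the exception propagates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of Xalan.
class FinallyClearSketch {

    // Counts distinct strings. The scratch set is a parameter only so the
    // caller can inspect it afterwards; in the report it is a local variable.
    static int countDistinct(List<String> items, Set<String> scratch) {
        int count = 0;
        try {
            for (String item : items) {
                if (item == null) {
                    // Stands in for the TransformerException in the report.
                    throw new IllegalStateException("null item");
                }
                if (scratch.add(item)) {
                    count++;
                }
            }
        } finally {
            // Runs on normal completion and when the exception propagates.
            scratch.clear();
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> scratch = new HashSet<String>();
        try {
            countDistinct(Arrays.asList("a", "b", null, "c"), scratch);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage()); // prints failed: null item
        }
        System.out.println("scratch empty: " + scratch.isEmpty()); // prints scratch empty: true
    }
}
```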
