I'm running out of memory when processing largish node sets. I have XSLT code like this:

<xsl:for-each select="data">
  <xsl:variable name="split_data" select="ext:split(., '_', 4)"/>
  ...
</xsl:for-each>

ext:split is an XSLT extension function which calls the Java String.split method and returns a node set (see below). This code throws an OutOfMemoryError when 'data' has 400 nodes, which doesn't seem like very much to me. The error is thrown at the point where the XSLT transformer is trying to call my extension.

I'm pretty new to XSLT (and Java), so maybe I'm doing something stupid, although I'm not sure what. Why does XSLT need so much memory to process a fairly small amount of data, and how can I code this more efficiently?

/**
 * Split a string into tokens and return as a node set to XSLT.
 */
public static NodeIterator split(String str, String regex, int limit) {
    NodeIterator nodes = null;
    try {
        String[] tokens = str.split(regex, limit);
        StringBuffer xmlTokens = new StringBuffer();
        xmlTokens.append("<root>");
        for (int i = 0; i < tokens.length; ++i) {
            xmlTokens.append("<tok>");
            xmlTokens.append(escapeXmlChars(tokens[i]));
            xmlTokens.append("</tok>");
        }
        xmlTokens.append("</root>");
        nodes = topLevelNodes(xmlTokens.toString());
    } catch (Exception e) {
        // todo log the error
    }
    return nodes;
}

public static NodeIterator topLevelNodes(String str)
        throws SAXException, IOException {
    DOMParser parser = new DOMParser();
    parser.parse(new InputSource(new StringReader(str)));
    DocumentImpl doc = (DocumentImpl) parser.getDocument();
    return doc.createNodeIterator(doc.getDocumentElement(),
            NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT,
            new TopLevelOnly(),
            false);
}

private static class TopLevelOnly implements NodeFilter {
    public short acceptNode(Node node) {
        Element root = node.getOwnerDocument().getDocumentElement();
        return node.getParentNode() == root ?
                FILTER_ACCEPT : FILTER_REJECT;
    }
}

I'm aware I could use the EXSLT extensions to do this, and that would probably be a whole lot more efficient than my own efforts, but I would like to understand what is wrong with the code above.
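
One alternative, short of switching to EXSLT, would be to build the token elements directly with the standard DOM API instead of serializing them to a string and parsing that string again. The following is only a minimal sketch: SplitSketch is a made-up class name, it assumes the XSLT processor accepts a DocumentFragment as an extension return value (not verified here), and whether it avoids the memory problem is untested.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

// Hypothetical class name, used for illustration only.
class SplitSketch {

    /**
     * Split a string into tokens and return them as <tok> elements
     * held in a DocumentFragment, without a serialize/parse round trip.
     */
    public static DocumentFragment split(String str, String regex, int limit)
            throws ParserConfigurationException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        DocumentFragment fragment = doc.createDocumentFragment();
        for (String token : str.split(regex, limit)) {
            Element tok = doc.createElement("tok");
            // Text nodes hold raw characters, so no manual XML escaping is needed.
            tok.appendChild(doc.createTextNode(token));
            fragment.appendChild(tok);
        }
        return fragment;
    }

    public static void main(String[] args) throws Exception {
        DocumentFragment f = split("a_b_c_d_e", "_", 4);
        System.out.println(f.getChildNodes().getLength());     // prints 4
        System.out.println(f.getLastChild().getTextContent()); // prints d_e
    }
}
```

This keeps the same (String, String, int) signature as the original split, so the stylesheet call would not need to change. It still creates a new Document per call, so it only removes the string building and parsing step, not the per-call document.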

Thanks,
John
