[ https://issues.apache.org/jira/browse/TIKA-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-578.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9
         Assignee: Jukka Zitting

Good point, thanks! Fixed in revision 1060818.

> XMLParser ContentHandler: multiple endDocument calls
> ----------------------------------------------------
>
>                 Key: TIKA-578
>                 URL: https://issues.apache.org/jira/browse/TIKA-578
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>         Environment: N/A
>            Reporter: Scott Severtson
>            Assignee: Jukka Zitting
>             Fix For: 0.9
>
>
> When supplying a ContentHandler to an XMLParser instance, the ContentHandler's 
> .endDocument() method is called twice: once by the SAXParser created within 
> XMLParser, and once explicitly by XMLParser itself. 
> Sample code:
> ---
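> import java.io.InputStream;
> import javax.xml.transform.dom.DOMResult;
> import javax.xml.transform.sax.SAXTransformerFactory;
> import javax.xml.transform.sax.TransformerHandler;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.xml.DcXMLParser;
> import org.apache.tika.parser.xml.XMLParser;
>
> InputStream inputStream = ...
> XMLParser parser = new DcXMLParser();
> ParseContext context = new ParseContext();
> Metadata metadata = new Metadata();
> DOMResult result = new DOMResult();
> TransformerHandler transformerHandler = ((SAXTransformerFactory)
>         SAXTransformerFactory.newInstance()).newTransformerHandler();
> transformerHandler.setResult(result);
> parser.parse(inputStream, transformerHandler, metadata, context); // fails here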
> ---
> The following exception is produced:
> ---
> java.util.EmptyStackException
>       at java.util.Stack.peek(Stack.java:85)
>       at java.util.Stack.pop(Stack.java:67)
>       at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
>       at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
>       at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
>       at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
>       at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
>       at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
>       ...
> ---
> We have worked around the issue temporarily by passing in a ContentHandler 
> that eats the first .endDocument() call, and allows the second to go through. 
> However, we believe XMLParser should hide the extraneous .endDocument() call 
> internally.
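
For readers on 0.8, the workaround described in the report might look roughly like the sketch below. This is not the reporter's actual code: the class name SkipFirstEndDocumentHandler is hypothetical, and the sketch assumes that exactly two endDocument() calls arrive per parse, as the report states. It builds on the JDK's org.xml.sax.helpers.XMLFilterImpl, which forwards every SAX event to its configured ContentHandler, so only endDocument() needs overriding.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical workaround handler: forwards all SAX events to the wrapped
 * ContentHandler, but swallows the first endDocument() call and forwards
 * only the second one. Assumes exactly two endDocument() calls per parse;
 * use a fresh instance for each parse.
 */
class SkipFirstEndDocumentHandler extends XMLFilterImpl {

    // Number of endDocument() calls seen so far on this instance.
    private int endDocumentCalls = 0;

    SkipFirstEndDocumentHandler(ContentHandler delegate) {
        // XMLFilterImpl forwards every ContentHandler event to this delegate.
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        endDocumentCalls++;
        // Drop the first call; pass the second one through to the delegate.
        if (endDocumentCalls == 2) {
            super.endDocument();
        }
    }
}
```

With the sample code above, the wrapper would be passed in place of the bare handler: parser.parse(inputStream, new SkipFirstEndDocumentHandler(transformerHandler), metadata, context). Against a version where the duplicate call is fixed (0.9 per this issue), the wrapper would swallow the only endDocument() call, so it should be removed after upgrading.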

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
