[OT????] java.lang.IllegalStateException: NoWriterSupplied: No writer supplied for serializer.

Grant Ingersoll Sat, 15 Nov 2008 10:59:29 -0800

This may just show my lack of understanding of XPath, etc., but when Iapply [1] to TestParsers, I get the following output:

----------------------------
Val: <?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml";>
    <head>
        <title/>
    </head>
    <body>
        <p>



    Solr Version Control System





      Overview

The Solr source code resides in the Apache Subversion (SVN)repository.The command-line SVN client can be obtained here or as anoptional package for cygwin.The TortoiseSVN GUI client for Windows can be obtained here.Thereare also SVN plugins available for older versions of EclipseandIntelliJ IDEA that don't have subversion support alreadyincluded.



    Here is some more text.  It contains a link.
    Text Here


</p>
    </body>
</html>

java.lang.IllegalStateException: NoWriterSupplied: No writer suppliedfor serializer.

        at org.apache.xml.serialize.XMLSerializer.startElement(Unknown Source)

atorg.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:75)atorg.apache.tika.sax.xpath.MatchingContentHandler.startElement(MatchingContentHandler.java:62)atorg.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:75)atorg.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:111)atorg.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:115)atorg.apache.tika.sax.XHTMLContentHandler.lazyStartDocument(XHTMLContentHandler.java:77)atorg.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:110)atorg.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:115)

        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:51)
        at org.apache.tika.TestParsers.testXML(TestParsers.java:96)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

atsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)atsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)atcom.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:40)

-----------------------

Anyone have any insight as to why wrapping a XMLSerializer inside of aMatchingContentHandler is causing such a problem?


Thanks,
Grant



[1]
Index: src/test/java/org/apache/tika/TestParsers.java
===================================================================

--- src/test/java/org/apache/tika/TestParsers.java (revision713397)

+++ src/test/java/org/apache/tika/TestParsers.java      (working copy)
@@ -19,16 +19,28 @@
 import java.io.File;
 import java.io.FileInputStream;
 import java.io.InputStream;
+import java.io.StringBufferInputStream;
+import java.io.StringWriter;
+import java.io.ByteArrayInputStream;
 import java.util.List;
+import java.nio.charset.Charset;

 import junit.framework.TestCase;

 import org.apache.tika.config.TikaConfig;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.parser.Parser;
+import org.apache.tika.parser.xml.XMLParser;
 import org.apache.tika.utils.ParseUtils;
 import org.apache.tika.utils.Utils;
+import org.apache.tika.sax.xpath.Matcher;
+import org.apache.tika.sax.xpath.MatchingContentHandler;
+import org.apache.tika.sax.xpath.XPathParser;
+import org.apache.tika.sax.XHTMLContentHandler;
+import org.apache.xml.serialize.XMLSerializer;
+import org.apache.xml.serialize.OutputFormat;
 import org.xml.sax.helpers.DefaultHandler;
+import org.xml.sax.ContentHandler;

 /**
  * Junit test class for Tika [EMAIL PROTECTED] Parser}s.
@@ -62,7 +74,59 @@
         tc = TikaConfig.getDefaultConfig();
     }

-    public void testPDFExtraction() throws Exception {
+  private static final XPathParser PARSER =
+          new XPathParser("xhtml", XHTMLContentHandler.XHTML);
+
+  public void testXML() throws Exception {
+    XMLParser parser = new XMLParser();
+    StringWriter writer = new StringWriter();
+    Metadata metadata = new Metadata();

+ ContentHandler contentHandler = new XMLSerializer(writer, newOutputFormat("XML", "UTF-8", true));+ parser.parse(newByteArrayInputStream(sampleXML.getBytes(Charset.forName("UTF-8"))),contentHandler, metadata);

+    writer.close();
+    System.out.println("Val: " + writer.toString());
+
+
+
+    metadata = new Metadata();
+    writer = new StringWriter();
+    Matcher matcher = PARSER.parse("/xhtml:html/descendant:node()");

+ contentHandler = new XMLSerializer(writer, newOutputFormat("XML", "UTF-8", true));+ MatchingContentHandler parsingHandler = newMatchingContentHandler(contentHandler, matcher);+ parser.parse(newByteArrayInputStream(sampleXML.getBytes(Charset.forName("UTF-8"))),parsingHandler, metadata);+ //parser.parse(new StringBufferInputStream(sampleXML),parsingHandler, metadata);

+    writer.close();
+    System.out.println("Val: " + writer.toString());
+  }
+
+
+
+  private static String sampleXML = "<document>\n" +
+          "  \n" +
+          "  <header>\n" +
+          "    <title>Solr Version Control System</title>\n" +
+          "  </header>\n" +
+          "  \n" +
+          "  <body>\n" +
+          "  \n" +
+          "    <section>\n" +
+          "      <title>Overview</title>\n" +
+          "      <p>\n" +

+ " The Solr source code resides in the Apache <a href=\"http://subversion.tigris.org/\";>Subversion (SVN)</a> repository.\n" ++ " The command-line SVN client can be obtained <ahref=\"http://subversion.tigris.org/project_packages.html\";>here</a>or as an optional package for <a href=\"http://www.cygwin.com/\">cygwin</a>.\n" ++ " The TortoiseSVN GUI client for Windows can beobtained <a href=\"http://tortoisesvn.tigris.org/\";>here</a>. There\n" ++ " are also SVN plugins available for older versionsof <a href=\"http://subclipse.tigris.org/\";>Eclipse</a> and \n" ++ " <a href=\"http://svnup.tigris.org/\";>IntelliJIDEA</a> that don't have subversion support already included.\n" +

+          "      </p>\n" +
+          "    </section>\n" +

+ " <p>Here is some more text. It contains <a href=\"http://lucene.apache.org\">a link</a>. </p>\n" +

+          "    <p>Text Here</p>\n" +
+          "  </body>\n" +
+          "  \n" +
+          "</document>";
+
+
+  public void testPDFExtraction() throws Exception {
         File file = getTestFile("testPDF.pdf");
         String s1 = ParseUtils.getStringContent(file, tc);

String s2 = ParseUtils.getStringContent(file, tc,"application/pdf");

[OT????] java.lang.IllegalStateException: NoWriterSupplied: No writer supplied for serializer.

Reply via email to