Apologies if this is a stupid question, but I don't understand WriteOutContentHandler[1] - shouldn't it be implementing the startElement(), endElement() etc. methods?
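To make the distinction concrete, here is a minimal sketch using only the JDK's SAX classes. It is not Tika's actual code: the class names (SaxOutputSketch, TextOnlyHandler, MarkupHandler) and the tiny event stream are hypothetical, made up for illustration. It assumes that a handler which overrides only characters() silently drops element events, whereas one that also overrides startElement() and endElement() can write the tags out.

```java
import java.io.StringWriter;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class SaxOutputSketch {

    /** Hypothetical handler: overrides only characters(), so element events are ignored. */
    static class TextOnlyHandler extends DefaultHandler {
        final StringWriter out = new StringWriter();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.write(ch, start, length);
        }
    }

    /** Hypothetical handler: also overrides startElement()/endElement() to emit tags. */
    static class MarkupHandler extends TextOnlyHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Attributes and escaping are left out to keep the sketch short.
            out.write("<" + qName + ">");
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.write("</" + qName + ">");
        }
    }

    /** Fires a small, made-up event stream of the kind a parser might produce. */
    static void fireEvents(ContentHandler handler) throws SAXException {
        Attributes none = new AttributesImpl();
        char[] text = "Simple Excel document".toCharArray();
        handler.startDocument();
        handler.startElement("", "title", "title", none);
        handler.characters(text, 0, text.length);
        handler.endElement("", "title", "title");
        handler.endDocument();
    }

    /** Runs the event stream through the given handler and returns what it wrote. */
    static String render(TextOnlyHandler handler) {
        try {
            fireEvents(handler);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new TextOnlyHandler())); // Simple Excel document
        System.out.println(render(new MarkupHandler()));   // <title>Simple Excel document</title>
    }
}
```

Both handlers receive exactly the same events; only the second one produces markup, which is the behaviour I was expecting.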
For example, ExcelParserTest[2] outputs the following for testEXCEL.xls:

    Simple Excel documentSample Excel Worksheet - Numbers and their Squares Number Square 1.0 1.0 2.0 4.0 3.0 9.0 4.0 16.0 5.0 25.0 6.0 36.0 7.0 49.0 8.0 64.0 9.0 81.0 10.0 100.0 11.0 121.0 12.0 144.0 13.0 169.0 14.0 196.0 15.0 225.0 Written and saved in Microsoft Excel X for Mac Service Release 1.

...but I would have thought it should be something like:

    <html>
      <head>
        <title>Simple Excel document</title>
      </head>
      <body>
        <p>Sample Excel Worksheet - Numbers and their Squares Number Square 1.0 1.0 2.0 4.0 3.0 9.0 4.0 16.0 5.0 25.0 6.0 36.0 7.0 49.0 8.0 64.0 9.0 81.0 10.0 100.0 11.0 121.0 12.0 144.0 13.0 169.0 14.0 196.0 15.0 225.0 Written and saved in Microsoft Excel X for Mac Service Release 1.</p>
      </body>
    </html>

Niall

[1] http://incubator.apache.org/tika/xref/org/apache/tika/sax/WriteOutContentHandler.html
[2] http://incubator.apache.org/tika/xref-test/org/apache/tika/parser/microsoft/ExcelParserTest.html