Apologies if this is a stupid question, but I don't understand
WriteOutContentHandler[1] - shouldn't it be implementing the
startElement(), endElement() etc. methods?

For example, ExcelParserTest[2] outputs the following for testEXCEL.xls:

Simple Excel documentSample Excel Worksheet - Numbers and their
Squares Number Square 1.0 1.0 2.0 4.0 3.0 9.0 4.0 16.0 5.0 25.0 6.0
36.0 7.0 49.0 8.0 64.0 9.0 81.0 10.0 100.0 11.0 121.0 12.0 144.0 13.0
169.0 14.0 196.0 15.0 225.0 Written and saved in Microsoft Excel X for
Mac Service Release 1.

...but I would have thought it should be something like:

<html>
<head>
<title>Simple Excel document</title>
</head>
<body>
<p>Sample Excel Worksheet - Numbers and their Squares Number Square
1.0 1.0 2.0 4.0 3.0 9.0 4.0 16.0 5.0 25.0 6.0 36.0 7.0 49.0 8.0 64.0
9.0 81.0 10.0 100.0 11.0 121.0 12.0 144.0 13.0 169.0 14.0 196.0 15.0
225.0 Written and saved in Microsoft Excel X for Mac Service Release
1.</p>
</body>
</html>
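
For illustration, here is a rough sketch of the kind of handler I had in mind, one that writes the element tags as well as the character data. It is not Tika code: TagWritingHandler is a made-up name, and the sketch assumes that attributes can be ignored and that the text needs no escaping. It uses only the standard org.xml.sax classes.

```java
import java.io.IOException;
import java.io.Writer;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Tika): echoes SAX events to a Writer
// as markup instead of writing only the character data.
class TagWritingHandler extends DefaultHandler {

    private final Writer writer;

    TagWritingHandler(Writer writer) {
        this.writer = writer;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Attributes are deliberately ignored in this sketch.
        write("<" + name(localName, qName) + ">");
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        write("</" + name(localName, qName) + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // No escaping of '<', '>' or '&' is done here; a real handler would need it.
        write(new String(ch, start, length));
    }

    // Prefer the qualified name, falling back to the local name if it is empty.
    private static String name(String localName, String qName) {
        return (qName != null && !qName.isEmpty()) ? qName : localName;
    }

    // Wrap I/O failures in SAXException, as the ContentHandler methods require.
    private void write(String s) throws SAXException {
        try {
            writer.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}
```

Passing a StringWriter to this handler and feeding it the same events the parser emits should give output with the tags included, rather than the bare text shown earlier.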

Niall

[1] http://incubator.apache.org/tika/xref/org/apache/tika/sax/WriteOutContentHandler.html
[2] http://incubator.apache.org/tika/xref-test/org/apache/tika/parser/microsoft/ExcelParserTest.html
