Niall -
When you say it includes the sheet name, you mean the name of each sheet
(tab) in the Excel file, right? Does it come out as bare text, or is it
encoded in a way that can be parsed (e.g. "{[Sheet: MySheet1]}")? Or is
this configurable?
We have a need to read Excel files with more structure than the usual
unstructured text document. At minimum, it would be great to be able to
know where one sheet ends and the next begins. Is this something
that would be appropriate to support, or does that go beyond the generic
unstructured text parsing mission of Tika? Also, based on your knowledge of
POI (I have none), how difficult is that to implement? I may need to do it
myself.
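
To make concrete what I mean by a parseable encoding, here is a rough sketch
of how a consumer might split the extracted text back into sheets. It assumes
the "{[Sheet: ...]}" marker from my example above, which is purely
hypothetical and not something I know ExcelParser to emit; the class and
method names are made up for illustration as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SheetSplitter {
    // Hypothetical marker format taken from the example in this message.
    // This is an assumption, not a format Tika is known to produce.
    private static final Pattern MARKER =
            Pattern.compile("\\{\\[Sheet: (.*?)\\]\\}");

    /**
     * Splits extracted text into sheet name -> sheet text, in document order.
     * Any text that appears before the first marker is ignored.
     */
    public static Map<String, String> split(String text) {
        Map<String, String> sheets = new LinkedHashMap<>();
        Matcher m = MARKER.matcher(text);
        String name = null;
        int start = 0;
        while (m.find()) {
            if (name != null) {
                // Close the previous sheet at the start of this marker.
                sheets.put(name, text.substring(start, m.start()).trim());
            }
            name = m.group(1);
            start = m.end();
        }
        if (name != null) {
            // The last sheet runs to the end of the text.
            sheets.put(name, text.substring(start).trim());
        }
        return sheets;
    }

    public static void main(String[] args) {
        String text = "{[Sheet: MySheet1]}\na b c\n{[Sheet: MySheet2]}\nd e f\n";
        split(text).forEach((n, body) -> System.out.println(n + " -> " + body));
    }
}
```

If the sheet name came out as bare text instead, there would be no reliable
way to tell it apart from cell content, which is why I am asking about the
encoding.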
Thanks much,
Keith
JIRA [EMAIL PROTECTED] wrote:
>
>
> [
> https://issues.apache.org/jira/browse/TIKA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554021
> ]
>
> Niall Pemberton commented on TIKA-105:
> --------------------------------------
>
> The only functional difference between this implementation and ExcelParser
> is that it also writes out the sheet name to the stream; this could easily
> be added with a one line change to ExcelParser though.
>
>
--
View this message in context:
http://www.nabble.com/-jira--Created%3A-%28TIKA-105%29-Excel-parser-implementation-based-on-POI%27s-Event-API-tp13942709p14505443.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.