On Dec 26, 2007 7:19 PM, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
>
> Niall -
>
> When you say it includes the sheet name, you mean the name of each sheet
> (tab) in the Excel file, right?

Yes.

> Does it come out as bare text, or is it
> encoded in a way that can be parsed (e.g. "{[Sheet: MySheet1]}")? Or is
> this configurable?

Just plain text, and not configurable at the moment.

> We have a need to read Excel files with more structure than the usual
> unstructured text document. At minimum, it would be great to be
> able to know where one sheet ends and the next begins. Is this something
> that would be appropriate to support, or does that go beyond the generic
> unstructured text parsing mission of Tika?

I'll leave that for the Tika devs to comment on.

> Also, based on your knowledge of
> Poi (I have none), how difficult is that to implement? I may need to do it
> myself.

Very easy. Tika has two Excel parsers now: the original one (ExcelParser)
uses the simpler POI API, and the one I wrote (ExcelEventParser) has a
smaller memory footprint but uses the slightly more complex POI Event API.
I believe either of them could easily be adapted to your needs, though.

Niall

> Thanks much,
> Keith
>
>
> JIRA [EMAIL PROTECTED] wrote:
> >
> > [
> > https://issues.apache.org/jira/browse/TIKA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554021
> > ]
> >
> > Niall Pemberton commented on TIKA-105:
> > --------------------------------------
> >
> > The only functional difference between this implementation and ExcelParser
> > is that it also writes out the sheet name to the stream. This could easily
> > be added with a one-line change to ExcelParser, though.
> >
>
> --
> View this message in context:
> http://www.nabble.com/-jira--Created%3A-%28TIKA-105%29-Excel-parser-implementation-based-on-POI%27s-Event-API-tp13942709p14505443.html
> Sent from the Apache Tika - Development mailing list archive at Nabble.com.
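
For illustration only, here is a rough sketch of the kind of parseable sheet
delimiter Keith describes, using his example marker "{[Sheet: MySheet1]}".
This is not Tika or POI code: the class SheetMarkers and its write/read
methods are hypothetical names, and the sheet contents are assumed to be
already extracted as lines of plain text. It only shows that a marker line
per sheet is enough to recover where one sheet ends and the next begins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika or POI.
final class SheetMarkers {
    static final String PREFIX = "{[Sheet: ";
    static final String SUFFIX = "]}";

    // Writes each sheet as a marker line followed by its text lines.
    static String write(Map<String, List<String>> sheets) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> sheet : sheets.entrySet()) {
            out.append(PREFIX).append(sheet.getKey()).append(SUFFIX).append('\n');
            for (String line : sheet.getValue()) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    // Splits the text back into sheets by looking for marker lines.
    // Caveat: a cell whose text itself looks like a marker would be
    // misread as a sheet boundary; a real format would need escaping.
    static Map<String, List<String>> read(String text) {
        Map<String, List<String>> sheets = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : text.split("\n")) {
            if (line.startsWith(PREFIX) && line.endsWith(SUFFIX)) {
                String name = line.substring(
                        PREFIX.length(), line.length() - SUFFIX.length());
                current = new ArrayList<>();
                sheets.put(name, current);
            } else if (current != null) {
                current.add(line);
            }
        }
        return sheets;
    }
}
```

In a parser built on POI, the marker line would presumably be emitted at the
point where the sheet name is written to the stream today; that wiring is
left out here because it depends on the parser being adapted.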