[ https://issues.apache.org/jira/browse/TIKA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553912 ]
Jukka Zitting commented on TIKA-105:
------------------------------------
Looks good! I committed the class in revision 606141 so it'll be easier for
people to try it out.
Is there any major functional difference (i.e. different text or metadata
extracted) between this and our existing ExcelParser class? If not, I think we
should probably make this one the default Excel parser and drop the other one.
Test cases would be very much welcome. :-)
> Excel parser implementation based on POI's Event API
> ----------------------------------------------------
>
> Key: TIKA-105
> URL: https://issues.apache.org/jira/browse/TIKA-105
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Niall Pemberton
> Priority: Minor
> Attachments: ExcelEventParser.java
>
>
> Tika's existing ExcelParser implementation uses POI's HSSFWorkbook to extract
> text from an Excel file. POI also provides an alternative "Event API"[1] for
> processing Excel files - the advantage being that it has a much smaller
> memory footprint, but at the cost of a slightly more complex API.
> I have written an alternative Excel parser implementation based on the Event
> API - if it's of interest to the Tika project I'll write a test case for it.
> [1] http://poi.apache.org/hssf/how-to.html#event_api