[ https://issues.apache.org/jira/browse/TIKA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554021 ]
Niall Pemberton commented on TIKA-105:
--------------------------------------

Great, thanks. I have tested this quite a bit using Excel spreadsheets from work, but I wanted to see whether you were interested before creating test cases for Tika. I'll do that now, hopefully within the next couple of weeks.

The only functional difference between this implementation and ExcelParser is that it also writes the sheet name to the output stream; this could easily be added to ExcelParser with a one-line change.

Sorry about ExcelUtils: it's a work in progress, mostly for TIKA-103 (cell formatting and date/number values). When I get time to finish it, I plan to offer it to Tika (hopefully only temporarily, until most of it makes it into a POI release).

> Excel parser implementation based on POI's Event API
> ----------------------------------------------------
>
>                 Key: TIKA-105
>                 URL: https://issues.apache.org/jira/browse/TIKA-105
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Niall Pemberton
>            Priority: Minor
>         Attachments: ExcelEventParser.java
>
>
> Tika's existing ExcelParser implementation uses POI's HSSFWorkbook to extract
> text from an Excel file. POI also provides an alternative "Event API"[1] for
> processing Excel files. Its advantage is a much smaller memory footprint, at
> the cost of a slightly more complex API.
> I have written an alternative Excel parser implementation based on the Event
> API. If it's of interest to the Tika project, I'll write a test case for it.
> [1] http://poi.apache.org/hssf/how-to.html#event_api

--
This message is automatically generated by JIRA.
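For readers unfamiliar with the event model described in the issue, here is a minimal, self-contained sketch of the pattern in Java. It deliberately does not use POI: the record types (`SheetStart`, `SharedStrings`, `LabelCell`, `NumberCell`), the `Listener` interface and `processEvents` are hypothetical stand-ins invented for illustration, not POI's actual classes. The idea it shows is that a listener receives low-level records one at a time, in file order, and appends text as they arrive, so only small lookup state (here, the shared-string table) is retained instead of a full in-memory workbook model like HSSFWorkbook. It also writes each sheet name before that sheet's cells, which is the one functional difference from ExcelParser mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class EventTextExtractor {

    // Hypothetical record types standing in for the low-level records an
    // event-based reader delivers one at a time. These are NOT POI classes.
    interface Rec {}

    static final class SheetStart implements Rec {
        final String name;
        SheetStart(String name) { this.name = name; }
    }

    static final class SharedStrings implements Rec {
        final List<String> strings;
        SharedStrings(List<String> strings) { this.strings = strings; }
    }

    static final class LabelCell implements Rec {
        final int sstIndex; // index into the shared-string table
        LabelCell(int sstIndex) { this.sstIndex = sstIndex; }
    }

    static final class NumberCell implements Rec {
        final double value;
        NumberCell(double value) { this.value = value; }
    }

    // Callback invoked once per record, in file order (hypothetical interface).
    interface Listener { void process(Rec r); }

    // Appends text as records arrive; only the shared-string table is kept,
    // never a full object model of the workbook.
    static final class TextListener implements Listener {
        private final StringBuilder out = new StringBuilder();
        private List<String> sst = new ArrayList<>();

        @Override
        public void process(Rec r) {
            if (r instanceof SheetStart) {
                out.append(((SheetStart) r).name).append('\n'); // sheet name first
            } else if (r instanceof SharedStrings) {
                sst = ((SharedStrings) r).strings;
            } else if (r instanceof LabelCell) {
                out.append(sst.get(((LabelCell) r).sstIndex)).append('\n');
            } else if (r instanceof NumberCell) {
                out.append(((NumberCell) r).value).append('\n');
            }
        }

        String text() { return out.toString(); }
    }

    // Feeds records to the listener one by one. A real reader would decode
    // them lazily from the file rather than from an in-memory list.
    static void processEvents(Iterator<Rec> records, Listener listener) {
        while (records.hasNext()) {
            listener.process(records.next());
        }
    }

    public static void main(String[] args) {
        List<Rec> records = List.of(
            new SharedStrings(List.of("Name", "Total")),
            new SheetStart("Sheet1"),
            new LabelCell(0),
            new LabelCell(1),
            new NumberCell(42.0));
        TextListener listener = new TextListener();
        processEvents(records.iterator(), listener);
        System.out.print(listener.text());
    }
}
```

Running `main` prints `Sheet1`, `Name`, `Total` and `42.0` on separate lines. In a real .xls file the sheet names and the shared-string table are stored in separate records in the workbook globals, ahead of the per-sheet data, so an event-based parser typically has to match sheet names to sheets by order; the sketch folds the name into the sheet-start record for brevity.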