Hi there,

in my project I wrote text extractors for multiple file formats. For the binary MS Office files I am using Apache POI.
For PPT I wrote my own extractor based on DocumentInputStream, which works reasonably well.

For XLS I use HSSFEventFactory, but my test case fails because I only receive events for the first sheet; the content of the second sheet is never found.

For DOC I use WordExtractor, but my test case fails because it does NOT find the content of comments and footnotes.

Can someone tell me whether I am doing something wrong, or whether this is expected behavior or a bug in POI?

You will find everything (including test cases and office files) here:
https://m-m-m.svn.sourceforge.net/svnroot/m-m-m/trunk/mmm-search/mmm-search-parser/mmm-search-parser-impl-poi/

Thanks a lot
Jörg
