[ https://issues.apache.org/jira/browse/TIKA-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christophe Lacroix updated TIKA-1733:
-------------------------------------
    Description:
I'm using Tika to extract text for Solr indexing.
For some word documents, Tika throws this Exception:
{code:xml}
Caused by: java.lang.IllegalArgumentException: This paragraph is not the first one in the table
        at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:931)
        at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:240)
        at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:225)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:191)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 30 more
{code}
That seems to be the same bug as [https://issues.apache.org/jira/browse/TIKA-1251] but I cannot reproduce the bug with the attached doc in this JIRA.
Nevertheless, I cannot reproduce with my document and 1.4 Tika version.

  was:
I'm using Tika to extract text for Solr indexation.
For some word documents, Tika throws this Exception:
{code:xml}
Caused by: java.lang.IllegalArgumentException: This paragraph is not the first one in the table
        at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:931)
        at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:240)
        at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:225)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:191)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 30 more
{code}
That seems to be the same bug as [https://issues.apache.org/jira/browse/TIKA-1251] but I cannot reproduce the bug with the attached doc in this JIRA.
Nevertheless, I cannot reproduce with my document and 1.4 Tika version.


> RuntimeException when parsing some word (.doc) documents
> --------------------------------------------------------
>
>                 Key: TIKA-1733
>                 URL: https://issues.apache.org/jira/browse/TIKA-1733
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5, 1.6, 1.7, 1.8, 1.9, 1.10
>         Environment: Windows and Linux
>            Reporter: Christophe Lacroix
>         Attachments: 2012-PRS_OPER-ATT-329-1.DOC
>
>
> I'm using Tika to extract text for Solr indexing.
> For some word documents, Tika throws this Exception:
> {code:xml}
> Caused by: java.lang.IllegalArgumentException: This paragraph is not the first one in the table
>         at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:931)
>         at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:240)
>         at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:225)
>         at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:191)
>         at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>         at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         ... 30 more
> {code}
> That seems to be the same bug as [https://issues.apache.org/jira/browse/TIKA-1251] but I cannot reproduce the bug with the attached doc in this JIRA.
> Nevertheless, I cannot reproduce with my document and 1.4 Tika version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)