[ https://issues.apache.org/jira/browse/TIKA-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mats Norén updated TIKA-109:
----------------------------
Attachment: fil6.doc
Attached file fails with:
java.lang.StringIndexOutOfBoundsException: String index out of range: -4095
at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:886)
at java.lang.StringBuffer.substring(StringBuffer.java:417)
at org.apache.poi.hwpf.model.TextPiece.substring(TextPiece.java:88)
at org.apache.tika.parser.microsoft.WordParser.extractText(WordParser.java:163)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:61)
at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:173)
at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:233)
at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:251)
at org.apache.tika.TestParsers.testWORDxtraction(TestParsers.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
> WordParser fails on some Word files
> -----------------------------------
>
> Key: TIKA-109
> URL: https://issues.apache.org/jira/browse/TIKA-109
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.1-incubator
> Environment: Windows XP
> Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
> Reporter: Mats Norén
> Attachments: fil6.doc
>
>
> WordParser fails on some Word files. In some corner case of the algorithm,
> a negative value is passed to TextPiece.substring in POI.
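
The failure mode described above can be illustrated in isolation. The following is only a minimal sketch, assuming the exception comes from an out-of-range index reaching StringBuffer.substring; it does not reproduce the offset calculation in WordParser or TextPiece. The class SubstringBounds and the helper safeSubstring are hypothetical names made up for this illustration and are not part of POI or Tika.

```java
public class SubstringBounds {

    // Hypothetical helper (not a POI or Tika API): clamps the requested
    // range to [0, buf.length()] and guarantees start <= end, so that
    // StringBuffer.substring never receives an out-of-range index.
    public static String safeSubstring(StringBuffer buf, int start, int end) {
        int len = buf.length();
        int s = Math.max(0, Math.min(start, len));
        int e = Math.max(s, Math.min(end, len));
        return buf.substring(s, e);
    }

    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer("Hello, Word");

        // A negative start index triggers the same exception type as in
        // the reported stack trace.
        try {
            buf.substring(-4095, 5);
        } catch (StringIndexOutOfBoundsException ex) {
            System.out.println("unchecked call failed: " + ex.getMessage());
        }

        // The clamped call returns the in-range part of the request.
        System.out.println("clamped call returned: " + safeSubstring(buf, -4095, 5));
    }
}
```

Clamping of this kind would only mask the symptom. If the offsets passed to TextPiece.substring are computed incorrectly, the extracted text may still be wrong, so the corner case in the offset computation remains the thing to track down.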