Hello,
I've been trying to extract text from a couple of different MS Word
files and I'm getting mixed results.
Seemingly at random, I get this error:
java.lang.StringIndexOutOfBoundsException: String index out of range: -21047
    at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:886)
    at java.lang.StringBuffer.substring(StringBuffer.java:417)
    at org.apache.poi.hwpf.model.TextPiece.substring(TextPiece.java:88)
    at org.apache.tika.parser.microsoft.WordParser.extractText(WordParser.java:163)
Looking at TextPiece in POI, I can see that the substring method is
being called with a negative value for end:
    public String substring(int start, int end)
    {
        int denominator = _usesUnicode ? 2 : 1;
        return ((StringBuffer)_buf).substring(start/denominator, end/denominator);
    }
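To show how a negative end produces this exception, here is a minimal standalone sketch that copies the logic of the method quoted above. It does not use POI; the class name NegativeEndDemo, the parameters buf and usesUnicode, and the offsets in main are assumptions made for illustration. The wording of the exception message differs between Java versions, so the sketch only checks the exception type.

```java
public class NegativeEndDemo {
    // Mirrors the quoted TextPiece.substring logic; 'buf' and 'usesUnicode'
    // stand in for the _buf and _usesUnicode fields (assumed names).
    public static String substring(StringBuffer buf, boolean usesUnicode, int start, int end) {
        int denominator = usesUnicode ? 2 : 1;
        return buf.substring(start / denominator, end / denominator);
    }

    // Returns true if the call throws StringIndexOutOfBoundsException.
    public static boolean throwsOutOfBounds(StringBuffer buf, boolean usesUnicode, int start, int end) {
        try {
            substring(buf, usesUnicode, start, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer("some extracted text");
        // Hypothetical offset: -42094 / 2 == -21047, the index shown in the trace.
        System.out.println(throwsOutOfBounds(buf, true, 0, -42094));
        // A valid range for comparison: offsets 0..8 map to characters 0..4.
        System.out.println(throwsOutOfBounds(buf, true, 0, 8));
    }
}
```

If the piece uses Unicode, a reported index of -21047 would correspond to an end argument of roughly -42094 before the division by two. That is only an inference from the quoted code, not something confirmed from the failing files.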
I just can't see why or how runEnd - currentTextStart can end up being
a negative value in this call:
    String str = currentPiece.substring(0, runEnd - currentTextStart);
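One way to narrow this down, without assuming anything about why the difference goes negative, would be to check the computed end before calling substring and report both inputs. The sketch below is only a diagnostic idea: the class RunBoundsCheck and the method checkedEnd are hypothetical names, not part of Tika or POI, and the numbers in main are invented.

```java
public class RunBoundsCheck {
    // Computes the end argument for substring(0, end) and fails loudly,
    // with both inputs in the message, if it would be negative.
    public static int checkedEnd(int runEnd, int currentTextStart) {
        int end = runEnd - currentTextStart;
        if (end < 0) {
            throw new IllegalStateException(
                "negative substring end: runEnd=" + runEnd
                + ", currentTextStart=" + currentTextStart);
        }
        return end;
    }

    public static void main(String[] args) {
        // Invented values: a run that ends after the current piece starts.
        System.out.println(checkedEnd(120, 100));
        try {
            // Invented values: a run that ends before the current piece starts.
            checkedEnd(100, 42194);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Logging the two values for a failing document should at least show whether runEnd is unexpectedly small or currentTextStart is unexpectedly large.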
Any ideas?
Regards Mats