RTF parser smashes words together in subsequent table cells
-----------------------------------------------------------
Key: TIKA-392
URL: https://issues.apache.org/jira/browse/TIKA-392
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Jukka Zitting
Priority: Minor
I have an RTF document with the following snippet of content (it's an export of
a private phone book so I can't share the full document):
{\rtlch\fcs1 \af0\afs24 \ltrch\fcs0
\f0\fs24\lang2055\langfe2055\langfenp2055\insrsid9461491\charrsid9461491 Fax /
Phone Station\cell Fax / Phone #\cell }
The extracted text is:
Fax / Phone StationFax / Phone
Note how the cell boundary between "Station" and "Fax" is lost: the text of the
two cells runs together with no separator.
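
For illustration, here is a minimal sketch of the behaviour I would expect. It is
not Tika's RTF parser and not a proposed patch; the class CellBoundarySketch and
the method extractText are hypothetical names made up for this example. It
assumes a tab is an acceptable separator for \cell and a newline for \row, and it
simply drops every other control word (including \'hh escapes), which a real
parser could not do.

```java
// Hypothetical sketch, not Tika code: a naive RTF scanner that keeps cell
// boundaries by emitting a separator whenever it sees \cell or \row.
class CellBoundarySketch {

    static String extractText(String rtf) {
        StringBuilder out = new StringBuilder();
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}' || c == '\r' || c == '\n') {
                // Group delimiters and raw line breaks carry no text.
                i++;
            } else if (c == '\\') {
                i++;
                int start = i;
                while (i < n && isAsciiLetter(rtf.charAt(i))) {
                    i++;
                }
                String word = rtf.substring(start, i);
                if (word.isEmpty()) {
                    // Control symbol: keep escaped braces and backslashes,
                    // drop everything else (assumption made for brevity).
                    if (i < n) {
                        char sym = rtf.charAt(i++);
                        if (sym == '{' || sym == '}' || sym == '\\') {
                            out.append(sym);
                        }
                    }
                    continue;
                }
                // Skip an optional (possibly negative) numeric parameter.
                if (i + 1 < n && rtf.charAt(i) == '-'
                        && Character.isDigit(rtf.charAt(i + 1))) {
                    i++;
                }
                while (i < n && Character.isDigit(rtf.charAt(i))) {
                    i++;
                }
                // A single space after a control word is a delimiter, not text.
                if (i < n && rtf.charAt(i) == ' ') {
                    i++;
                }
                if (word.equals("cell")) {
                    out.append('\t');   // assumed separator between cells
                } else if (word.equals("row")) {
                    out.append('\n');   // assumed separator between rows
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }
}
```

Fed the snippet above, this sketch returns "Fax / Phone Station", a tab,
"Fax / Phone #" and a trailing tab, so the two cells stay distinguishable.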