[
https://issues.apache.org/jira/browse/TIKA-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated TIKA-190:
-------------------------------
Attachment: TIKA-190.patch
Attached is the patch that fixes this.
> wrong handling of ignorableWhitespace/characters in SafeContentHandler and
> WriteoutContentHandler
> -------------------------------------------------------------------------------------------------
>
> Key: TIKA-190
> URL: https://issues.apache.org/jira/browse/TIKA-190
> Project: Tika
> Issue Type: Bug
> Affects Versions: 0.3
> Reporter: Uwe Schindler
> Attachments: TIKA-190.patch
>
>
> During investigation of TIKA-189, I found the following:
> The TIKA-188 patch produces correct output, but its internal handling is
> incorrect. XHTMLContentHandler emits the tabs and newlines as
> ignorableWhitespace() events, but the superclass SafeContentHandler has a
> bug: it forwards ignorableWhitespace() to the decorator's characters()
> event (a copy-and-paste error). With that fixed, the tests fail, because
> WriteoutContentHandler does not implement ignorableWhitespace() and
> therefore drops all whitespace.
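
The two problems described above can be illustrated with a minimal sketch
built only on the JDK's org.xml.sax classes. This is not the attached patch
and not the Tika source: ForwardingHandler and TextSink are hypothetical
stand-ins for the roles that SafeContentHandler and WriteoutContentHandler
play, and the fix shown is only one plausible way to keep whitespace in the
text output.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceForwardingSketch {

    /** Hypothetical decorator, standing in for the role of SafeContentHandler. */
    static class ForwardingHandler extends DefaultHandler {
        private final ContentHandler delegate;

        ForwardingHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            delegate.characters(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
            // The copy-and-paste error would call delegate.characters(...) here.
            delegate.ignorableWhitespace(ch, start, length);
        }
    }

    /** Hypothetical text sink, standing in for the role of WriteoutContentHandler. */
    static class TextSink extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // Without this override, DefaultHandler's no-op silently drops the whitespace.
            characters(ch, start, length);
        }
    }

    /** Sends "a", ignorable whitespace, then "b" through the chain and returns the sink's text. */
    static String demo() {
        TextSink sink = new TextSink();
        ContentHandler handler = new ForwardingHandler(sink);
        char[] a = "a".toCharArray();
        char[] ws = "\n\t".toCharArray();
        char[] b = "b".toCharArray();
        try {
            handler.characters(a, 0, a.length);
            handler.ignorableWhitespace(ws, 0, ws.length);
            handler.characters(b, 0, b.length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.out.toString();
    }

    public static void main(String[] args) {
        // True when the whitespace survives the correctly forwarded event.
        System.out.println(demo().equals("a\n\tb"));
    }
}
```

If the ignorableWhitespace() override is removed from TextSink, the
correctly forwarded event reaches DefaultHandler's empty implementation and
the whitespace is lost, which matches the test failure described above.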