wrong handling of ignorableWhitespace/characters in SafeContentHandler and
WriteoutContentHandler
-------------------------------------------------------------------------------------------------
Key: TIKA-190
URL: https://issues.apache.org/jira/browse/TIKA-190
Project: Tika
Issue Type: Bug
Affects Versions: 0.3
Reporter: Uwe Schindler
While investigating TIKA-189, I found the following:
The TIKA-188 patch produces correct output, but its internal handling is
incorrect. XHTMLContentHandler emits the tabs and newlines as
ignorableWhitespace() events, but its superclass SafeContentHandler has a bug:
it forwards ignorableWhitespace() to the decorator's characters() event
(a copy-and-paste error). With that fixed, the tests fail, because
WriteoutContentHandler does not implement ignorableWhitespace() and so drops
all whitespace.
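
The two-part fix described above could look roughly like the following. This
is a minimal sketch, not the actual Tika code: ForwardingHandler and TextSink
are hypothetical stand-ins for SafeContentHandler and WriteoutContentHandler,
and only the standard org.xml.sax API from the JDK is used. The point is that
the decorator must forward ignorableWhitespace() to the same event on its
delegate, and the writing handler must handle that event too, otherwise the
whitespace is lost.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceForwardingSketch {

    /** Hypothetical decorator: forwards each event to the matching delegate method. */
    static class ForwardingHandler extends DefaultHandler {
        private final ContentHandler delegate;

        ForwardingHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            delegate.characters(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
            // The copy-and-paste bug would call delegate.characters(...) here.
            delegate.ignorableWhitespace(ch, start, length);
        }
    }

    /** Hypothetical text sink: writes out both character and whitespace events. */
    static class TextSink extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // Without this override, DefaultHandler silently discards the event.
            out.append(ch, start, length);
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    /** Sends text, ignorable whitespace, then text through the decorator. */
    public static String render() {
        TextSink sink = new TextSink();
        ContentHandler handler = new ForwardingHandler(sink);
        char[] text = "hello".toCharArray();
        char[] ws = "\n\t".toCharArray();
        try {
            handler.characters(text, 0, text.length);
            handler.ignorableWhitespace(ws, 0, ws.length);
            handler.characters(text, 0, text.length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // True when the whitespace survives between the two text runs.
        System.out.println(render().equals("hello\n\thello"));
    }
}
```

Removing the ignorableWhitespace() override from TextSink reproduces the test
failure mentioned above: the decorator forwards the event correctly, but the
sink inherits the no-op from DefaultHandler and the output collapses to
"hellohello".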