Make HtmlParser customizable through ParseContext
-------------------------------------------------
Key: TIKA-347
URL: https://issues.apache.org/jira/browse/TIKA-347
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Fix For: 0.6
In TIKA-304 we added the mapSafeElement() and isDiscardElement() methods to
HtmlParser so that subclasses could better customize how incoming HTML elements
get mapped to the XHTML output from Tika. This works fairly well, but it requires
you to either modify the Tika configuration file or explicitly inject a custom
HtmlParser subclass instance into the CompositeParser instance you're using
(AutoDetectParser, etc.).
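For illustration, the subclass-based customization described above looks roughly like the following. This is a self-contained sketch, not Tika code: SketchHtmlParser is a hypothetical stand-in for HtmlParser reduced to the two hook methods, and its default element table is illustrative rather than Tika's actual mapping.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for HtmlParser, reduced to the two TIKA-304 hooks.
class SketchHtmlParser {
    // Illustrative default table only; NOT Tika's real element mapping.
    private static final Map<String, String> SAFE =
            Map.of("P", "p", "H1", "h1", "UL", "ul", "LI", "li");

    // Convention in this sketch: returns the XHTML element name to emit,
    // or null to drop the tag itself while keeping its text content.
    protected String mapSafeElement(String name) {
        return SAFE.get(name.toUpperCase(Locale.ROOT));
    }

    // Returns true if the element and everything inside it should be dropped.
    protected boolean isDiscardElement(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        return upper.equals("SCRIPT") || upper.equals("STYLE");
    }
}

// A customized subclass: keeps <div> as <div> and discards <nav> entirely,
// deferring every other decision to the parent class.
class DivKeepingHtmlParser extends SketchHtmlParser {
    @Override
    protected String mapSafeElement(String name) {
        if (name.equalsIgnoreCase("DIV")) {
            return "div";
        }
        return super.mapSafeElement(name);
    }

    @Override
    protected boolean isDiscardElement(String name) {
        return name.equalsIgnoreCase("NAV") || super.isDiscardElement(name);
    }
}
```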
Now that we have the ParseContext mechanism available to simplify such
customization, it would be nice to allow you to provide a custom "HTML mapper"
instance through the parse context and have HtmlParser call that mapper (if
available) for the mapSafeElement() and isDiscardElement() operations.
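A minimal sketch of the proposed lookup, assuming a mapper interface (called HtmlMapper here) that is registered in the context under its own class as the key. SketchParseContext is a simplified, hypothetical stand-in for Tika's ParseContext, and SketchDefaultHtmlMapper, ContextAwareHtmlParser and DivKeepingMapper are likewise illustrative names. The interface name and method signatures are assumptions, not a description of the final fix.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical mapper interface carrying the two TIKA-304 operations.
interface HtmlMapper {
    String mapSafeElement(String name);    // XHTML name to emit, or null to drop the tag
    boolean isDiscardElement(String name); // true to drop the element and its content
}

// Simplified stand-in for ParseContext: a type-safe map keyed by class.
class SketchParseContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value != null ? key.cast(value) : defaultValue;
    }
}

// Illustrative built-in rules used when the context carries no mapper.
class SketchDefaultHtmlMapper implements HtmlMapper {
    private static final Map<String, String> SAFE =
            Map.of("P", "p", "H1", "h1", "UL", "ul", "LI", "li");

    public String mapSafeElement(String name) {
        return SAFE.get(name.toUpperCase(Locale.ROOT));
    }

    public boolean isDiscardElement(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        return upper.equals("SCRIPT") || upper.equals("STYLE");
    }
}

class ContextAwareHtmlParser {
    private static final HtmlMapper DEFAULT = new SketchDefaultHtmlMapper();

    // Prefer a mapper supplied through the context; otherwise use the defaults.
    static HtmlMapper mapperFor(SketchParseContext context) {
        return context.get(HtmlMapper.class, DEFAULT);
    }

    // Decides what to do with one start tag: "discard", "ignore" or "emit <name>".
    static String actionFor(String element, SketchParseContext context) {
        HtmlMapper mapper = mapperFor(context);
        if (mapper.isDiscardElement(element)) {
            return "discard";
        }
        String mapped = mapper.mapSafeElement(element);
        return mapped == null ? "ignore" : "emit " + mapped;
    }
}

// Example custom mapper: keeps <div>, defers everything else to the defaults.
class DivKeepingMapper extends SketchDefaultHtmlMapper {
    @Override
    public String mapSafeElement(String name) {
        return name.equalsIgnoreCase("DIV") ? "div" : super.mapSafeElement(name);
    }
}
```

With this shape, a caller who wants to keep div elements calls set(HtmlMapper.class, new DivKeepingMapper()) on the context before parsing; no configuration change or parser subclass is needed, and parses without a mapper in the context behave exactly as before.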
--
This message is automatically generated by JIRA.