Hi,
See TIKA-347 for a nice alternative to the earlier TIKA-304 approach
to customizing the way Tika maps incoming HTML to XHTML.
You can now inject a custom mapping strategy through the parse
context, like this:
    Parser parser = ...;
    ParseContext context = new ParseContext();
    // Register the custom mapper under the HtmlMapper.class key
    context.set(HtmlMapper.class, new MyCustomHtmlMapper());
    parser.parse(..., context);
The new HtmlMapper interface contains the same mapSafeElement() and
isDiscardElement() method signatures that we already used for the
overridable HtmlParser methods in TIKA-304. If a custom HtmlMapper
instance is found in the parse context, the parser falls back to the
existing TIKA-304 mechanism.
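To make the two methods and the fallback rule concrete, here is a minimal, self-contained sketch that does not depend on Tika. It assumes several things. `HtmlMapper` below is a hypothetical stand-in reduced to the two methods named above, not the real Tika interface. `SimpleContext` stands in for `ParseContext`. `DefaultMapper` plays the role of the TIKA-304 default behaviour. The element lists and the return-value semantics (null means "drop the tag but keep its text", true from isDiscardElement() means "drop the element and its content") are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Tika's HtmlMapper, reduced to the two methods
// discussed in this mail; the real interface may differ between releases.
interface HtmlMapper {
    // Assumed: return the XHTML element name to emit, or null to drop the
    // tag itself while still keeping its text content.
    String mapSafeElement(String name);

    // Assumed: return true to drop the element and everything inside it.
    boolean isDiscardElement(String name);
}

// Stand-in for the default (TIKA-304 style) mapping used as the fallback.
class DefaultMapper implements HtmlMapper {
    private static final Set<String> SAFE =
            new HashSet<>(Arrays.asList("p", "h1", "h2", "a", "ul", "li"));
    private static final Set<String> DISCARD =
            new HashSet<>(Arrays.asList("script", "style"));

    public String mapSafeElement(String name) {
        String n = name.toLowerCase(Locale.ROOT);
        return SAFE.contains(n) ? n : null;
    }

    public boolean isDiscardElement(String name) {
        return DISCARD.contains(name.toLowerCase(Locale.ROOT));
    }
}

// Example custom mapper (hypothetical): keeps <div>, discards <noscript>
// content, and delegates every other decision to the default mapping.
class MyCustomHtmlMapper implements HtmlMapper {
    private final HtmlMapper fallback = new DefaultMapper();

    public String mapSafeElement(String name) {
        if (name.equalsIgnoreCase("div")) {
            return "div";
        }
        return fallback.mapSafeElement(name);
    }

    public boolean isDiscardElement(String name) {
        return name.equalsIgnoreCase("noscript") || fallback.isDiscardElement(name);
    }
}

// Minimal stand-in for ParseContext: values keyed by their interface type.
class SimpleContext {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        values.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = values.get(key);
        return value != null ? key.cast(value) : defaultValue;
    }
}

// How a parser might choose its mapper: a mapper found in the context wins,
// otherwise the default mapping is used.
class MapperLookup {
    static HtmlMapper resolve(SimpleContext context) {
        return context.get(HtmlMapper.class, new DefaultMapper());
    }
}
```

Delegating to the default mapping keeps a custom mapper small: it only overrides the decisions it cares about and inherits the rest.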
BR,
Jukka Zitting