Hi, is it possible to get (X)HTML output instead of plain-text output from the Tika Extractor (the Manifold version, not the extract handler)? How to achieve this in Tika itself is pretty clear: https://tika.apache.org/1.8/examples.html#Picking_different_output_formats
Reason: We need to extract very specific chapters from a Word document and index them as dedicated Solr documents (the latter part will probably still have to be done in an update chain). We currently already extract the (sub-)chapters we need from the HTML version of the Word document that Tika creates.

Thank you.
Best regards
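
P.S. To illustrate the kind of post-processing we mean, here is a minimal sketch (not our actual code) of splitting Tika-style XHTML into chapters by heading. It uses only the JDK's DOM parser. The class name `ChapterSplitter` and the method `splitByHeadings` are made up for this example, and it assumes that headings appear as `h1`..`h6` elements directly under `body`, which may not hold for every document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: splits well-formed XHTML into chapters keyed by heading text.
final class ChapterSplitter {

    // Returns heading text -> text of the elements that follow it, in document order.
    // Assumption: headings are direct children of <body>. A repeated heading text
    // overwrites the earlier entry, so real code would need a better key.
    static Map<String, String> splitByHeadings(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XHTML elements live in a namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        Map<String, String> chapters = new LinkedHashMap<>();
        NodeList bodies = doc.getElementsByTagNameNS("*", "body");
        if (bodies.getLength() == 0) {
            return chapters; // no body element, nothing to split
        }

        String currentHeading = null;
        StringBuilder text = new StringBuilder();
        NodeList children = bodies.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments between elements
            }
            if (node.getLocalName().matches("h[1-6]")) {
                // A new heading closes the previous chapter and starts the next one.
                if (currentHeading != null) {
                    chapters.put(currentHeading, text.toString().trim());
                }
                currentHeading = node.getTextContent().trim();
                text.setLength(0);
            } else if (currentHeading != null) {
                // Content before the first heading is ignored in this sketch.
                text.append(node.getTextContent().trim()).append('\n');
            }
        }
        if (currentHeading != null) {
            chapters.put(currentHeading, text.toString().trim());
        }
        return chapters;
    }
}
```

Each map entry could then become its own Solr document, which is why plain-text output from the extractor is not enough for us: the heading structure is lost.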
