Hi,

Just to let you know that we've just donated a UIMA component based on Tika, which is used to convert markup into UIMA annotations, extract the text and metadata, etc. More details at https://issues.apache.org/jira/browse/UIMA-1095
Best,
Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com