Hi,

Have a look at the TikaAnnotator in the sandbox. It extracts the text and
metadata from various document formats and converts any available markup
into annotations.
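
To make the idea concrete, here is a minimal sketch of what "converting markup
into annotations" means: strip the tags to get plain text, and remember each
element as a span of character offsets into that text. This is NOT the
TikaAnnotator code and does not use the UIMA or Tika APIs; it is an
illustration in Python's standard library, and the names MarkupToAnnotations
and extract are made up for this example.

```python
from html.parser import HTMLParser


class MarkupToAnnotations(HTMLParser):
    """Hypothetical helper: collects plain text and records each closed
    element as a (tag, begin, end) span over that text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []       # pieces of plain text, in document order
        self.length = 0        # characters of plain text seen so far
        self.open_tags = []    # stack of (tag, begin offset) for unclosed elements
        self.annotations = []  # finished (tag, begin, end) spans

    def handle_starttag(self, tag, attrs):
        # Remember where this element starts in the plain text.
        self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the most recent open element with this name.
        # Stray end tags are ignored; elements that are never closed
        # (for example <br>) simply produce no annotation.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.annotations.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def extract(html):
    """Return (plain_text, annotations) for an HTML string."""
    parser = MarkupToAnnotations()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks), parser.annotations


text, annotations = extract("<p>UIMA reads <b>plain text</b>.</p>")
print(text)
print(annotations)
```

An analysis engine downstream then only sees the plain text, while the
structure of the original document is still available through the offsets.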

HTH

Julien


On 29 September 2011 07:28, abhishek <abhishe...@sqlstar.com> wrote:

> Hi,
> While reading the UIMA documentation, I found that UIMA supports HTML
> files.
>
> However, when I run the
> org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to
> understand the text.
>
> Please let me know the correct way to read these types of files.




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
