Hi.

I was planning on using Nutch and UIMA to perform entity extraction, and noticed that you mention Tika is intended to do this.

I was wondering how things are going with Tika, as there doesn't seem to be any code or design plans checked in (except for the proposal).

So I would like to spark a discussion.

I would like to:
- use Nutch to fetch the pages (HTML) from the site
- use UIMA to analyze them and extract interesting information
- use MySQL, or possibly HBase, to store versioned/historical output of this analysis for further reporting (stats and page timelines)

Is Tika going to be able to do this for me?

Regards,
Ian
--
Ian Holsman
[EMAIL PROTECTED]


