On 5/29/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
...What I was planning to do was use the Nutch tool to fetch the URL data into segments, and then write a custom tool to extract the HTML out of each segment and run it through my code, similar to what 'crawl' does, but dumping the metrics into a MySQL DB. Is this similar to what you guys had in mind with Tika?...
I think so: the "extract the HTML" part would be a standard Tika plugin, and your metrics stuff would be a custom plugin.

-Bertrand
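
To make the split between a standard extraction plugin and a custom metrics plugin concrete, here is a minimal sketch of that two-stage chain. It is only an illustration: Nutch and Tika are Java projects, and nothing below is their API. The plugin signature, the function names (`extract_html_plugin`, `metrics_plugin`, `run_pipeline`), and the document dictionary layout are all hypothetical, and the step that writes the metrics to MySQL is left out.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Hypothetical plugin contract (an assumption, not a Tika or Nutch API):
# a plugin takes a document dict and returns a new dict with fields added.
Document = Dict[str, object]
Plugin = Callable[[Document], Document]


class _TextCollector(HTMLParser):
    """Collects the non-blank text nodes of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def extract_html_plugin(doc: Document) -> Document:
    """Stand-in for the 'standard' step: raw HTML in, plain text out."""
    collector = _TextCollector()
    collector.feed(str(doc["html"]))
    collector.close()
    return {**doc, "text": " ".join(collector.chunks)}


def metrics_plugin(doc: Document) -> Document:
    """Stand-in for the 'custom' step: derive simple metrics from the text."""
    text = str(doc["text"])
    metrics = {"word_count": len(text.split()), "char_count": len(text)}
    return {**doc, "metrics": metrics}


def run_pipeline(doc: Document, plugins: List[Plugin]) -> Document:
    """Apply each plugin in order, feeding the output of one into the next."""
    for plugin in plugins:
        doc = plugin(doc)
    return doc


# Example page standing in for content read out of a fetched segment.
page = {
    "url": "http://example.com/",
    "html": "<html><body><h1>Hello</h1><p>Tika and Nutch</p></body></html>",
}
result = run_pipeline(page, [extract_html_plugin, metrics_plugin])
print(result["metrics"])
```

Because the metrics step depends only on the `text` field, it would not need to change if the extraction step were swapped for a different parser, which is the point of keeping the two plugins separate.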
