I'm a bit biased, but I would certainly use Nutch, as it seems to be the right
tool for the job. Developing custom plugins is actually easier than you might
think.

Solr, with its extracting request handler, can only help in a very limited
way.
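
To illustrate the limitation: the extracting request handler (Solr Cell) takes
one document that you have already fetched and indexes the text extracted from
it. It does not crawl, follow links, or scrape specific parts of a page, which
is what Nutch gives you. Below is a minimal Python sketch, assuming a Solr
instance at a hypothetical base URL with /update/extract enabled. The helper
name build_extract_request, the core name and the document id are made up for
illustration, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base_url, doc_id, html_bytes):
    """Build (but do not send) a POST to Solr's /update/extract handler.

    solr_base_url is an assumed location, e.g. a core URL on a local Solr.
    """
    params = urlencode({
        "literal.id": doc_id,  # literal.<field> sets a field value on the indexed doc
        "commit": "true",      # commit right after indexing this one document
    })
    url = f"{solr_base_url.rstrip('/')}/update/extract?{params}"
    return Request(
        url,
        data=html_bytes,
        headers={"Content-Type": "text/html"},
        method="POST",
    )


# Hypothetical usage: one page, fetched by some other means, handed to Solr.
req = build_extract_request(
    "http://localhost:8983/solr/mycore", "ign-home", b"<html>...</html>"
)
print(req.get_method(), req.full_url)
```

Every page would have to be fetched and posted like this one at a time, and
any custom field would have to be computed before the request is made.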

> Hello everyone.
> 
> I've been thinking about a way to retrieve information from a domain (for
> example, http://www.ign.com) to process and index. My idea is to use Solr
> as a searcher. I'm familiar with Apache Nutch and I know that the
> latest version has a gateway to Solr to retrieve and index information
> with it. I tried it and it worked fine, but it's a little bit complex to
> develop plugins to process the info and index it in a new custom field.
> Perhaps one of you has tried another (and better) alternative for mining
> web information. What is your recommendation? Can you give me any
> scraping suggestions?
> 
> Thank you very much.
> 
> Luis Cappa.
