Grant, thanks for the response.

A couple of other people have recommended trying the Nutch + Solr approach, but I am not sure what the real benefit of doing that is, since Nutch provides most of the same features as Solr, although Solr does have some nice additional features (like spell checking and incremental indexing).

I currently have a Nutch index of 500,000+ URLs, and I expect it to get much bigger. I am generally pretty happy with it, but I want to make sure that I am going down the right path for the best feature set. As far as the front-end implementation is concerned, I have been using the Nutch search app as basically a web service that feeds the main app over RSS. The main app takes that feed and manipulates the results for display.
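
To make the RSS hand-off concrete, here is a rough Python sketch of the consuming side. It is only an illustration under assumptions, not the code I actually run: the sample feed, the parse_results helper, and the dictionary keys are all made up for the example, and a real feed would be fetched over HTTP from the search app rather than embedded as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 search response. In a real setup this text would be
# fetched over HTTP from the search web app instead of being hard-coded.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Search results: solr</title>
    <item>
      <title>Apache Solr</title>
      <link>http://lucene.apache.org/solr/</link>
      <description>Solr is a search server built on Lucene.</description>
    </item>
    <item>
      <title>Apache Nutch</title>
      <link>http://lucene.apache.org/nutch/</link>
      <description>Nutch is a web crawler and search engine.</description>
    </item>
  </channel>
</rss>"""


def parse_results(feed_xml):
    """Turn an RSS 2.0 document into a list of plain dicts, one per <item>."""
    root = ET.fromstring(feed_xml)
    results = []
    # Standard RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
    for item in root.findall("./channel/item"):
        results.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return results


if __name__ == "__main__":
    # The main app would reorder, filter, or restyle these before display.
    for hit in parse_results(SAMPLE_FEED):
        print(hit["title"], "->", hit["link"])
```

The point of keeping the search app behind a feed like this is that the main app only depends on the feed format, so the search back end could be swapped without touching the display code.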

As far as the Hadoop + Lucene integration goes, I haven't used that directly, just the Hadoop integration with Nutch, and of course Hadoop on its own.

-John


On Oct 22, 2008, at 10:08 AM, Grant Ingersoll wrote:


On Oct 22, 2008, at 7:57 AM, John Martyniak wrote:

I am very new to Solr, but I have played with Nutch and Lucene.

Has anybody used Solr for a whole web indexing application?

Which Spider did you use?

How does it compare to Nutch?

There is a patch that combines Nutch + Solr. Nutch is used for crawling, Solr for searching. Can't say I've used it for whole web searching, but I believe some are trying it.

At the end of the day, I'm sure Solr could do it, but it will take some work to set up the architecture (distributed, replicated) and to deal properly with fault tolerance and failover. There are also some examples in Hadoop of Hadoop + Lucene integration.

How big are you talking?



Thanks in advance for all of the info.

-John


--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









