Nutch was never meant for vertical or enterprise search. Solr is a
great engine, but obviously you need to get to the documents first.
Before I offer any further opinion, I should ask the following:

1) What kind of documents/repositories are you trying to provide search for?
2) Are security and user access/permissions important for you?
3) What is the typical size of the document universe you wish your
software to handle (number of documents, plus average size and/or
total GB)?

-- J

On Tue, May 10, 2011 at 7:37 AM, webdev1977 <[email protected]> wrote:
> I have been working on and off for about a year now on developing a prototype
> for Enterprise Search using Nutch and Solr.  I have also incorporated a
> plugin using the hive-mrc google code for automatic tagging based on a
> custom taxonomy that my customer uses.  I have been slowly migrating up the
> chain of machines available and I have been given one machine for my
> "prototype" that is fairly powerful.
>
> Some problems still remain that I believe can be fixed and others make me
> question my decision to use Nutch.
>
> One problem has to do with the fact that I am doing vertical searching.  The
> side effect of this is that the crawl process is SO slow.  It took about 48
> hours to crawl about 350,000 urls all from the same website. I am
> crawling a shared file system and I am sure that constitutes vertical
> crawling.  The other web crawling I am doing also only comes from a handful
> of urls.  Maybe nutch is not the solution to use based on this?
>
> The other problem is the fact that I would like to use the
> AdaptiveFetchSchedule and the developers I work with refuse to use caching
> and Last Modified time for our PHP pages.  This should be a nightmare :-(
>
> I love the solr aspect of our prototype.  It is very fast and reliable and I
> have not had lots of issues.
>
> In the real world, how do production environments use Nutch?  Do they have a
> separate custom script that runs each of the crawl commands separately?  Do
> they run this script once a day?  What about vertical crawling, are there
> any special settings that could help Nutch run faster?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Going-Beyond-the-Prototype-tp2923289p2923289.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
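
For a sense of scale on the crawl speed quoted above, here is a rough back-of-the-envelope sketch in Python. It uses only the figures from the original message (350,000 urls in 48 hours, all on one host). The function names and the `delay_s` / `parallel_per_host` parameters are made up for illustration; they are not Nutch APIs. The underlying assumption is that a polite crawler waits a fixed delay between requests to the same host (in Nutch this is governed by settings such as fetcher.server.delay), so a single-host crawl cannot go faster than that delay allows.

```python
# Back-of-the-envelope crawl throughput, using the figures quoted above.
# All names here are illustrative, not part of Nutch.

def crawl_rate(urls: int, hours: float) -> float:
    """Average number of URLs fetched per second over the whole crawl."""
    return urls / (hours * 3600)


def min_crawl_hours(urls: int, delay_s: float, parallel_per_host: int = 1) -> float:
    """Lower bound on crawl time (in hours) when every URL is on one host
    and each fetch slot waits delay_s seconds between requests."""
    return urls * delay_s / parallel_per_host / 3600


if __name__ == "__main__":
    rate = crawl_rate(350_000, 48)
    print(f"observed: {rate:.2f} URLs/s, {1 / rate:.2f} s per URL")

    # Hypothetical per-host delays, to see how the lower bound scales.
    for delay in (0.5, 1.0, 5.0):
        hours = min_crawl_hours(350_000, delay)
        print(f"delay {delay} s -> at least {hours:.0f} h")
```

The observed figure works out to roughly 2 urls per second. If the per-host delay is the bottleneck, then adding machines will not help a single-host crawl; only a shorter delay or more parallel fetches against that host will.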
