Nutch was never meant for vertical or enterprise search. Solr is a great engine, but obviously you need to get to the documents first. Before I offer any further opinion, I should ask the following:
1) What kind of documents/repositories are you trying to provide search for?
2) Are security and user access/permissions important for you?
3) What is the typical size of the document universe you wish your software to handle (number of documents plus average size, and/or total GB)?

--
J

On Tue, May 10, 2011 at 7:37 AM, webdev1977 wrote:
> I have been working on and off for about a year now on developing a prototype
> for Enterprise Search using Nutch and Solr. I have also incorporated a
> plugin using the hive-mrc google code for automatic tagging based on a
> custom taxonomy that my customer uses. I have been slowly migrating up the
> chain of machines available and I have been given one machine for my
> "prototype" that is fairly powerful.
>
> Some problems still remain that I believe can be fixed and others make me
> question my decision to use Nutch.
>
> One problem has to do with the fact that I am doing vertical searching. The
> side effect of this is that the crawl process is SO slow. It took about 48
> hours to crawl about 350,000 urls all from the same website. I am
> crawling a shared file system and I am sure that constitutes vertical
> crawling. The other web crawling I am doing also only comes from a handful
> of urls. Maybe nutch is not the solution to use based on this?
>
> The other problem is the fact that I would like to use the
> AdaptiveFetchSchedule and the developers I work with refuse to use caching
> and Last Modified time for our PHP pages. This should be a nightmare :-(
>
> I love the solr aspect of our prototype. It is very fast and reliable and I
> have not had lots of issues.
>
> In the real world, how do production environments use Nutch? Do they have a
> separate custom script that runs each of the crawl commands separately? Do
> they run this script once a day? What about vertical crawling, are there
> any special settings that could help Nutch run faster?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Going-Beyond-the-Prototype-tp2923289p2923289.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

