Hi - see inline:

-----Original message-----
> From: Palmer, Eric <[email protected]>
> Sent: Friday 1st November 2013 9:12
> To: [email protected]
> Subject: nutch + solr for website indexing
>
> Hello all,
>
> I am going to stand up a solr and nutch instance and use it for indexing ~120
> web sites. These sites are in the U of Richmond public web space and contain
> less than 100,000 pages if crawled completely.
>
> I've not done this before. We have been using the google mini appliance and
> are decommissioning these soon.
>
> I'm looking for any advice I can get.
>
> Is Nutch 2.2.1 compatible with solr 4.5.1? This will be on Amazon linux and
> to start I will install both on the same EC2 instance. I may separate Nutch
> to a separate instance for performance reasons. The mini indexes these sites
> in less than 2 hours so I'm guessing Nutch will do the same on a single
> server instance.

No, Nutch 2.x doesn't talk to Solr 4 yet. I would certainly recommend using Nutch 1.7 over 2.x. It is faster, more stable, more robust, and more feature-rich.

>
> Our needs are pretty simple. We just need to be able to extract title and
> body.
>
> Thanks in advance for your help.
>
>
> Eric Palmer
> Web Services
> University of Richmond

