Hello all, I am going to stand up a Solr and Nutch instance and use it to index ~120 web sites. These sites are in the University of Richmond public web space and contain fewer than 100,000 pages if crawled completely.
I've not done this before. We have been using the Google Mini appliance and are decommissioning it soon, so I'm looking for any advice I can get.

Is Nutch 2.2.1 compatible with Solr 4.5.1?

This will run on Amazon Linux, and to start I will install both on the same EC2 instance. I may later move Nutch to a separate instance for performance reasons. The Mini indexes these sites in less than 2 hours, so I'm guessing Nutch will do the same on a single server instance.

Our needs are pretty simple: we just need to be able to extract the title and body.

Thanks in advance for your help.

Eric Palmer
Web Services
University of Richmond

