Hi - see inline:
 
-----Original message-----
> From: Palmer, Eric <[email protected]>
> Sent: Friday 1st November 2013 9:12
> To: [email protected]
> Subject: nutch + solr for website indexing
> 
> Hello all,
> 
> I am going to stand up a solr and nutch instance and use it for indexing ~120 
> web sites. These sites are in the U of Richmond public web space and contain 
> less than 100,000 pages if crawled completely.  
> 
> I've not done this before. We have been using the google mini appliance and 
> are decommissioning these soon. 
> 
> I'm looking for any advice I can get. 
> 
> Is Nutch 2.2.1 compatible with solr 4.5.1?  This will be on Amazon linux and 
> to start I will install both on the same EC2 instance.  I may separate Nutch 
> to a separate instance for performance reasons. The mini indexes these sites 
> in less than 2 hours so I'm guessing Nutch will do the same on a single 
> server instance.

No, Nutch 2.x doesn't talk to Solr 4 yet. I would certainly recommend using 
Nutch 1.7 over 2.x. It is faster, more stable, more robust, and more 
feature-rich.
> 
> Our needs are pretty simple. We just need to be able to extract title and 
> body.
> 
> Thanks in advance for your help.
> 
> 
> Eric Palmer
> Web Services
> University of Richmond
