Hi Eric,

We have also helped some government institutions replace their expensive GSA 
with open source software. In our case we used Apache Nutch 1.7 to crawl the 
websites and index the content into Apache Solr. It is effective, robust, and 
scales easily on Hadoop if you need it to. Nutch may not be the easiest tool 
for the job, but it is very stable, feature-rich, and has an active community 
here at Apache.
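
For reference, with a seed list of your site URLs in a urls/ directory, the 
one-shot Nutch 1.7 crawl that pushes parsed pages straight into Solr looks 
roughly like this (a minimal sketch; the Solr URL and the depth/topN limits 
are assumptions you would tune to your own sites):

    bin/nutch crawl urls -solr http://localhost:8983/solr -depth 3 -topN 5000

The -depth flag matches the crawl depth you mention below, and -topN caps how 
many of the best-scoring URLs are fetched in each round.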

Cheers,
 
-----Original message-----
> From: Palmer, Eric <epal...@richmond.edu>
> Sent: Wednesday 30th October 2013 18:48
> To: solr-user@lucene.apache.org
> Subject: Replacing Google Mini Search Appliance with Solr?
> 
> Hello all,
> 
> Been lurking on the list for a while.
> 
> Our two Google Mini search appliances, which index our public web sites, 
> are at end of life. Google is no longer selling the Mini, and buying the 
> full-size appliance is not cost-effective.
> 
> http://search.richmond.edu/
> 
> We would run a Solr replacement on Linux (CentOS, Red Hat, or similar) with 
> OpenJDK or Oracle Java.
> 
> Background
> ==========
> ~130 sites
> only ~12,000 pages (at a depth of 3)
> probably ~40,000 pages if we go to a depth of 4
> 
> We use key matches a lot. In Solr terms these are elevated documents 
> (elevations).
> 
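Key matches map to Solr's QueryElevationComponent. You list the hand-picked 
documents per query in an elevate.xml file alongside your Solr config; a 
minimal sketch (the query text and document id below are made-up examples, 
and the id must match your schema's uniqueKey field):

    <elevate>
      <query text="admissions">
        <doc id="http://admissions.richmond.edu/" />
      </query>
    </elevate>
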
> We would code a search query form in PHP and wrap it into our site design 
> (http://www.richmond.edu).
> 
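Any HTTP client can drive Solr, so the PHP form only needs to send a query 
and render the JSON that comes back. A rough sketch of the request in Python 
(the core name, the field names, and having elevation enabled on the /select 
handler are all assumptions about your setup):

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical core name; adjust to your Solr install.
    SOLR = "http://localhost:8983/solr/collection1/select"

    def search(q, rows=10):
        params = urllib.parse.urlencode({
            "q": q,
            "defType": "edismax",       # forgiving parser for raw user input
            "qf": "title^2 content",    # assumed fields; boost title matches
            "enableElevation": "true",  # honor elevate.xml key matches
            "wt": "json",
            "rows": rows,
        })
        with urllib.request.urlopen(SOLR + "?" + params) as resp:
            return json.load(resp)["response"]["docs"]

    for doc in search("admissions"):
        print(doc.get("title"), doc.get("url"))
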
> I have played with and love LucidWorks, and I know that their paid solution 
> works for our use cases, but the cost model is not attractive for such a 
> small collection.
> 
> So with Solr, what are my open source options, and what are people's 
> experiences crawling and indexing web sites with Solr plus a crawler? I 
> understand Solr does not ship with a crawler, so getting one working would 
> be the first task.
> 
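The one-shot crawl command above is really a wrapper around a loop of 
individual steps, and because each step is a MapReduce job, the same loop 
runs unchanged on a Hadoop cluster if the crawl ever outgrows one machine. 
Roughly (paths are Nutch's conventional defaults; <segment> stands for the 
timestamped directory each generate round creates):

    bin/nutch inject crawl/crawldb urls
    bin/nutch generate crawl/crawldb crawl/segments
    bin/nutch fetch crawl/segments/<segment>
    bin/nutch parse crawl/segments/<segment>
    bin/nutch updatedb crawl/crawldb crawl/segments/<segment>
    bin/nutch invertlinks crawl/linkdb -dir crawl/segments
    bin/nutch solrindex http://localhost:8983/solr crawl/crawldb \
        -linkdb crawl/linkdb crawl/segments/<segment>
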
> We can code in Java, PHP, Python, etc. if we have to, but we don't want to 
> write a crawler if we can avoid it.
> 
> Thanks in advance for any information.
> 
> --
> Eric Palmer
> Web Services
> U of Richmond
> 
> 