With that many documents, I think GSA cost might be in millions of USD. Don't go there.
300M docs might be called medium these days. Of course, if those documents themselves are huge, then it's more resource intensive. 10 TB sounds like a lot when it comes to search, but it's hard to tell what that represents (e.g. are those docs with lots of photos in them? Presentations very light on text? Plain-text documents with 300 words per page? etc.). Anyhow, yes, Solr is a fine choice for this.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: atreyu <wjhendrick...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, May 12, 2011 12:59:28 PM
> Subject: Support for huge data set?
>
> Hi,
>
> I have about 300 million docs (or 10 TB of data), doubling every 3
> years, give or take. The data mostly consists of Oracle records, webpage
> files (HTML/XML, etc.) and office doc files. There are between two and
> four dozen concurrent users, typically. The indexing server has 27 GB
> of RAM, but it still gets extremely taxed, and this will only get worse.
>
> Would Solr be able to efficiently deal with a load of this size? I am
> trying to avoid the heavy cost of GSA, etc...
>
> Thanks.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
> Sent from the Solr - User mailing list archive at Nabble.com.
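The growth figures in the question above (roughly 300 million docs / 10 TB, doubling every 3 years) lend themselves to a quick back-of-the-envelope sizing sketch. The snippet below is purely illustrative: the 50M-docs-per-shard ceiling is an assumed planning number, not a Solr limit or a recommendation from this thread.

```python
# Back-of-the-envelope sizing for a corpus like the one described above.
# The docs_per_shard ceiling is an assumption for illustration only.

def projected_docs(initial_docs, years, doubling_period=3):
    """Doc count after `years` years, doubling every `doubling_period` years."""
    return int(initial_docs * 2 ** (years / doubling_period))

def shards_needed(total_docs, docs_per_shard=50_000_000):
    """Shards required if each shard holds at most `docs_per_shard` docs
    (50M is an assumed comfortable ceiling, not a hard Solr limit)."""
    return -(-total_docs // docs_per_shard)  # ceiling division

docs_now = 300_000_000                      # ~300 million docs today
docs_in_6y = projected_docs(docs_now, 6)    # two doubling periods out

print(docs_in_6y)                # 1200000000 (~1.2 billion docs)
print(shards_needed(docs_now))   # 6 shards at the assumed ceiling
print(shards_needed(docs_in_6y)) # 24 shards six years out
```

The point of the exercise is that capacity planning here is dominated by the doubling period: whatever shard count works today needs headroom to quadruple within two doubling periods.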