Well, I assume many people out there have indexes larger than 100GB, and I don't think you would normally have more than 32GB or 64GB of RAM!
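A rough back-of-the-envelope check of this point follows. It uses the figures from this thread: a ~150GB index, 32GB of RAM, and roughly 80% of the index in the position file. The JVM heap size is not stated anywhere in the thread, so the 8GB below is purely a hypothetical value. The sketch also assumes all remaining RAM is available to the OS page cache, which is a best case.

```python
# Rough sizing sketch: how much of the index could the OS page cache hold?
# Index size, RAM and the 80% position-file share come from the thread;
# the JVM heap size passed in below is a HYPOTHETICAL value.
from typing import Dict


def cache_coverage(index_gb: float, ram_gb: float, heap_gb: float,
                   positions_share: float) -> Dict[str, float]:
    """Return sizes in GB and the fraction of the index / position data that
    fits in the RAM left after the JVM heap (ignores all other processes)."""
    if index_gb <= 0 or not 0 < positions_share <= 1:
        raise ValueError("index_gb must be > 0 and positions_share in (0, 1]")
    positions_gb = index_gb * positions_share
    page_cache_gb = max(ram_gb - heap_gb, 0.0)
    return {
        "positions_gb": positions_gb,
        "page_cache_gb": page_cache_gb,
        "index_fraction_cached": min(page_cache_gb / index_gb, 1.0),
        "positions_fraction_cached": min(page_cache_gb / positions_gb, 1.0),
    }


if __name__ == "__main__":
    # 8 GB heap is an assumption for illustration only.
    result = cache_coverage(index_gb=150, ram_gb=32, heap_gb=8, positions_share=0.8)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions about 24GB of page cache covers roughly 16% of the index, or about 20% of the position data, which suggests that phrase and proximity queries touching many positions may well be limited by disk I/O rather than CPU.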
As I mentioned, the queries are mostly phrase, proximity and wildcard queries, and combinations of these. What exactly do you mean by the distribution of documents? In this index our documents average no more than a few hundred KB each (file system size), and there are around 14 million documents. 80% of the index size is taken up by the position file. I am not sure if this is what you were asking.

On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Hi,
>
> > Sharding is an option too but that too comes with limitations so I want to
> > keep that as a last resort, but I think there must be other things coz 150GB
> > is not too big for one drive/server with 32GB RAM.
>
> Hmm.... what makes you think 32 GB is enough for your 150 GB index?
> It depends on queries and distribution of matching documents, for example.
> What's yours like?
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
> > From: Salman Akram <salman.ak...@northbaysolutions.net>
> > To: solr-user@lucene.apache.org
> > Sent: Tue, January 25, 2011 4:20:34 AM
> > Subject: Performance optimization of Proximity/Wildcard searches
> >
> > Hi,
> >
> > I am facing performance issues in three types of queries (and their
> > combinations). Some of the queries take more than 2-3 mins. Index size is
> > around 150GB.
> >
> > - Wildcard
> > - Proximity
> > - Phrases (with common words)
> >
> > I know CommonGrams and stop words are good ways to resolve such issues, but
> > they don't fulfill our functional requirements (CommonGrams seem to have
> > issues with phrase proximity, stop words have issues with exact match etc).
> >
> > Sharding is an option too but that too comes with limitations so I want to
> > keep that as a last resort, but I think there must be other things coz 150GB
> > is not too big for one drive/server with 32GB RAM.
> >
> > Cache warming is a good option too, but the index gets updated every hour so
> > I am not sure how much that would help.
> >
> > What are the other main tips that can help in performance optimization of
> > the above queries?
> >
> > Thanks
> >
> > --
> > Regards,
> >
> > Salman Akram
>

--
Regards,
Salman Akram
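For readers unfamiliar with the CommonGrams approach mentioned in the original message, the sketch below shows the index-time idea in simplified form. Every token is kept, and a bigram is added wherever a token or its successor is a common word, so a phrase containing a very frequent word can be matched through a much rarer bigram instead of that word's huge postings list. This is a plain-Python approximation, not Lucene's CommonGramsFilter: the real filter also places each bigram at the same position as its first word and has a query-time counterpart (CommonGramsQueryFilter), neither of which is modeled here. The function name and the underscore separator are illustrative choices.

```python
# Simplified, hypothetical illustration of index-time "common grams".
# Not Lucene's implementation: token positions are not tracked here.
from typing import List, Set


def common_grams(tokens: List[str], common_words: Set[str]) -> List[str]:
    """Emit every token; after a token, also emit "a_b" when either the token
    or the next token is in common_words."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # single terms are always kept
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")  # bigram overlaid on the unigrams
    return out


if __name__ == "__main__":
    print(common_grams("the quick brown fox".split(), {"the"}))
```

Because the unigrams are still indexed, a phrase such as "the quick" can be rewritten to the single term "the_quick"; the trade-off, as the original message notes, is that proximity behaviour around those bigrams can differ from plain unigram matching.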