Are you storing (in addition to indexing) your data? Perhaps you could turn off storage on data older than 7 days (this requires reindexing), losing the ability to return snippets but cutting down on your storage space and server count. I've seen a 10x decrease in space requirements and a large boost in speed after cutting extraneous storage from Solr -- the stored data is mixed in with the index data, so it slows down searches. You could also put all 200 GB of >7-day data onto one Solr instance rather than ten, and accept that those searches will be slower.
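As a minimal sketch of what "turning off storage" means in schema.xml -- the field names here are illustrative, not from this thread, and changing `stored` requires reindexing as noted above:

```xml
<!-- Hypothetical schema.xml fragment: keep the field searchable
     (indexed="true") but stop storing its contents (stored="false").
     Snippets can no longer be returned for it, but index size shrinks. -->
<field name="body"      type="text" indexed="true" stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="true"/>
```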
Michael

On Fri, Sep 25, 2009 at 1:34 AM, Silent Surfer <silentsurfe...@yahoo.com> wrote:

> Hi,
>
> Thank you Michael and Chris for the response.
>
> Today after the mail from Michael, we tested the dynamic loading of cores
> and it worked well. So we need to go with the hybrid approach of multicore
> and distributed searching.
>
> In our testing, we found that a Solr instance with 20 GB of index (single
> index or spread across multiple cores) performs better than a Solr
> instance with 40 or 50 GB of index (single index or index spread across
> cores).
>
> So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr
> slave instances.
>
> On day 2, 10 more Solr slave servers are required; cumulative Solr slave
> instances = 200*2/20 = 20
> ...
> On day 30, 10 more Solr slave servers are required; cumulative Solr slave
> instances = 200*30/20 = 300
>
> So with this approach we may need ~300 Solr slave instances, which
> becomes very unmanageable.
>
> But we know that most of the queries are for the past week, i.e., we
> definitely need 70 Solr slaves containing the last 7 days' worth of data
> up and running.
>
> Now, for the remaining 230 Solr instances, do we need to keep them
> running for the odd query that can span all 30 days of data
> (30*200 GB = 6 TB), which may come up only a couple of times a day?
> This linear increase of Solr servers with the retention period doesn't
> seem to be a very scalable solution.
>
> So we are looking for a simpler approach to handle this scenario.
>
> Appreciate any further inputs/suggestions.
>
> Regards,
> sS
>
> --- On Fri, 9/25/09, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> > From: Chris Hostetter <hossman_luc...@fucit.org>
> > Subject: Re: Can we point a Solr server to index directory dynamically
> > at runtime..
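The server-count arithmetic in the quoted message can be sketched as follows -- a minimal illustration of the linear growth sS describes, using the thread's figures of 200 GB of new index per day and ~20 GB per Solr instance:

```python
def cumulative_slaves(days, daily_index_gb=200, gb_per_instance=20):
    """Slave instances needed to hold `days` days of index,
    at 200 GB/day and ~20 GB per instance (figures from the thread)."""
    return days * daily_index_gb // gb_per_instance

# Growth is linear in the retention period:
print(cumulative_slaves(1))   # day 1  -> 10
print(cumulative_slaves(7))   # 1 week -> 70
print(cumulative_slaves(30))  # 30-day retention -> 300
```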
> > To: solr-user@lucene.apache.org
> > Date: Friday, September 25, 2009, 4:04 AM
> >
> > : Using a multicore approach, you could send a "create a core named
> > : 'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live
> > : Solr, which would spin it up on the fly. Then you query it, and maybe
> > : keep it spun up until it's not queried for 60 seconds or something,
> > : then send a "remove core 'core3weeksold'" command.
> > : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .
> >
> > Something that seems implicit in the question is what to do when the
> > request spans all of the data ... this is where (in theory) distributed
> > searching could help you out.
> >
> > Index each day's worth of data into its own core; that makes it really
> > easy to expire the old data (just UNLOAD and delete an entire core once
> > it's more than 30 days old). If your user is only searching "current"
> > data, then your app can directly query the core containing the most
> > current data -- but if they want to query the last week, or last two
> > weeks' worth of data, you do a distributed request for all of the
> > shards needed to search the appropriate amount of data.
> >
> > Between the ALIAS and SWAP commands on the CoreAdmin screen, it should
> > be pretty easy to have cores with names like "today", "1dayold",
> > "2dayold", so that your app can configure simple shard params for all
> > the permutations you'll need to query.
> >
> > -Hoss
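Putting Hoss's two pieces together, here is a sketch of the request URLs involved: a CoreAdmin CREATE call to spin up a core over an existing index directory, and a distributed query over the per-day cores via the standard `shards` parameter. The host, port, and core names are hypothetical, following the naming scheme in the quoted message; this only builds the URLs, it does not assume a running Solr:

```python
from urllib.parse import urlencode

# Hypothetical host; adjust to your deployment.
SOLR = "http://localhost:8983/solr"

# CoreAdmin CREATE: the "create a core named 'core3weeksold'
# pointing to '/datadirs/3weeksold'" command Hoss describes.
create_url = SOLR + "/admin/cores?" + urlencode({
    "action": "CREATE",
    "name": "core3weeksold",
    "instanceDir": "/datadirs/3weeksold",
})

# Distributed search over the last week: "today" plus the six
# previous per-day cores, listed in the `shards` parameter
# (shard entries are host:port/path, without the http:// scheme).
week_shards = ",".join(
    ["localhost:8983/solr/today"]
    + [f"localhost:8983/solr/{d}dayold" for d in range(1, 7)]
)
query_url = SOLR + "/today/select?" + urlencode({
    "q": "*:*",
    "shards": week_shards,
})

print(create_url)
print(query_url)
```

An app can keep a small table mapping each date-range permutation ("today", "last 7 days", "last 30 days") to its precomputed `shards` string, as Hoss suggests.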