Hi Michael,

We are storing all our data in addition to indexing it, as we need to display those values to the user. So unfortunately we cannot go with stored=false, which could otherwise have solved our issue.
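For reference, the setting in question is the per-field stored attribute in schema.xml; what we cannot do is flip it to false on the fields we display, along these lines (the field name and type below are made up for illustration):

```xml
<!-- schema.xml sketch: indexed="true" keeps the field searchable,
     stored="false" would drop the retrievable copy -- not an option
     for fields we must display. Name/type are hypothetical. -->
<field name="message" type="text" indexed="true" stored="false"/>
```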
Appreciate any other pointers/suggestions.

Thanks,
sS

--- On Fri, 9/25/09, Michael <solrco...@gmail.com> wrote:

> From: Michael <solrco...@gmail.com>
> Subject: Re: Can we point a Solr server to index directory dynamically at runtime..
> To: solr-user@lucene.apache.org
> Date: Friday, September 25, 2009, 2:00 PM
>
> Are you storing (in addition to indexing) your data? Perhaps you could
> turn off storage on data older than 7 days (requires reindexing), thus
> losing the ability to return snippets but cutting down on your storage
> space and server count. I've experienced a 10x decrease in space
> requirements and a large boost in speed after cutting extraneous
> storage from Solr -- the stored data is mixed in with the index data
> and so it slows down searches.
>
> You could also put all 200G onto one Solr instance rather than 10 for
> the >7-day data, and accept that those searches will be slower.
>
> Michael
>
> On Fri, Sep 25, 2009 at 1:34 AM, Silent Surfer <silentsurfe...@yahoo.com> wrote:
>
> > Hi,
> >
> > Thank you Michael and Chris for the response.
> >
> > Today, after the mail from Michael, we tested dynamic loading of
> > cores and it worked well. So we need to go with the hybrid approach
> > of multicore and distributed searching.
> >
> > In our testing, we found that a Solr instance with 20 GB of index
> > (a single index, or spread across multiple cores) performs better
> > than a Solr instance with 40 or 50 GB of index (again, single or
> > spread across cores).
> >
> > So the 200 GB of index from day 1 will be spread across
> > 200/20 = 10 Solr slave instances.
> >
> > For day 2 data, 10 more Solr slave servers are required; cumulative
> > Solr slave instances = 200*2/20 = 20
> > ...
> > For day 30 data, 10 more Solr slave servers are required; cumulative
> > Solr slave instances = 200*30/20 = 300
> >
> > So with the above approach we may need ~300 Solr slave instances,
> > which becomes very unmanageable.
> >
> > But we know that most of the queries are for the past week, i.e. we
> > definitely need 70 Solr slaves containing the last 7 days' worth of
> > data up and running.
> >
> > For the remaining 230 Solr instances, do we need to keep them
> > running just for the odd query that spans all 30 days of data
> > (30*200 GB = 6 TB), which may come in only a couple of times a day?
> > This linear increase of Solr servers with the retention period
> > doesn't seem to be a very scalable solution.
> >
> > So we are looking for a simpler approach to handle this scenario.
> >
> > Appreciate any further inputs/suggestions.
> >
> > Regards,
> > sS
> >
> > --- On Fri, 9/25/09, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >
> > > From: Chris Hostetter <hossman_luc...@fucit.org>
> > > Subject: Re: Can we point a Solr server to index directory dynamically at runtime..
> > > To: solr-user@lucene.apache.org
> > > Date: Friday, September 25, 2009, 4:04 AM
> > >
> > > : Using a multicore approach, you could send a "create a core
> > > : named 'core3weeksold' pointing to '/datadirs/3weeksold'" command
> > > : to a live Solr, which would spin it up on the fly. Then you
> > > : query it, and maybe keep it spun up until it's not queried for
> > > : 60 seconds or something, then send a "remove core
> > > : 'core3weeksold'" command.
> > > : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .
> > >
> > > Something that seems implicit in the question is what to do when
> > > the request spans all of the data ... this is where (in theory)
> > > distributed searching could help you out.
> > > Index each day's worth of data into its own core; that makes it
> > > really easy to expire the old data (just UNLOAD and delete an
> > > entire core once it's more than 30 days old). If your user is only
> > > searching "current" data, then your app can directly query the
> > > core containing the most current data -- but if they want to query
> > > the last week's, or last two weeks' worth of data, you do a
> > > distributed request for all of the shards needed to search the
> > > appropriate amount of data.
> > >
> > > Between the ALIAS and SWAP commands on the CoreAdmin screen, it
> > > should be pretty easy to have cores with names like "today",
> > > "1dayold", "2dayold", so that your app can configure simple shard
> > > params for all the permutations you'll need to query.
> > >
> > > -Hoss
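To make Hoss's day-per-core plus shards suggestion concrete, here is a rough sketch of the bookkeeping the app would do, alongside the sizing arithmetic from earlier in the thread. The hostnames, core-naming scheme, and data directory are made up for illustration; the CoreAdmin action and parameter names come from the wiki page linked above, so verify them against your Solr version.

```python
from datetime import date, timedelta

# Hypothetical slave hosts; in practice each day's core may live on a
# different machine, per the sizing discussion above.
SOLR_HOSTS = ["solr1:8983", "solr2:8983"]

def slave_count(days_retained, daily_index_gb=200, gb_per_instance=20):
    """Cumulative slave instances if every day's index stays online."""
    return days_retained * daily_index_gb // gb_per_instance

def core_name(day):
    """One core per day's data, e.g. 'log20090925' (scheme is made up)."""
    return "log" + day.strftime("%Y%m%d")

def shards_param(end_day, days_back):
    """Build the 'shards' parameter for a distributed search over the
    last `days_back` days, round-robining cores across hosts."""
    parts = []
    for i in range(days_back):
        host = SOLR_HOSTS[i % len(SOLR_HOSTS)]
        parts.append(host + "/solr/" + core_name(end_day - timedelta(days=i)))
    return ",".join(parts)

def create_core_url(host, name, data_dir):
    """CoreAdmin CREATE command to spin up a core over an existing
    on-disk index at runtime (dataDir support may depend on version)."""
    return ("http://%s/solr/admin/cores?action=CREATE"
            "&name=%s&instanceDir=.&dataDir=%s" % (host, name, data_dir))

def unload_core_url(host, name):
    """CoreAdmin UNLOAD command to drop a core once its data ages out."""
    return "http://%s/solr/admin/cores?action=UNLOAD&core=%s" % (host, name)

print(slave_count(30))   # 300 slaves at full 30-day retention
print(slave_count(7))    # 70 slaves for the hot last-7-days window
print(shards_param(date(2009, 9, 25), 7))
```

The idea is that only the hot 7-day window stays permanently online; for the occasional 30-day query, the app issues CREATE commands over the archived data directories, runs one distributed request with the full shards list, and UNLOADs the cold cores afterwards.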