Are you storing (in addition to indexing) your data?  Perhaps you could turn
off storage on data older than 7 days (requires reindexing), thus losing the
ability to return snippets but cutting down on your storage space and server
count.  I've seen a 10x decrease in space requirements and a large
boost in speed after cutting extraneous storage from Solr -- the stored data
is mixed in with the index data, so it slows down searches.
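
For what it's worth, the change itself is just flipping "stored" to false on
the big fields in schema.xml and reindexing -- something like this (a sketch;
the field name here is made up, your schema will differ):

  <!-- indexed for search, but not stored: no snippets, much smaller on disk -->
  <field name="body" type="text" indexed="true" stored="false"/>
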
You could also put all 200G of the >7-day-old data onto one Solr instance
rather than ten, and accept that those searches will be slower.

Michael

On Fri, Sep 25, 2009 at 1:34 AM, Silent Surfer <silentsurfe...@yahoo.com> wrote:

> Hi,
>
> Thank you Michael and Chris for the response.
>
> Today, after Michael's mail, we tested dynamic loading of cores and it
> worked well, so we plan to go with a hybrid approach of multicore and
> distributed searching.
>
> As per our testing, we found that a Solr instance with 20 GB of index
> (whether a single index or spread across multiple cores) performs better
> than a Solr instance with, say, 40 or 50 GB of index (again, single index
> or spread across cores).
>
> So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr
> slave instances.
>
> For day 2's data, 10 more Solr slave servers are required; cumulative Solr
> slave instances = 200*2/20 = 20
> ...
> ..
> ..
> For day 30's data, 10 more Solr slave servers are required; cumulative Solr
> slave instances = 200*30/20 = 300
>
> So with the above approach we may need ~300 Solr slave instances, which
> is very unmanageable.
>
> But we know that most of the queries are for the past week, i.e. we
> definitely need the 70 Solr slaves containing the last 7 days' worth of
> data up and running.
>
> Now, for the remaining 230 Solr instances, do we need to keep them running
> just for the odd query that spans all 30 days of data (30*200 GB = 6 TB),
> which may come up only a couple of times a day?
> This linear increase of Solr servers with the retention period doesn't seem
> to be a very scalable solution.
>
> So we are looking for a simpler approach to handle this scenario.
>
> Appreciate any further inputs/suggestions.
>
> Regards,
> sS
>
> --- On Fri, 9/25/09, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> > From: Chris Hostetter <hossman_luc...@fucit.org>
> > Subject: Re: Can we point a Solr server to index directory dynamically at
> > runtime..
> > To: solr-user@lucene.apache.org
> > Date: Friday, September 25, 2009, 4:04 AM
> > : Using a multicore approach, you could send a "create a core named
> > : 'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live
> > : Solr, which would spin it up on the fly.  Then you query it, and maybe
> > : keep it spun up until it's not queried for 60 seconds or something,
> > : then send a "remove core 'core3weeksold'" command.
> > : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .
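> >
> > for reference, those "create" / "remove" commands are just HTTP calls to
> > the CoreAdminHandler -- roughly like this (a sketch, not tested; the
> > instanceDir path is made up, double check the params on that wiki page):
> >
> >   # spin a core up over an existing index directory
> >   curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=core3weeksold&instanceDir=/datadirs/3weeksold'
> >
> >   # unload it again once it goes cold
> >   curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core3weeksold'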
> >
> > something that seems implicit in the question is what to do when the
> > request spans all of the data ... this is where (in theory) distributed
> > searching could help you out.
> >
> > index each day's worth of data into its own core; that makes it really
> > easy to expire the old data (just UNLOAD and delete an entire core once
> > it's more than 30 days old).  if your user is only searching "current"
> > data then your app can directly query the core containing the most
> > current data -- but if they want to query the last week, or last two
> > weeks, worth of data, you do a distributed request for all of the shards
> > needed to search the appropriate amount of data.
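> >
> > the distributed request itself is just the normal query with a shards
> > param listing every core to hit, along these lines (a sketch; host and
> > core names are made up):
> >
> >   curl 'http://host1:8983/solr/today/select?q=foo&shards=host1:8983/solr/today,host2:8983/solr/1dayold,host3:8983/solr/2dayold'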
> >
> > Between the ALIAS and SWAP commands on the CoreAdmin screen it should
> > be pretty easy to have cores with names like "today", "1dayold",
> > "2dayold", so that your app can configure simple shard params for all
> > the permutations you'll need to query.
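> >
> > the nightly rollover would then just be a series of admin calls, roughly
> > like this (a sketch; exact parameter names are on the CoreAdmin wiki
> > page above):
> >
> >   # rename cores so yesterday's "today" becomes "1dayold", and so on
> >   curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=today&other=1dayold'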
> >
> >
> > -Hoss
> >
