Hi,

Thank you, Michael and Chris, for the responses.

Today, after Michael's mail, we tested dynamic loading of cores and it worked 
well. So we plan to go with a hybrid approach of multicore and distributed 
searching.
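For anyone following along, the dynamic loading we tested goes through Solr's CoreAdmin handler (see the CoreAdmin wiki link in Chris's mail below). A minimal sketch of the requests involved; the host, core name, and index path here are hypothetical placeholders:

```python
# Build CoreAdmin requests to spin a core up over an existing index
# directory, and later unload it. Host, core name, and instanceDir are
# hypothetical placeholders -- adjust to your deployment.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"

def core_admin_url(action, **params):
    # e.g. action=CREATE, action=UNLOAD
    return SOLR + "/admin/cores?" + urlencode({"action": action, **params})

create = core_admin_url("CREATE", name="core3weeksold",
                        instanceDir="/datadirs/3weeksold")
unload = core_admin_url("UNLOAD", core="core3weeksold")

# urllib.request.urlopen(create) would issue the actual request;
# here we only print the URLs.
print(create)
print(unload)
```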

In our testing, we found that a Solr instance with 20 GB of index (whether a 
single index or one spread across multiple cores) performs better than an 
instance holding, say, 40 or 50 GB of index (again, single index or index 
spread across cores).

So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr slave 
instances.

On day 2, 10 more Solr slave servers are required; cumulative Solr slave 
instances = 200*2/20 = 20
...
On day 30, 10 more Solr slave servers are required; cumulative Solr slave 
instances = 200*30/20 = 300

So with the above approach we may need ~300 Solr slave instances, which would 
be very hard to manage.
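A quick back-of-the-envelope script confirming the arithmetic above (200 GB of new index per day, ~20 GB per slave instance):

```python
# Cumulative slave count if each day adds 200 GB of index and each
# instance performs well with about 20 GB of it.
def cumulative_slaves(days, daily_index_gb=200, per_instance_gb=20):
    return days * daily_index_gb // per_instance_gb

print(cumulative_slaves(1))   # day 1: 10 instances
print(cumulative_slaves(7))   # one hot week: 70 instances
print(cumulative_slaves(30))  # full 30-day retention: 300 instances
```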

But we know that most of the queries are for the past week, i.e. we definitely 
need the 70 Solr slaves holding the last 7 days' worth of data up and running.

Now, for the remaining 230 Solr instances: do we need to keep them running just 
for the odd query that can span all 30 days of data (30 * 200 GB = 6 TB), which 
may come up only a couple of times a day?
This linear increase in Solr servers with the retention period doesn't seem to 
be a very scalable solution.
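For what it's worth, the kind of distributed request the hybrid approach implies might look like the sketch below. Host names are hypothetical, core names follow the "today"/"1dayold" naming suggested in Chris's mail, and it is simplified to one shard per day (in our sizing each day would itself span ~10 instances):

```python
# Assemble a 'shards' parameter for a one-week distributed query over
# per-day cores. Hosts solr1..solr7 are hypothetical placeholders.
def week_shards(days=7):
    cores = ["today"] + [f"{d}dayold" for d in range(1, days)]
    return ",".join(f"solr{i + 1}:8983/solr/{core}"
                    for i, core in enumerate(cores))

query = "http://solr1:8983/solr/today/select?q=*:*&shards=" + week_shards()
print(query)
```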

So we are looking for a simpler approach to handle this scenario.

Appreciate any further inputs/suggestions.

Regards,
sS

--- On Fri, 9/25/09, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> From: Chris Hostetter <hossman_luc...@fucit.org>
> Subject: Re: Can we point a Solr server to index directory dynamically at  
> runtime..
> To: solr-user@lucene.apache.org
> Date: Friday, September 25, 2009, 4:04 AM
> : Using a multicore approach, you
> could send a "create a core named
> : 'core3weeksold' pointing to '/datadirs/3weeksold' "
> command to a live Solr,
> : which would spin it up on the fly.  Then you query
> it, and maybe keep it
> : spun up until it's not queried for 60 seconds or
> something, then send a
> : "remove core 'core3weeksold' " command.
> : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler
> .
> 
> something that seems implicit in the question is what to do
> when the 
> request spans all of the data ... this is where (in theory)
> distributed 
> searching could help you out.
> 
> index each day's worth of data into its own core; that
> makes it really easy to expire the old data (just UNLOAD
> and delete an entire core once it's more than 30 days
> old). if your user is only searching "current" data then
> your app can directly query the core containing the most
> current data -- but if they want to query the last week,
> or last two weeks worth of data, you do a distributed
> request for all of the shards needed to search the
> appropriate amount of data.
> 
> Between the ALIAS and SWAP commands on the CoreAdmin
> screen it should be pretty easy to have cores with names
> like "today", "1dayold", "2dayold" so that your app can
> configure simple shard params for all the permutations
> you'll need to query.
> 
> 
> -Hoss
> 
>
