Thanks a lot Erick... You are right, we should not delay the move to
sharding/SolrCloud.

As you all are experts... we are currently using Solr 4.7. Do you suggest
we move to the latest Solr release, 5.1.0, or can we manage the above
issue with Solr 4.7?

Regards
Vishal

On Wed, May 27, 2015 at 2:21 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Hard to say. I've seen 20M docs be the place you need to consider
> sharding/SolrCloud. I've seen 300M docs be the place you need to start
> sharding. That said I'm quite sure you'll need to shard before you get
> to 2B. There's no good reason to delay that process.
>
> You'll have to do something about the join issue, though; that's the
> problem you might want to solve first. The new streaming aggregation
> stuff might help there, but you'll have to figure that out.
>
> The first thing I'd explore is whether you can denormalize your way
> out of the need to join, or whether you can use block joins instead.
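> For reference, a block join query against nested parent/child documents
> looks something like this (the field names and values here are just
> placeholders, not anything from your schema):
>
>     q={!parent which="doc_type:parent"}child_field:some_value
>
> That returns the parents whose children match the inner query, but it
> only works when each parent and its children are indexed together as a
> single block.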
>
> Best,
> Erick
>
> On Wed, May 27, 2015 at 11:15 AM, Vishal Swaroop <vishal....@gmail.com>
> wrote:
> > Currently, we have Solr configured on a single Linux server (24 GB
> > physical memory) with multiple cores. We are using Solr joins
> > (https://wiki.apache.org/solr/Join) across cores on this single server.
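> > For example, a typical cross-core join query we run looks something like
> > this (field and core names are just illustrative, not our actual schema):
> >
> >     q={!join from=parent_id to=id fromIndex=other_core}type:child
> >
> > i.e. the fromIndex parameter points the join at a second core hosted on
> > the same server.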
> >
> > But, as the data will grow to ~2 billion documents, we need to assess
> > whether we'll need to run SolrCloud, since "In a DistributedSearch
> > environment, you can not Join across cores on multiple nodes".
> >
> > Please suggest at what point or index size we should start considering
> > running SolrCloud?
> >
> > Regards
>
